Empirical likelihood and uniform convergence rates for dyadic kernel density estimation
HAROLD D. CHIANG AND BING YANG TAN
Abstract.
This paper studies the asymptotic properties of and improved inference methods for kernel density estimation (KDE) for dyadic data. We first establish novel uniform convergence rates for dyadic KDE under general assumptions. As the existing analytic variance estimator is known to behave unreliably in finite samples, we propose a modified jackknife empirical likelihood procedure for inference. The proposed test statistic is self-normalised and no variance estimator is required. In addition, it is asymptotically pivotal regardless of the presence of dyadic clustering. The results are extended to cover the practically relevant case of incomplete dyadic network data. Simulations show that this jackknife empirical likelihood-based inference procedure delivers precise coverage probabilities even under modest sample sizes and with incomplete dyadic data. Finally, we illustrate the method by studying airport congestion.
Keywords: empirical likelihood, density estimation, dyadic network, exchangeability, missing data.

1. Introduction
Date: First arXiv version: 16 Oct 2020.

We benefited from useful comments and discussions with Bruce Hansen and Yuya Sasaki. We thank Yukitoshi Matsushita for sharing MATLAB simulation code. H.D. Chiang is supported by the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin–Madison with funding from the Wisconsin Alumni Research Foundation. All remaining errors are ours.

Consider a weighted undirected random graph consisting of vertices $i \in \{1, \dots, n\}$ and edges consisting of real-valued random variables $(X_{ij})_{i,j\in\{1,\dots,n\},\,i<j}$. This paper studies the asymptotic properties of, and improved inference methods for, kernel density estimation (KDE) for such dyadic data. We first establish novel uniform convergence rates for dyadic KDE under general assumptions and then propose a modified jackknife empirical likelihood (JEL) procedure for inference. The theory is further extended to the case of incomplete data under the missing at random assumption. Extensive simulation shows that this modified JEL inference procedure does not suffer from the unreliable finite sample performance of the analytic variance estimator, delivers precise coverage probabilities even under modest sample sizes, and is robust against both dyadic clustering and i.i.d. sampling for both complete and incomplete dyadic data. Finally, we illustrate the method by studying the distribution of congestion across different flight routes using data from the United States Department of Transportation (US DOT).

In an important recent paper, Graham et al. (2019) first propose and examine a nonparametric density estimator for dyadic random variables. Focusing on pointwise asymptotic behaviours, they demonstrate that asymptotic normality at a fixed design point can be established at a parametric rate with respect to the number of vertices, a unique feature of dyadic KDE. This implies that its asymptotics are robust to many choices of bandwidth rates. They also point out that degeneracy that leads to a non-Gaussian limit, such as the situation pointed out in Menzel (2020), does not occur for dyadic KDE under mild bandwidth conditions.
For inference, they propose an analytic variance estimator that corresponds to the one proposed by Fafchamps and Gubert (2007) (henceforth FG) under parametric models and establish its asymptotic validity. In their simulations, they discover that the FG variance estimator often significantly overestimates the asymptotic variance in finite samples and therefore causes overly conservative coverage and testing results. In addition, the FG variance estimator is not guaranteed to be positive-semidefinite in finite samples (Cameron and Miller, 2014, pp. 11). In overcoming these obstacles, our proposed modified JEL procedure for dyadic KDE complements the findings of Graham et al. (2019).

In practice, the presence of missing dyads is an important feature of network datasets, and a researcher working with network data often observes incomplete sets of dyads. To our knowledge, none of the existing theoretical works in the literature that focus on the asymptotics of jointly exchangeable arrays have tackled this issue. Under the missing at random assumption, we extend our theory for modified JEL to cover this practically relevant situation. The asymptotic theory for incomplete dyadic data is built upon the idea of asymptotics for incomplete U-statistics first investigated by Janson (1984).

Ever since the seminal papers of Owen (1988) and Owen (1990), empirical likelihood has been extensively studied in the literature. For a textbook treatment, see Owen (2001). Some recent developments under conventional asymptotics include Hjort, McKeague, and Van Keilegom (2009) and Bravo, Escanciano, and Van Keilegom (2020), among others.
Jackknife empirical likelihood is first proposed in the seminal work of Jing, Yuan, and Zhou (2009) for parametric U-statistics and has since been studied by Gong, Peng, and Qi (2010), Peng, Qi, and Van Keilegom (2012), Peng (2012), Wang, Peng, and Qi (2013), Zhang and Zhao (2013), Matsushita and Otsu (2018) and Chen and Tabri (2020), to list a few, for different applications. Most of the aforementioned works focus on actual U-statistics under i.i.d. sampling. Matsushita and Otsu (2020) demonstrate that JEL can not only be modified for certain unconventional asymptotics of i.i.d. observations, such as the small bandwidth asymptotics of Cattaneo, Crump, and Jansson (2013), but can also be applied to study edge probabilities for both sparse and dense dyadic networks studied in Bickel, Chen, and Levina (2011). This is achieved by incorporating the bias-correction idea of Hinkley (1978), Efron and Stein (1981) and Cattaneo, Crump, and Jansson (2014a). In a context where controlling bias and spikes (from the presence of the bandwidth) presents extra challenges, our modified JEL for dyadic KDE provides a nonparametric counterpart to their network asymptotic results.

Our results also complement the uniform convergence rate results for kernel-type estimators under different dependence settings studied in Einmahl and Mason (2000), Einmahl and Mason (2005), Dony and Mason (2008), Hansen (2008), Kristensen (2009) and so forth. In addition to the JEL literature, the asymptotics considered in this paper are also built upon the recent developments in different asymptotics of kernel density estimation and density-weighted functional estimation under i.i.d. data studied in Cattaneo et al. (2014a) and Cattaneo, Crump, and Jansson (2014b).

In a seminal work, Fafchamps and Gubert (2007) proposed a dyadic-robust variance estimator for regression models, which has since become the benchmark for dyadic data.
Related works include Frank and Snijders (1994), Snijders and Borgatti (1999), Aronow, Samii, and Assenova (2015) and Cameron and Miller (2014) in different contexts. Asymptotics for statistics of dyadic random arrays are studied by Davezies, D'Haultfoeuille, and Guyonvarch (2020) under a general empirical process setting and by Chiang, Kato, and Sasaki (2020) in high dimensions. They show that a modified pigeonhole bootstrap (cf. McCullagh (2000) and Owen (2007)) is valid under non-degeneracy. On the other hand, for two-way separately exchangeable arrays and linear test statistics, Menzel (2020) proposed a conservative bootstrap that is valid uniformly even under potentially non-Gaussian degeneracy scenarios. Despite this progress, how to conduct valid inference on nonlinear estimators of dyadic/polyadic random variables uniformly over degenerate and non-degenerate DGPs remains unknown. For dyadic KDE, this non-Gaussian degeneracy scenario is not a concern; Graham et al. (2019) pointed out that non-Gaussian degeneracy can be ruled out under mild bandwidth conditions.

1.1. Notation and Organisation. Let $|A|$ be the cardinality of a finite set $A$. Denote the uniform distribution on $(0,1)$ as $U(0,1)$. For $g : S \to \mathbb{R}$ and $\mathcal{X} \subseteq S$, denote $\|g\|_{\mathcal{X}} = \sup_{x \in \mathcal{X}} |g(x)|$. Denote $f'(\cdot) = \partial f(\cdot)/\partial x$, $f''(\cdot) = \partial^2 f(\cdot)/\partial x^2$, and for $\ell = 1, 2$, $f'_{X|U^{\ell}}(\cdot|u) = \partial f_{X|U^{\ell}}(\cdot|u)/\partial x$ and $f''_{X|U^{\ell}}(\cdot|u) = \partial^2 f_{X|U^{\ell}}(\cdot|u)/\partial x^2$. Write $a_n \lesssim b_n$ if there exists a constant $C > 0$ such that $a_n \le C b_n$. Throughout the paper, the asymptotics should be understood as taking $n \to \infty$.

The rest of this paper is organised as follows. In Section 2, we introduce the model setting and dyadic KDE. Section 3 presents the main theoretical results: uniform convergence rates and the validity of different JEL statistics for both complete and incomplete dyadic random arrays.
Section 4 contains the simulation studies, while the empirical application can be found in Section 5. A practical guideline on bandwidth choice and all the proofs are contained in the Appendix.

2. Model and estimator

Consider a weighted undirected random graph. Let $I_n = \{(i,j) : 1 \le i < j \le n\}$ and $I_\infty = \bigcup_{n=2}^{\infty} I_n$. The random graph consists of an array $(X_{ij})_{(i,j)\in I_\infty}$ of real-valued random variables that is generated by
$$X_{ij} = f(U_i, U_j, U_{\{i,j\}}), \qquad (U_i)_{i\in\mathbb{N}},\ (U_{\{i,j\}})_{i,j\in\mathbb{N}} \overset{i.i.d.}{\sim} U[0,1] \qquad (2.1)$$
for some Borel measurable map $f : [0,1]^3 \to \mathbb{R}$. Note that $U_i$ and $U_{\{\iota,\kappa\}}$ are independent for all $i \in \mathbb{N}$, $(\iota,\kappa) \in I_\infty$. While (2.1) may seem to be a structural assumption, it is implied by the following low-level condition of joint exchangeability via the Aldous-Hoover-Kallenberg representation; see e.g. Kallenberg (2006, Theorem 7.22).

Definition 1 (Joint exchangeability). An array $(X_{ij})_{(i,j)\in I_\infty}$ is called jointly exchangeable if for any permutation $\pi$ of $\mathbb{N}$, the arrays $(X_{ij})_{(i,j)\in I_\infty}$ and $(X_{\pi(i)\pi(j)})_{(i,j)\in I_\infty}$ are identically distributed.

Remark 1 (Identical distribution). Under this setting, $(X_{ij})_{(i,j)\in I_\infty}$ are identically distributed.

Suppose that the researcher observes $(X_{ij})_{(i,j)\in I_n}$ for $n \ge 2$. The object of interest is the density function $f(\cdot) = f_X(\cdot)$ of $X$. We assume such a density function exists and is well-defined, although some low-level sufficient conditions can be obtained. For a fixed design point $x \in \mathrm{supp}(X)$ of interest, the dyadic KDE for $\theta = f(x)$ is defined as
$$\hat\theta = \hat f_n(x) = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} h_n^{-1} K\Big(\frac{x - X_{ij}}{h_n}\Big) = \frac{1}{|I_n|} \sum_{(i,j)\in I_n} K_{ij,n}, \qquad (2.2)$$
where $h_n > 0$ is an $n$-dependent bandwidth, $K_{ij,n} := h_n^{-1} K((x - X_{ij})/h_n)$ and $K(\cdot)$ is a kernel function. We will be more specific about the requirements for $h_n$ and $K$ in the following section.
3. Theoretical results

3.1. Uniform convergence rates. We first study the uniform convergence rates of dyadic KDE. Let us make the following assumptions.

Assumption 1 (Uniform convergence rates). Suppose that $\mathcal{X} \subseteq \mathrm{supp}(X)$, and suppose the following are satisfied:
(i) We observe $(X_{ij})_{(i,j)\in I_n}$, a subset of $(X_{ij})_{(i,j)\in I_\infty}$ that is generated following (2.1).
(ii) $f(\cdot)$ is at least twice continuously differentiable in an open neighbourhood of $x$ and $\|f''\|_{\mathcal{X}} = O(1)$. $f_{X|U}(\cdot|U)$ is almost surely at least twice continuously differentiable in an open neighbourhood of $x$ and $\|f''_{X|U}\|_{\mathcal{X}\times[0,1]} = O(1)$.
(iii) $K$ is symmetric, right (or left) continuous, non-negative and of bounded variation, $\|K\|_\infty = O(1)$ and $\int K(u)\,du = 1$.
(iv) $h_n \to 0$ and $\log(1/h_n)/(n^{3/2} h_n) \to 0$.

Theorem 1 (Uniform convergence rates for dyadic KDE). Suppose Assumption 1 holds. Then, with probability at least $1 - o(1)$,
$$\sup_{x\in\mathcal{X}} \big|\hat f_n(x) - f(x)\big| \lesssim h_n^2 + \frac{1}{n^{1/2}}$$
if $\sup_{x\in\mathcal{X}} \mathrm{Var}\big(f_{X|U}(x|U)\big) \ge L > 0$. Otherwise, with probability at least $1 - o(1)$,
$$\sup_{x\in\mathcal{X}} \big|\hat f_n(x) - f(x)\big| \lesssim h_n^2 + \frac{1}{n} + \sqrt{\frac{\log(1/h_n)}{n^2 h_n}}.$$

Remark 2 (Convergence rates). Theorem 1 shows that in the case of degeneracy, the uniform rate coincides with the i.i.d. sampling situation with sample size $\binom{n}{2}$. The third component is the only part in which the kernel plays a role (in adding $\log(1/h_n)$). The first two components essentially have parametric rates, because they consist of $E[h_n^{-1}K((x-X_{ij})/h_n)\,|\,U_i]$ and $E[h_n^{-1}K((x-X_{ij})/h_n)\,|\,U_i,U_j]$, which are uniformly bounded under the assumptions. The multiplicative term of $1/h_n$ only creates a spike in the third component. These results agree with the observations made in Graham et al.
(2019) for the pointwise behaviours of the dyadic KDE.

3.2. Inference for complete dyadic data with modified jackknife empirical likelihood. We now introduce JEL-based inference procedures for dyadic KDE. Denote the leave-one-out and leave-two-out index sets $I_n^{(i)} = \{(\iota,\kappa) \in I_n : \iota,\kappa \ne i\}$ and $I_n^{(i,j)} = \{(\iota,\kappa) \in I_n : \iota,\kappa \notin \{i,j\}\}$. For a given $\theta$, define for each $i < j$
$$S(\theta) = \hat\theta - \theta = \binom{n}{2}^{-1} \sum_{(i,j)\in I_n} h_n^{-1} K\Big(\frac{x - X_{ij}}{h_n}\Big) - \theta,$$
$$S^{(i)}(\theta) = \hat\theta^{(i)} - \theta = \binom{n-1}{2}^{-1} \sum_{(k,l)\in I_n^{(i)}} h_n^{-1} K\Big(\frac{x - X_{kl}}{h_n}\Big) - \theta,$$
$$S^{(i,j)}(\theta) = \hat\theta^{(i,j)} - \theta = \binom{n-2}{2}^{-1} \sum_{(k,l)\in I_n^{(i,j)}} h_n^{-1} K\Big(\frac{x - X_{kl}}{h_n}\Big) - \theta.$$
For each value of $\theta$, define the pseudo true value for JEL as
$$V_i(\theta) = n S(\theta) - (n-1) S^{(i)}(\theta).$$
For modified JEL, define
$$Q_{ij} = n S(\theta) - (n-1)\big\{S^{(i)}(\theta) + S^{(j)}(\theta)\big\} + (n-2) S^{(i,j)}(\theta),$$
$$\Gamma = \frac{1}{n}\sum_{i=1}^{n} V_i(\theta)^2, \qquad \Gamma_m = \frac{1}{n}\sum_{i=1}^{n} V_i(\hat\theta)^2 - \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} Q_{ij}^2.$$
For each value of $\theta$, define the pseudo true value for modified JEL by
$$V_i^m(\theta) = V_i(\hat\theta) - \Gamma^{1/2}\Gamma_m^{-1/2}\big\{V_i(\hat\theta) - V_i(\theta)\big\}.$$
Now, we define the modified JEL function for $\theta$ by
$$\ell_m(\theta) = -2 \max_{w_1,\dots,w_n} \sum_{i=1}^{n} \log(n w_i) \quad \text{s.t.} \quad w_i \ge 0, \quad \sum_{i=1}^{n} w_i = 1, \quad \sum_{i=1}^{n} w_i V_i^m(\theta) = 0,$$
and $\ell(\theta)$ is defined analogously with $V_i^m$ replaced by $V_i$. The Lagrangian dual problems are
$$\ell(\theta) = 2 \sup_{\lambda} \sum_{i=1}^{n} \log(1 + \lambda V_i(\theta)), \qquad \ell_m(\theta) = 2 \sup_{\lambda} \sum_{i=1}^{n} \log(1 + \lambda V_i^m(\theta)).$$

Assumption 2 (JEL for complete data). Suppose that
(i) $f(x) > 0$ at $x$, an interior point of $\mathcal{X} \subseteq \mathrm{supp}(X)$.
(ii) $K$ has bounded support, $n h_n \to \infty$ and $n h_n^4 \to 0$, so that $h_n = o(n^{-1/4})$.
We say $f$ is non-degenerate at $x$ if $\mathrm{Var}\big(f_{X|U}(x|U)\big) \ge L > 0$.
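To fix ideas, the pseudo-values $V_i(\theta)$ and the Lagrangian dual of the (unmodified) JEL statistic can be computed as in the following sketch. This is illustrative code under our reading of the definitions, not the authors' implementation; the Epanechnikov kernel and the bisection solver for the dual are arbitrary choices.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1 - u ** 2) * (np.abs(u) <= 1)

def dyadic_kde(X, h, x):
    # theta_hat: average of h^{-1} K((x - X_ij)/h) over the dyads i < j
    iu = np.triu_indices(X.shape[0], k=1)
    return np.mean(epanechnikov((x - X[iu]) / h) / h)

def jackknife_pseudo_values(X, h, x, theta):
    # V_i(theta) = n S(theta) - (n-1) S^{(i)}(theta), where S = theta_hat - theta
    # and S^{(i)} leaves out every dyad involving vertex i
    n = X.shape[0]
    S = dyadic_kde(X, h, x) - theta
    V = np.empty(n)
    for i in range(n):
        keep = np.delete(np.arange(n), i)
        V[i] = n * S - (n - 1) * (dyadic_kde(X[np.ix_(keep, keep)], h, x) - theta)
    return V

def jel_stat(V, tol=1e-10):
    # l(theta) = 2 sup_lambda sum_i log(1 + lambda V_i), solved by bisection on
    # the first-order condition sum_i V_i / (1 + lambda V_i) = 0
    if V.min() >= 0 or V.max() <= 0:
        return np.inf            # 0 lies outside the convex hull of the V_i
    lo, hi = -1 / V.max() + tol, -1 / V.min() - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(V / (1 + mid * V)) > 0:
            lo = mid             # the derivative is decreasing in lambda
        else:
            hi = mid
    return 2 * np.sum(np.log(1 + 0.5 * (lo + hi) * V))
```

The modified statistic $\ell_m(\theta)$ only changes the input: form the $V_i^m(\theta)$ from the $V_i$ and pass them to `jel_stat` in place of $V_i(\theta)$.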
The following result characterises the asymptotics of JEL and modified JEL.

Theorem 2 (Wilks' theorem for modified JEL for dyadic KDE with complete data). Suppose Assumptions 1 and 2 are satisfied. Then $\ell_m(\theta) \overset{d}{\to} \chi^2(1)$. In addition, if $f$ is non-degenerate at $x$, then $\ell(\theta) \overset{d}{\to} \chi^2(1)$.

Remark 3 (Asymptotic pivotality). Theorem 2 shows that the modified JEL is pivotal regardless of whether $f$ is degenerate or not, while JEL is asymptotically pivotal only if $f$ is non-degenerate at $x$.

Theorem 2 implies that one can construct an approximate $1-\alpha$ confidence interval $R_\alpha = \{\theta : \ell_\cdot(\theta) \le c_\alpha\}$ for $\ell_\cdot \in \{\ell, \ell_m\}$, where $c_\alpha$ is such that $\lim_{n\to\infty} P(\theta \in R_\alpha) = P(\chi^2(1) \le c_\alpha) = 1 - \alpha$.

3.3. Inference for incomplete dyadic data with jackknife empirical likelihood. Let us now consider dyadic KDE with incomplete data. Suppose that we observe $X^*_{ij} = Z_{ij} X_{ij}$, where the selection random variables $Z_{ij} \overset{d}{=} \mathrm{Bernoulli}(p)$ are i.i.d. and assumed to be independent from $(X_{ij})_{(i,j)\in I_n}$ for some unknown $p = p_n \in (0,1)$, which can be either fixed or potentially converging to zero in the sample size. Define $\hat N = \hat p \binom{n}{2}$, $\hat N_1 = \hat p \binom{n-1}{2}$ and $\hat N_2 = \hat p \binom{n-2}{2}$, with $\hat p = \binom{n}{2}^{-1} \sum_{(i,j)\in I_n} Z_{ij}$ being an estimate for $p$. Now let $\mathcal{I}_n = \{(i,j) \in I_n : Z_{ij} = 1\}$, $\mathcal{I}_n^{(k)} = \{(i,j) \in I_n : Z_{ij} = 1,\ i,j \ne k\}$ and $\mathcal{I}_n^{(k,\ell)} = \{(i,j) \in I_n : Z_{ij} = 1,\ i,j \notin \{k,\ell\}\}$.
For a given $\theta$, define the JEL pseudo true value for incomplete dyadic data by
$$\hat V_i(\theta) = n \hat S(\theta) - (n-1)\hat S^{(i)}(\theta),$$
where
$$\hat S(\theta) = \hat\theta_{\mathrm{inc}} - \theta = \frac{1}{\hat N} \sum_{(i,j)\in \mathcal{I}_n} h_n^{-1} K\Big(\frac{x - X_{ij}}{h_n}\Big) - \theta, \qquad \hat S^{(i)}(\theta) = \hat\theta^{(i)}_{\mathrm{inc}} - \theta = \frac{1}{\hat N_1} \sum_{(k,l)\in \mathcal{I}_n^{(i)}} h_n^{-1} K\Big(\frac{x - X_{kl}}{h_n}\Big) - \theta.$$
In addition, for each $i < j$, define
$$\hat S^{(i,j)}(\theta) = \hat\theta^{(i,j)}_{\mathrm{inc}} - \theta = \frac{1}{\hat N_2} \sum_{(k,l)\in \mathcal{I}_n^{(i,j)}} h_n^{-1} K\Big(\frac{x - X_{kl}}{h_n}\Big) - \theta,$$
$$\hat Q_{ij} = n\hat S(\theta) - (n-1)\big\{\hat S^{(i)}(\theta) + \hat S^{(j)}(\theta)\big\} + (n-2)\hat S^{(i,j)}(\theta),$$
$$\hat\Gamma = \frac{1}{n}\sum_{i=1}^{n} \hat V_i(\theta)^2, \qquad \hat\Gamma_m = \frac{1}{n}\sum_{i=1}^{n} \hat V_i(\hat\theta_{\mathrm{inc}})^2 - \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \hat Q_{ij}^2.$$
For each $\theta$, define the modified JEL pseudo true value for incomplete dyadic data by
$$\hat V_i^m(\theta) = \hat V_i(\hat\theta_{\mathrm{inc}}) - \hat\Gamma^{1/2}\hat\Gamma_m^{-1/2}\big\{\hat V_i(\hat\theta_{\mathrm{inc}}) - \hat V_i(\theta)\big\}.$$
For incomplete dyadic data, the modified JEL function for $\theta$ can be defined by
$$\hat\ell_m(\theta) = -2 \max_{w_1,\dots,w_n} \sum_{i=1}^{n} \log(n w_i) \quad \text{s.t.} \quad w_i \ge 0, \quad \sum_{i=1}^{n} w_i = 1, \quad \sum_{i=1}^{n} w_i \hat V_i^m(\theta) = 0,$$
and $\hat\ell(\theta)$ is defined analogously with $\hat V_i^m$ replaced by $\hat V_i$. The Lagrangian dual problems are
$$\hat\ell(\theta) = 2\sup_{\lambda} \sum_{i=1}^{n} \log(1 + \lambda \hat V_i(\theta)), \qquad \hat\ell_m(\theta) = 2\sup_{\lambda} \sum_{i=1}^{n} \log(1 + \lambda \hat V_i^m(\theta)).$$

Assumption 3 (JEL for incomplete data). Suppose $Z_{ij} \overset{d}{=} \mathrm{Bernoulli}(p)$ are i.i.d. and independent from $(X_{ij})_{(i,j)\in I_n}$ for some unknown sequence $p = p_n \in (0,1)$, and $n^2 p h_n \to \infty$.

The following result is an incomplete data counterpart of Theorem 2.

Theorem 3 (Wilks' theorem for modified JEL for dyadic KDE with incomplete data).
Suppose Assumptions 1, 2 and 3 are satisfied. Then $\hat\ell_m(\theta) \overset{d}{\to} \chi^2(1)$. In addition, if $f$ is non-degenerate at $x$ or $p_n \to 0$, then $\hat\ell(\theta) \overset{d}{\to} \chi^2(1)$.

4. Simulation studies

We consider three sets of simulation data generating processes (DGPs). The first two follow $X_{ij} = \beta U_i U_j + U_{\{i,j\}}$ with $U_{\{i,j\}} \overset{i.i.d.}{\sim} N(0,1)$, $U_i$ i.i.d. taking value $1$ with probability $2/3$ and $-1$ with probability $1/3$, and $\beta \in \{0, 1\}$, where $\beta = 1$ is the sufficiently non-degenerate DGP considered in Graham et al. (2019). This has the density
$$f(x) = \frac{5}{9}\varphi(x-1) + \frac{4}{9}\varphi(x+1),$$
where $\varphi$ is the density of a standard normal random variable. $\beta = 0$ corresponds to the degenerate case where the true DGP is i.i.d. standard normal with density $f = \varphi$. We experiment with both the oracle MSE-optimal bandwidth from Graham et al. (2019) as well as the rule-of-thumb bandwidth proposed in (A.1). We consider four alternative inference methods that are theoretically robust to dyadic clustering: (i) the FG variance estimator for dyadic KDE proposed in Graham et al. (2019, Equation (22)) (FG), (ii) the leading term of FG (lead. FG), (iii) jackknife empirical likelihood (JEL) and (iv) modified jackknife empirical likelihood (mJEL). We then investigate the performance of JEL and mJEL under an incomplete dyadic DGP with only half of the data observed. Throughout the simulations, we set the design point $x = 1$ and repeat each experiment 1,000 times. Tables 1, 2 and 3 show the simulation results under these different settings.

In general, the simulation supports our theoretical findings. When the true DGP is dyadic, modified JEL-based confidence intervals enjoy close to nominal coverage rates even at moderate sample sizes. The original JEL is conservative but asymptotically valid. On the other hand, both of the analytic variance estimators are severely conservative in coverage probability at all of the simulated sample sizes. Their coverage rates are close to those in the simulation study in Graham et al. (2019), even though different bandwidths and kernels are employed.
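The simulation design above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code; the seed and sample size are arbitrary, and the distribution $P(U_i = 1) = 2/3$ is inferred from the stated mixture weights $5/9$ and $4/9$ (since $P(U_iU_j = 1) = p^2 + (1-p)^2 = 5/9$ at $p = 2/3$).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dyadic(n, beta, rng):
    # X_ij = beta * U_i * U_j + U_{ij}; U_i = +1 w.p. 2/3 and -1 w.p. 1/3,
    # so that P(U_i U_j = 1) = 5/9, matching the stated mixture weights
    U = rng.choice([1.0, -1.0], size=n, p=[2 / 3, 1 / 3])
    iu = np.triu_indices(n, k=1)
    return beta * U[iu[0]] * U[iu[1]] + rng.standard_normal(len(iu[0]))

def rule_of_thumb_bandwidth(x):
    # Silverman-type rule adapted to the dyadic sample size n(n-1)/2, as in (A.1)
    N = len(x)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(x.std(ddof=1), iqr) * N ** (-1 / 5)

def kde_at(x_dyads, h, x):
    # Epanechnikov-kernel density estimate at the design point x
    u = (x - x_dyads) / h
    return np.mean(0.75 * (1 - u ** 2) * (np.abs(u) <= 1)) / h

n = 200
X = simulate_dyadic(n, beta=1.0, rng=rng)
h = rule_of_thumb_bandwidth(X)
f_hat = kde_at(X, h, 1.0)   # target: f(1) = (5/9)phi(0) + (4/9)phi(2), about 0.25
```

With $\binom{200}{2} = 19{,}900$ dyads, the estimate at $x = 1$ should land near the true value $0.25$ up to bias and sampling noise.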
When the true DGP is i.i.d., modified JEL still demonstrates reasonable coverage probabilities throughout almost all sample sizes, while the other three methods are severely undersized. We also observe that the coverage of modified JEL is slightly over 95% when the sample size is large. This is most likely due to the fact that the rule-of-thumb bandwidth is chosen at the MSE-optimal rate, and thus the bias does not converge to zero asymptotically when the sample is i.i.d. This also supports the recommendation of slight undersmoothing from Graham et al. (2019) when the researcher is concerned about the potential absence of dyadic clustering. Finally, we see that the coverage rates of both JEL and mJEL behave similarly to the complete data situation under the incomplete dyadic data setting. Thus the methods are robust against missing at random situations.

Table 1. Coverage probabilities of 95% confidence intervals under the dyadic clustering DGP (β = 1). Kernel: Epanechnikov; bandwidth: Silverman's rule of thumb.

    n     FG     lead. FG   JEL    mJEL
    25    0.993  0.996      0.982  0.900
    50    0.993  0.996      0.980  0.931
    75    0.995  0.996      0.981  0.943
    100   0.994  0.996      0.983  0.945
    150   0.993  0.995      0.979  0.942
    200   0.993  0.993      0.976  0.946
    400   0.990  0.991      0.971  0.945
    600   0.988  0.990      0.970  0.948
    800   0.986  0.988      0.966  0.945

Table 2. Coverage probabilities of 95% confidence intervals under the i.i.d. DGP (β = 0). Kernel: Epanechnikov; bandwidth: Silverman's rule of thumb.

    n     FG     lead. FG   JEL    mJEL
    25    0.993  0.996      0.984  0.895
    50    0.996  0.997      0.992  0.944
    75    0.997  0.999      0.994  0.949
    100   0.995  0.998      0.993  0.949
    150   0.996  0.998      0.995  0.957
    200   0.995  0.997      0.997  0.957
    400   0.996  0.998      0.993  0.956
    600   0.992  0.995      0.996  0.960
    800   0.996  0.997      0.995  0.959

5. Empirical application: Airport congestion

Flight delays caused by airport congestion are a familiar experience to the flying public.
Aside from the frustration experienced by affected passengers, these delays are an important cause of fuel wastage and result in the release of criteria pollutants such as nitrogen oxides (NO$_x$), carbon monoxide (CO) and sulphur oxides (SO$_x$). As discussed in Schlenker and Walker (2016), these releases harm the health of people living near airports. To mitigate these harms, it is important to understand the distribution of congestion across routes (i.e. flights between pairs of airports).

Table 3. Coverage probabilities of 95% confidence intervals under the incomplete dyadic DGP (p = 0.5, β = 1). Kernel: Epanechnikov; bandwidth: Silverman's rule of thumb.

    n     JEL    mJEL
    25    0.984  0.911
    50    0.987  0.946
    75    0.988  0.943
    100   0.990  0.952
    150   0.988  0.948
    200   0.982  0.954
    400   0.980  0.951
    600   0.980  0.950
    800   0.978  0.954

In this empirical application, we apply the dyadic KDE and the JEL and modified JEL procedures to understand the distribution of airport congestion, as measured by taxi time, across pairs of airports in the United States. The taxi time of a flight is defined as the sum of taxi-out time (the time between the plane leaving its parking position and taking off) and taxi-in time (the time between the plane landing and parking). In this context, the dyads $X_{ij}$ and $X_{jk}$ represent flights between airport $i$ and airport $j$ and flights between airport $j$ and airport $k$ respectively; the fact that flights share a common airport $j$ induces dependence between $X_{ij}$ and $X_{jk}$.
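A toy enumeration makes this shared-vertex dependence structure concrete (an illustrative sketch with hypothetical 0-based airport labels; these helpers are not part of the paper's procedure):

```python
from itertools import combinations

def dyads(n):
    # I_n = {(i, j) : 0 <= i < j < n}, with 0-based airport labels
    return list(combinations(range(n), 2))

def dependent_pairs(n):
    # Pairs of distinct dyads sharing an airport, e.g. (i, j) and (j, k):
    # exactly the pairs whose observations are dependent under dyadic clustering
    return [(d, e) for d, e in combinations(dyads(n), 2) if set(d) & set(e)]

# With n = 4 airports there are 6 dyads and 15 dyad pairs, 12 of which share
# an airport; only the remaining 3 disjoint pairs carry no such dependence.
```

This is the covariance structure that the FG variance estimator, and the jackknife pseudo-values above, are built to accommodate.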
Therefore, this is a natural setting in which to apply dyadic KDE and modified JEL.

Data on flight taxi times are obtained from the US DOT Bureau of Transportation Statistics (BTS) Reporting Carrier On-Time Performance dataset, which compiles data on US domestic non-stop flights from all carriers that earn at least 0.5% of scheduled domestic passenger revenues. From the flight-level data for June 2020, we collapse taxi time into various summary statistics (mean, 95th percentile, maximum) for each origin-destination airport pair (that is, the summary statistics are taken over flights in both directions, so the resulting network is undirected). Noting that a missing dyad means that there were no non-stop flights between the vertices of that dyad, we apply the KDE, JEL, and modified JEL to the resulting network. For simplicity, the sample is restricted to the 100 largest airports by number of departing flights.

The dyadic KDE estimates, the point-wise JEL and point-wise modified JEL 95% confidence intervals (obtained by numerically inverting the test statistic at each design point), and histograms for the mean, 95th percentile and maximum taxi times between airport pairs in June 2020 are shown in Figures 1A-1C. In each case, consistent with the simulation results, the JEL confidence intervals are at least as large as the modified JEL confidence intervals.

For applied researchers, the results demonstrate how studying only the mean can be incomplete; unlike the mean, the 95th percentile and (particularly) maximum travel times exhibit positive skewness. Routes whose taxi times lie in the positively skewed region are disproportionately problematic, a distinction lost by looking at means only. For all these statistics, the modified JEL procedure which we propose provides relatively precise estimates even under small sample sizes (100 vertices) and a large proportion of missing dyads (72% missing).

6. Conclusion

This paper studies the asymptotic properties of dyadic KDE.
We establish uniform convergence rates under general conditions. For inference, we propose a modified jackknife empirical likelihood procedure which is valid regardless of degeneracy of the underlying DGP. We further extend the results to cover incomplete, or missing at random, dyadic data. Simulation studies show robust finite sample performance of the modified JEL inference for dyadic KDE under different settings. We illustrate the method via an application to airport congestion.

Despite our focus on density estimation, the approach we take can be extended to cover local polynomial regression, as well as the local polynomial density estimation of Cattaneo, Jansson, and Ma (2020b) or the local polynomial distributional regression of Cattaneo, Jansson, and Ma (2020a), under dyadic data. Another potential direction is to study incomplete data under unconfoundedness or other non-random missing models. Investigating the modified JEL under these assumptions provides interesting directions for future research.

References

Aronow, P. M., C. Samii, and V. A. Assenova (2015): "Cluster-robust variance estimation for dyadic data," Political Analysis, 23, 564-577.
Bickel, P. J., A. Chen, and E. Levina (2011): "The method of moments and degree distributions for network models," Annals of Statistics, 39, 2280-2301.
Bravo, F., J. C. Escanciano, and I. Van Keilegom (2020): "Two-step semiparametric empirical likelihood inference," Annals of Statistics, 48, 1-26.
Calonico, S., M. D. Cattaneo, and M. H. Farrell (2018): "On the effect of bias estimation on coverage accuracy in nonparametric inference," Journal of the American Statistical Association, 113, 767-779.
Cameron, A. C. and D. L. Miller (2014): "Robust inference for dyadic data," Unpublished manuscript, University of California-Davis.
Cattaneo, M. D., R. K. Crump, and M.
Jansson (2013): "Generalized jackknife estimators of weighted average derivatives," Journal of the American Statistical Association, 108, 1243-1256.
——— (2014a): "Bootstrapping density-weighted average derivatives," Econometric Theory, 1135-1164.
——— (2014b): "Small bandwidth asymptotics for density-weighted average derivatives," Econometric Theory, 176-200.
Cattaneo, M. D., M. Jansson, and X. Ma (2020a): "Local regression distribution estimators," Working paper.
——— (2020b): "Simple local polynomial density estimators," Journal of the American Statistical Association, forthcoming.
Chen, R. and R. V. Tabri (2020): "Jackknife empirical likelihood for inequality constraints on regular functionals," Journal of Econometrics.
Chen, X. and K. Kato (2019): "Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications," Probability Theory and Related Fields, 1-67.
Chernozhukov, V., D. Chetverikov, and K. Kato (2014): "Gaussian approximation of suprema of empirical processes," Annals of Statistics, 42, 1564-1597.
Chiang, H. D., K. Kato, and Y. Sasaki (2020): "Inference for high-dimensional exchangeable arrays," arXiv preprint arXiv:2009.05150.
Davezies, L., X. D'Haultfoeuille, and Y. Guyonvarch (2020): "Empirical process results for exchangeable arrays," Annals of Statistics, forthcoming.
de la Peña, V. and E. Giné (1999): Decoupling: From Dependence to Independence, Springer.
Dony, J. and D. M. Mason (2008): "Uniform in bandwidth consistency of conditional U-statistics," Bernoulli, 14, 1108-1133.
Efron, B. and C. Stein (1981): "The jackknife estimate of variance," Annals of Statistics, 586-596.
Einmahl, U. and D. M. Mason (2000): "An empirical process approach to the uniform consistency of kernel-type function estimators," Journal of Theoretical Probability, 13, 1-37.
——— (2005): "Uniform in bandwidth consistency of kernel-type function estimators," Annals of Statistics, 33, 1380-1403.
Fafchamps, M. and F.
Gubert (2007): "The formation of risk sharing networks," Journal of Development Economics, 83, 326-350.
Frank, O. and T. Snijders (1994): "Estimating the size of hidden populations using snowball sampling," Journal of Official Statistics, 10, 53-67.
Ghosal, S., A. Sen, and A. W. van der Vaart (2000): "Testing monotonicity of regression," Annals of Statistics, 1054-1082.
Giné, E. and R. Nickl (2016): Mathematical Foundations of Infinite-Dimensional Statistical Models, vol. 40, Cambridge University Press.
Gong, Y., L. Peng, and Y. Qi (2010): "Smoothed jackknife empirical likelihood method for ROC curve," Journal of Multivariate Analysis, 101, 1520-1531.
Graham, B. S. (2019): "Network data," Tech. rep., National Bureau of Economic Research.
Graham, B. S., F. Niu, and J. L. Powell (2019): "Kernel density estimation for undirected dyadic data," arXiv preprint arXiv:1907.13630.
Hansen, B. E. (2008): "Uniform convergence rates for kernel estimation with dependent data," Econometric Theory, 726-748.
Hinkley, D. V. (1978): "Improving the jackknife with special reference to correlation estimation," Biometrika, 65, 13-21.
Hjort, N. L., I. W. McKeague, and I. Van Keilegom (2009): "Extending the scope of empirical likelihood," Annals of Statistics, 37, 1079-1111.
Janson, S. (1984): "The asymptotic distributions of incomplete U-statistics," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 66, 495-505.
Jing, B.-Y., J. Yuan, and W. Zhou (2009): "Jackknife empirical likelihood," Journal of the American Statistical Association, 104, 1224-1232.
Kallenberg, O. (2006): Probabilistic Symmetries and Invariance Principles, Springer Science & Business Media.
Kato, K. (2019): "Lecture notes on empirical process theory," Tech. rep., Cornell University.
Kristensen, D. (2009): "Uniform convergence rates of kernel estimators with heterogeneous dependent data," Econometric Theory, 1433-1445.
Matsushita, Y. and T.
Otsu (2018): "Likelihood inference on semiparametric models: Average derivative and treatment effect," Japanese Economic Review, 69, 133-155.
——— (2020): "Jackknife empirical likelihood: small bandwidth, sparse network and high-dimension asymptotics," Biometrika, forthcoming.
McCullagh, P. (2000): "Resampling and exchangeable arrays," Bernoulli, 6, 285-301.
Menzel, K. (2020): "Bootstrap with clustering in two or more dimensions," arXiv:1703.03043.
Owen, A. (1990): "Empirical likelihood ratio confidence regions," Annals of Statistics, 90-120.
Owen, A. B. (1988): "Empirical likelihood ratio confidence intervals for a single functional," Biometrika, 75, 237-249.
——— (2001): Empirical Likelihood, CRC Press.
——— (2007): "The pigeonhole bootstrap," Annals of Applied Statistics, 1, 386-411.
Peng, L. (2012): "Approximate jackknife empirical likelihood method for estimating equations," Canadian Journal of Statistics, 40, 110-123.
Peng, L., Y. Qi, and I. Van Keilegom (2012): "Jackknife empirical likelihood method for copulas," Test, 21, 74-92.
Schlenker, W. and R. W. Walker (2016): "Airports, air pollution, and contemporaneous health," Review of Economic Studies, 83, 768-809.
Silverman, B. W. (1986): Density Estimation for Statistics and Data Analysis, vol. 26, CRC Press.
Snijders, T. A. and S. P. Borgatti (1999): "Non-parametric standard errors and tests for network statistics," Connections, 22, 161-170.
van der Vaart, A. W. and J. A. Wellner (1996): Weak Convergence and Empirical Processes, Springer.
Wang, R., L. Peng, and Y. Qi (2013): "Jackknife empirical likelihood test for equality of two high dimensional means," Statistica Sinica, 667-690.
Zhang, Z. and Y. Zhao (2013): "Empirical likelihood for linear transformation models with interval-censored failure time data," Journal of Multivariate Analysis, 116, 398-409.

Figure 1.
Distribution of summary statistics of taxi time across routes: (a) Mean; (b); (c) Maximum. Point estimates for the KDE, evaluated at each whole minute, are represented by the solid line. Point-wise 95% modified JEL confidence intervals are represented by the two dashed lines. Point-wise 95% JEL confidence intervals are represented by the two dotted lines.

Appendix A. Practical guideline for bandwidth

As shown in Graham et al. (2019), the MSE-optimal bandwidth has the expression

h_n^{MSE} = { f(w) c_2 / [ (2^{-1} f''(w) c_1)^2 n(n-1) ] }^{1/5},

where c_2 = ∫ K(u)^2 du and c_1 = ∫ u^2 K(u) du. Graham et al. (2019) pointed out that, under non-degeneracy, dyadic KDE is robust to different rates of bandwidth. Hence, the unknowns f(w) and f''(w) do not affect its asymptotic behaviour. We propose to adapt the rule of thumb of Silverman (1986, p. 48, Equation (3.31)),

h_n^S = 0.9 · min{ σ̂, IQR/1.34 } · { n(n-1) }^{-1/5},  (A.1)

where σ̂ and IQR are the sample standard deviation and the interquartile range of (X_ij)_{(i,j)∈I_n}, i.e. the difference between the 75th and the 25th percentiles, respectively. For incomplete data, to account for the effective sample size, we propose

h_n^{S,inc} = 0.9 · min{ σ̂_inc, IQR_inc/1.34 } · { p̂ · n(n-1) }^{-1/5},  (A.2)

where σ̂_inc and IQR_inc are the sample standard deviation and the interquartile range of the observed (X_ij), respectively. As discussed in Section 4, these rule-of-thumb bandwidth choices work well in our simulation studies. A formal analysis of the coverage optimality of different bandwidth choices, such as those studied in Calonico, Cattaneo, and Farrell (2018), under dyadic asymptotics would be an interesting avenue for future research.

Appendix B. Some empirical process definitions and results

We shall first review some useful definitions on covering numbers and related notions. For more detail, see van der Vaart and Wellner (1996) or Giné and Nickl (2016). Let S be a set and C a nonempty class of subsets of S. Pick any finite set {x_1, ..., x_n} of size n. We say that C picks out a subset A ⊂ {x_1, ..., x_n} if there exists C ∈ C such that A = {x_1, ..., x_n} ∩ C. Let Δ_C(x_1, ..., x_n) be the number of subsets of {x_1, ..., x_n} picked out by C. We say the class C shatters {x_1, ..., x_n} if C picks out all of its 2^n subsets. The VC index V(C) is defined as the smallest n for which no set of size n is shattered by C, i.e., with m_C(n) := max_{x_1,...,x_n} Δ_C(x_1, ..., x_n),

V(C) = inf{ n : m_C(n) < 2^n } if this set is non-empty, and V(C) = +∞ otherwise.

The class C is called a Vapnik–Chervonenkis class, or VC class for short, if V(C) < ∞. For a real function f on S, its subgraph is defined as sg(f) = {(x, t) : t < f(x)}. A function class F on S is a VC subgraph class if the collection of subgraphs of f ∈ F, sg(F) = {sg(f) : f ∈ F}, is a VC class of sets in S × R. The VC index of F, V(F), is defined as the VC index of sg(F). Let (T, d) be a pseudometric space. For ε > 0, an ε-net of T is a subset T_ε of T such that for every t ∈ T there exists t_ε ∈ T_ε with d(t, t_ε) ≤ ε.
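The rule-of-thumb bandwidths (A.1) and (A.2) of Appendix A are straightforward to implement. Below is a minimal Python sketch; the function names are ours, and we read Silverman's Equation (3.31) as the constant 0.9 with spread measure min{σ̂, IQR/1.34} and n(n−1) (respectively p̂·n(n−1)) as the effective sample size:

```python
import random
import statistics

def _scale(x):
    """Silverman-style spread measure: min{sample std. dev., IQR/1.34}."""
    xs = sorted(x)
    m = len(xs)
    q25, q75 = xs[int(0.25 * (m - 1))], xs[int(0.75 * (m - 1))]
    return min(statistics.stdev(x), (q75 - q25) / 1.34)

def h_rot(x, n):
    """(A.1): rule-of-thumb bandwidth for complete dyadic data on n vertices."""
    return 0.9 * _scale(x) * (n * (n - 1)) ** (-0.2)

def h_rot_incomplete(x_obs, n):
    """(A.2): p_hat * n * (n - 1) replaces n * (n - 1) as the effective sample size."""
    p_hat = 2 * len(x_obs) / (n * (n - 1))
    return 0.9 * _scale(x_obs) * (p_hat * n * (n - 1)) ** (-0.2)

# illustration on synthetic dyadic observations
random.seed(0)
n = 20
x = [random.gauss(0.0, 1.0) for _ in range(n * (n - 1) // 2)]
h_complete = h_rot(x, n)
h_inc = h_rot_incomplete(x, n)   # all pairs observed, so p_hat = 1
```

With every pair observed, p̂ = 1 and (A.2) reduces to (A.1); with missing pairs, the bandwidth widens to compensate for the smaller effective sample.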
The ε-covering number N(T, d, ε) of T is defined by

N(T, d, ε) = inf{ Card(T_ε) : T_ε is an ε-net of T }.

(Recall that d is only a pseudometric, i.e. d(x, y) = 0 does not imply x = y.) For a probability measure Q on a measurable space (S, S) and any r ≥ 1, denote ‖f‖_{Q,r} = (∫ |f|^r dQ)^{1/r} for f ∈ L^r(S). We say that a function G : S → R is an envelope of a class of functions G ∋ g, g : S → R, if sup_{g∈G} |g(s)| ≤ G(s) for all s ∈ S. Suppose F is a VC subgraph class with envelope F; then Theorem 2.6.7 in van der Vaart and Wellner (1996) implies that for any r ∈ [1, ∞),

sup_Q N(F, ‖·‖_{Q,r}, ε‖F‖_{Q,r}) ≤ K V(F) (16e)^{V(F)} (1/ε)^{r(V(F)-1)}

for all 0 < ε ≤ 1, where K is a universal constant and the supremum is taken over all finitely discrete probability measures.

A related concept is that of a VC type class. We say a set of functions F with envelope F is a VC type class with characteristics (A, v) if for some positive constants A and v,

sup_Q N(F, ‖·‖_{Q,2}, ε‖F‖_{Q,2}) ≤ (A/ε)^v,  (B.1)

where the supremum is taken over all finitely discrete measures on (S, S).

The following restates Lemma A.2 in Ghosal, Sen, and van der Vaart (2000), a useful result for bounding uniform covering numbers of classes of conditional expectations.

Lemma 1. Let H be a class of functions h : X × Y → R with envelope H, and let R be a fixed probability measure on Y. For a given h ∈ H, let h̄ : X → R be h̄ = ∫ h(·, y) dR(y). Set H̄ = { h̄ : h ∈ H }. Note that H̄ := ∫ H(·, y) dR(y) is an envelope of H̄. Then, for any r, s ≥ 1 and ε ∈ (0, 1),

sup_Q N(H̄, ‖·‖_{Q,r}, ε‖H̄‖_{Q,r}) ≤ sup_Q N(H, ‖·‖_{Q×R,s}, ε^r ‖H‖_{Q×R,s}).

Appendix C. Auxiliary lemmas

Lemma 2. Suppose K is symmetric, ‖K‖_∞ = O(1), n h_n^2 → ∞, and f is at least twice continuously differentiable in an open neighbourhood of x with ‖f''‖_∞ = O(1). Then max_{1≤i≤n} |V_i(θ)| = o_P(n^{1/2}).

Proof. A direct calculation from S(θ) = (2/(n(n-1))) Σ_{(i,j)∈I_n} K_ij,n − θ yields

V_i(θ) + θ = (2/(n-1)) Σ_{j≠i} K_ij,n − (2/((n-1)(n-2))) Σ_{(k,l)∈I_n : k,l≠i} K_kl,n,

so that max_{1≤i≤n} |V_i(θ)| ≤ 3‖K‖_∞ h_n^{-1} + |θ| = O(h_n^{-1}) = o(n^{1/2}), where the last step uses n h_n^2 → ∞. □

Lemma 3. Under the same assumptions as in Lemma 2, and supposing that n^{-1} Σ_{i=1}^n V_i(θ)^2 = O_P(1), we have n^{-1} Σ_{i=1}^n |V_i(θ)|^3 = o_P(n^{1/2}).

Proof. It follows from

n^{-1} Σ_{i=1}^n |V_i(θ)|^3 ≤ max_{1≤i≤n} |V_i(θ)| · n^{-1} Σ_{i=1}^n V_i(θ)^2 = o_P(n^{1/2}) · O_P(1),

where the last equality follows from Lemma 2. □

Lemma 4. Suppose the assumptions of Theorem 2 are satisfied. Then λ̂ = o_P(n^{-1/2}).

Proof of Lemma 4. This essentially follows the argument in the proof of Owen (1990, Equation (2.14)). The first-order condition yields

0 = n^{-1} Σ_{i=1}^n V_i(θ) / (1 + λ̂ V_i(θ)).

Write λ̂ = ρu, where ρ ≥ 0 and |u| = 1. Then

0 = | n^{-1} Σ_{i=1}^n V_i(θ)/(1 + ρu V_i(θ)) | ≥ ρ · [ n^{-1} Σ_{i=1}^n V_i(θ)^2 ] / [ 1 + ρ max_{1≤i≤n} |V_i(θ)| ] − |S(θ)|.

Note that |S(θ)| = O_P(n^{-1/2}) by the convergence rate from Theorem 1. Further, n^{-1} Σ_{i=1}^n V_i(θ)^2 is bounded below by a positive multiple of f(x) > 0 with probability approaching one, while max_{1≤i≤n} |V_i(θ)| = o_P(n^{1/2}) following Lemma 2. Therefore, ρ = |λ̂| = o_P(n^{-1/2}). □

Appendix D. Proofs of the main results

Throughout this appendix, for notational convenience, for any i > j let us define X_ij := X_ji.

D.1. Proof of Theorem 1.

Proof. Note that f̂_n(x) − f(x) = (E[f̂_n(x)] − f(x)) + (f̂_n(x) − E[f̂_n(x)]).
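The jackknife pseudo-values V_i(θ) = nS(θ) − (n−1)S^{(i)}(θ) used in the lemmas above average back to S(θ); see (D.2) below. The identity is purely algebraic, so it can be checked numerically with arbitrary stand-in values for the kernel evaluations K_ij,n; a minimal sketch (the names S and V are ours):

```python
import random

random.seed(0)
n = 8
theta = 0.3
# symmetric dyadic "kernel evaluations" K_ij,n for i < j; arbitrary numbers
# suffice because the identity verified below is purely algebraic
K = {(i, j): random.random() for i in range(n) for j in range(i + 1, n)}

def S(exclude=()):
    """Dyadic mean minus theta, dropping all pairs touching vertices in `exclude`."""
    vals = [v for (i, j), v in K.items() if i not in exclude and j not in exclude]
    return sum(vals) / len(vals) - theta

# jackknife pseudo-values V_i(theta) = n*S(theta) - (n-1)*S^{(i)}(theta)
V = [n * S() - (n - 1) * S(exclude=(i,)) for i in range(n)]

# identity (D.2): the pseudo-values average back to S(theta)
assert abs(sum(V) / n - S()) < 1e-12
```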
The bound for the non-stochastic component follows from the identical distribution of the X_ij and a simple calculation:

E[f̂_n(x)] = E[ h_n^{-1} K((X − x)/h_n) ] = ∫ K(u) f(x − u h_n) du = f(x) + h_n^2 B(x) + o(h_n^2),

where B(x) := 2^{-1} f''(x) ∫ u^2 K(u) du. Thus Bias(f̂_n(x)) := E[f̂_n(x) − f(x)] = h_n^2 B(x) + o(h_n^2) = O(h_n^2).

We shall now consider the stochastic component. For each n ∈ N, write G_n = { x' ↦ K((x − x')/h_n) : x ∈ R }. Then, following the bias calculation above, one has sup_{g∈G_n} E[g(X)] ≤ ‖f‖_∞ h_n + O(h_n^3) = O(h_n), and for each x ∈ R there exists a g ∈ G_n such that

f̂_n(x) = (2/(n(n-1))) Σ_{(i,j)∈I_n} h_n^{-1} g(X_ij).

Denote ḡ(·) = E[ g(X) | U_1 = · ]. The decomposition in Graham et al. (2019, Equations (9)–(11)) yields

(2/(n(n-1))) Σ_{(i,j)∈I_n} h_n^{-1} ( g(X_ij) − E[g(X)] )
= (2/n) Σ_{i=1}^n h_n^{-1} ( ḡ(U_i) − E[g(X)] )
+ (2/(n(n-1))) Σ_{(i,j)∈I_n} h_n^{-1} ( E[g(X_ij) | U_i, U_j] − ḡ(U_i) − ḡ(U_j) + E[g(X)] )
+ (2/(n(n-1))) Σ_{(i,j)∈I_n} h_n^{-1} ( g(X_ij) − E[g(X_ij) | U_i, U_j] )
= n^{-1} Σ_{i=1}^n L_i(g) + (2/(n(n-1))) Σ_{(i,j)∈I_n} W_ij(g) + (2/(n(n-1))) Σ_{(i,j)∈I_n} R_ij(g).  (D.1)

Note that each of the three terms on the right-hand side is centred. It suffices to bound each term on the right-hand side of (D.1).

We shall first obtain some preliminary results on the uniform covering numbers of the classes of functions appearing in these terms. Since K is of bounded variation, Proposition 3.6.12 in Giné and Nickl (2016) implies that G = { (u, v, w) ↦ K(t f(u, v, w) − s) : t ≥ 0, s ∈ R } ⊃ G_n is a VC type class of functions with envelope ‖K‖_∞ and characteristics (A, v) for some constants A and v.

Further, Lemma 1 implies that

G_1 = { u ↦ E[ K(t f(u, U_2, U_12) − s) | U_1 = u ] : t ≥ 0, s ∈ R },
G_2 = { u ↦ E[ K(t f(U_1, u, U_12) − s) | U_2 = u ] : t ≥ 0, s ∈ R },
G_3 = { (u, v) ↦ E[ K(t f(u, v, U_12) − s) | U_1 = u, U_2 = v ] : t ≥ 0, s ∈ R }

are of VC type with characteristics (4√A, v) and envelope ‖K‖_∞. Let H_n := {1/h_n}, a set consisting of one single (constant) function. Corollary 7 in Kato (2019) implies that for each ℓ = 1, 2, 3 and ε ∈ (0, 1),

sup_Q N(H_n · G_ℓ, ‖·‖_{Q,2}, ε h_n^{-1} ‖K‖_∞) ≤ sup_Q N(H_n, ‖·‖_{Q,2}, ε h_n^{-1}) · sup_Q N(G_ℓ, ‖·‖_{Q,2}, ε‖K‖_∞) ≲ (4√A/ε)^v,

where the first factor equals |H_n| = 1. Observe that for some M < ∞, it holds for all u ∈ [0, 1] that

E[ h_n^{-1} K((x − X_ij)/h_n) | U_i = u ] = ∫ K(s) f_{X|U}(x − h_n s | u) ds ≤ ‖f_{X|U}‖_∞ + O(h_n^2) ≤ M,

with a similar inequality holding for E[ h_n^{-1} K((x − X_ij)/h_n) | U_i = u, U_j = v ]. We shall therefore replace the envelope h_n^{-1}‖K‖_∞ by M. Notice that for any ϵ ∈ (0, 1) and ε = ϵ h_n M/‖K‖_∞, for n large enough that ε ≤ 1, it holds that

sup_Q N(H_n · G_ℓ, ‖·‖_{Q,2}, ϵM) = sup_Q N(H_n · G_ℓ, ‖·‖_{Q,2}, ε h_n^{-1}‖K‖_∞) ≲ (4√A/ε)^v.

Thus we have, by Corollary 7 in Kato (2019),

sup_Q N( h_n^{-1}(G_3 − G_1 − G_2 + E[g(X)]), ‖·‖_{Q,2}, 4ϵM ) ≲ (4√A/ε)^v.

Now, coming back to the second term of (D.1): it consists of a centred, completely degenerate U-process (see Section 3.5 in de la Peña and Giné (1999) for the definition).
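The decomposition (D.1) can be checked exactly in a toy Aldous–Hoover style model in which the latent variables take two values, so every conditional expectation is computable by enumeration. The following sketch is only illustrative (the table ftab and the transformation g are our own stand-ins); it verifies the per-pair telescoping behind (D.1) and that the L and W components are centred:

```python
import itertools

# toy model: X_ij = ftab[(U_i, U_j, U_ij)], with U_i, U_ij i.i.d. uniform on {0, 1}
# and ftab symmetric in its first two arguments
ftab = {(0, 0, 0): 0.1, (0, 0, 1): 0.4, (0, 1, 0): 0.2, (0, 1, 1): 0.9,
        (1, 0, 0): 0.2, (1, 0, 1): 0.9, (1, 1, 0): 0.5, (1, 1, 1): 0.7}
g = lambda x: x * x            # any bounded transformation plays the kernel's role

Eg = sum(g(ftab[k]) for k in ftab) / 8                        # E[g(X)]
gbar = {u: sum(g(ftab[(u, v, w)]) for v in (0, 1) for w in (0, 1)) / 4
        for u in (0, 1)}                                      # E[g(X) | U_1 = u]
guv = {(u, v): sum(g(ftab[(u, v, w)]) for w in (0, 1)) / 2
       for u in (0, 1) for v in (0, 1)}                       # E[g(X) | U_1, U_2]

# per-pair telescoping: g(X_ij) - E[g(X)] = L-part + L-part + W-part + R-part
for u, v, w in itertools.product((0, 1), repeat=3):
    L_i, L_j = gbar[u] - Eg, gbar[v] - Eg
    W = guv[(u, v)] - gbar[u] - gbar[v] + Eg
    R = g(ftab[(u, v, w)]) - guv[(u, v)]
    assert abs((g(ftab[(u, v, w)]) - Eg) - (L_i + L_j + W + R)) < 1e-12

# the L and W components are centred
assert abs(sum(gbar[u] - Eg for u in (0, 1)) / 2) < 1e-12
assert abs(sum(guv[(u, v)] - gbar[u] - gbar[v] + Eg
               for u in (0, 1) for v in (0, 1)) / 4) < 1e-12
```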
Hence, using the covering-number bounds above together with a moment inequality for centred, completely degenerate U-processes, one obtains a bound for

E sup_{g∈G_n} | (2/(n(n-1))) Σ_{(i,j)∈I_n} W_ij(g) |.

An analogous entropy argument applied to the i.i.d. centred summands (L_i(g))_{i=1}^n provides the bound for the first term of (D.1).

For the third term of (D.1), note that conditional on (U_i)_{i=1}^n, the (h_n · R_ij)_{(i,j)∈I_n} are centred and i.i.d. Thus similar entropy calculations and an application of Corollary 5.1 in Chernozhukov et al. (2014) yield

h_n^{-1} · E[ sup_{g∈G_n} | (2/(n(n-1))) Σ_{(i,j)∈I_n} h_n R_ij(g) | | (U_i)_{i=1}^n ]
≲ σ_U(g) √( v log(A‖K‖_∞/σ_U(g)) / (n^2 h_n^2) ) + v log(A‖K‖_∞/σ_U(g)) ‖K‖_∞ / (n^2 h_n^2),

where σ_U(g)^2 := sup_{g∈G_n} E[ (h_n R_ij(g))^2 | (U_i)_{i=1}^n ] = O(h_n). By Jensen's inequality and Fubini's theorem, we have

E sup_{g∈G_n} | (2/(n(n-1))) Σ_{(i,j)∈I_n} R_ij(g) | ≲ √( v log(A‖K‖_∞/h_n) / (n^2 h_n) ) + v log(A‖K‖_∞/h_n) ‖K‖_∞ / (n^2 h_n^2).

This shows the convergence rate of the stochastic component. □

D.2. Proof of Theorem 2.

Proof. In this proof, we write L_i, R_ij and W_ij for L_i(g), R_ij(g) and W_ij(g) in Equation (D.1), where g corresponds to the design point x. Also, let us denote

κ = n h_n / 2,  Ω_1(x) = Var( f_{X|U}(x | U_1) ),  Ω_2(x) = f(x) ∫ K(u)^2 du.

The first part of this proof follows a standard argument for empirical likelihood.
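That standard argument starts from the first-order condition 0 = n^{-1} Σ_i V_i(θ)/(1 + λ̂V_i(θ)), which pins down the Lagrange multiplier λ̂. Since the left-hand side is strictly decreasing in λ on the set where every weight 1 + λV_i stays positive, λ̂ can be found by bisection; a minimal numerical sketch with simulated stand-ins for the V_i(θ):

```python
import math
import random

random.seed(1)
n = 200
V = [random.gauss(0.05, 1.0) for _ in range(n)]   # stand-ins for V_i(theta)

def score(lam):
    """First-order condition: n^{-1} * sum of V_i / (1 + lam * V_i)."""
    return sum(v / (1 + lam * v) for v in V) / n

# admissible lambdas keep every weight 1 + lam * V_i strictly positive
lo = -1 / max(V) + 1e-9
hi = -1 / min(V) - 1e-9
for _ in range(200):            # score is strictly decreasing on (lo, hi)
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2

# log empirical likelihood ratio statistic of the form analysed in Theorem 2
elr = 2 * sum(math.log(1 + lam * v) for v in V)
```

No variance estimator enters this construction; in the theorem, the analogous statistic built from the modified pseudo-values is shown to be asymptotically pivotal.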
For λ̂ := argsup_λ Σ_{i=1}^n log(1 + λ V_i(θ)), the first-order condition and the algebraic fact that (1+x)^{-1} = 1 − x + x^2 (1+x)^{-1} yield

0 = n^{-1} Σ_{i=1}^n V_i(θ)/(1 + λ̂ V_i(θ)) = n^{-1} Σ_{i=1}^n V_i(θ) − n^{-1} Σ_{i=1}^n V_i(θ)^2 λ̂ + n^{-1} Σ_{i=1}^n V_i(θ)^3 λ̂^2 / (1 + λ̂ V_i(θ)).

Lemmas 3 and 4 imply that

λ̂ = Σ_{i=1}^n V_i(θ) / Σ_{i=1}^n V_i(θ)^2 + o_P(n^{-1/2}).

Thus a Taylor expansion of log(1 + λ̂ V_i(θ)) around λ̂ V_i(θ) = 0 gives

2 Σ_{i=1}^n log(1 + λ̂ V_i(θ)) = 2 Σ_{i=1}^n ( λ̂ V_i(θ) − (1/2){λ̂ V_i(θ)}^2 ) + o_P(1) = { n^{-1/2} Σ_{i=1}^n V_i(θ) }^2 / ( n^{-1} Σ_{i=1}^n V_i(θ)^2 ) + o_P(1).

Note that Var(θ̂) = 4Ω_1(x)/n + 2Ω_2(x)/(n^2 h_n) + O(n^{-2}) + O(h_n/n) following Graham et al. (2019, pp. 12). Let σ^2 = 4Ω_1(x) + κ^{-1}Ω_2(x); notice that f(x) > 0 implies σ^2 > 0. It will be shown that

(nσ^2)^{-1/2} Σ_{i=1}^n V_i(θ) →_d N(0, 1).

To see this, note that

n^{-1} Σ_{i=1}^n V_i(θ) = n^{-1} Σ_{i=1}^n ( n S(θ) − (n-1) S^{(i)}(θ) ) = S(θ),

using the fact that

n^{-1} Σ_{i=1}^n S^{(i)}(θ) = (2/(n(n-1)(n-2))) Σ_{i=1}^n ( Σ_{(k,l)∈I_n} K_kl,n − Σ_{j≠i} K_ij,n ) − θ = (2/(n(n-1))) Σ_{(k,l)∈I_n} K_kl,n − θ = S(θ).  (D.2)

Moreover, applying the decomposition (D.1) to the leave-one-out estimator gives

(n-1) S^{(i)}(θ) = Σ_{k≠i} L_k + (2/(n-2)) ( Σ_{(k,l)∈I_n} (W_kl + R_kl) − Σ_{l>i} (W_il + R_il) − Σ_{k<i} (W_ki + R_ki) ) + (n-1) Bias(θ̂^{(i)}).  (D.3)

In addition, notice that Σ_{k≠i} Σ_{l>k, l≠i} R_kl = Σ_{k=1}^{n-1} Σ_{l=k+1}^{n} R_kl − Σ_{l>i} R_il − Σ_{k<i} R_ki. Combining (D.2) and (D.3), expanding n^{-1} Σ_{i=1}^n V_i(θ)^2, and applying the WLLN for triangular arrays, the conditional independence (given (U_i)_{1≤i≤n}) of (R_ij)_{(i,j)∈I_n}, the identical distribution of (R_ij)_{(i,j)∈I_n}, and the fact that Var(Y_1) = E[(Y_i − Y_j)^2/2] for i.i.d. Y_i's, we obtain, under non-degeneracy,

(nσ^2)^{-1} Σ_{i=1}^n V_i(θ)^2 →_P σ^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) } ≠ 1,

so the self-normalised (unmodified) JEL statistic is not asymptotically pivotal.

We now consider the modified JEL. Notice that

(nσ^2)^{-1} Σ_{i=1}^n V_i^m(θ)^2 →_P σ^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) },

following a similar argument as above as well as the consistency of θ̂. It now suffices to show that

(nσ^2)^{-1/2} Σ_{i=1}^n V_i^m(θ) →_d N( 0, σ^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) } ).

As Σ_{i=1}^n V_i(θ̂) = 0, it holds that n^{-1/2} Σ_{i=1}^n V_i^m(θ) = Γ Γ_m^{-1} n^{-1/2} Σ_{i=1}^n V_i(θ). We have from above that n^{-1/2} σ^{-1} Σ_{i=1}^n V_i(θ) →_d N(0, 1), and

Γ Γ_m^{-1} = { [ n^{-1} σ^{-2} Σ_{i=1}^n V_i(θ̂)^2 ] / [ n^{-1} σ^{-2} Σ_{i=1}^n V_i(θ̂)^2 − n^{-1} σ^{-2} Σ_{i=1}^{n-1} Σ_{j=i+1}^n Q_ij ] }^{1/2}.

By the consistency of θ̂ and the convergence of n^{-1} σ^{-2} Σ_{i=1}^n V_i(θ)^2, we have n^{-1} σ^{-2} Σ_{i=1}^n V_i(θ̂)^2 →_P σ^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) }. We now verify that

(nσ^2)^{-1} Σ_{i=1}^{n-1} Σ_{j=i+1}^n Q_ij →_P σ^{-2} κ^{-1} Ω_2(x).
To see this, observe that

(n-1) S^{(i)}(θ) = Σ_{k≠i} L_k + (2/(n-2)) ( Σ_{(k,l)∈I_n} (W_kl + R_kl) − Σ_{l>i} (W_il + R_il) − Σ_{k<i} (W_ki + R_ki) ) + (n-1) Bias(θ̂^{(i)}),

(n-2) S^{(i,j)}(θ) = Σ_{k≠i,j} L_k + (2/(n-3)) ( Σ_{(k,l)∈I_n} (W_kl + R_kl) − Σ_{l≠j} (W_jl + R_jl) − Σ_{l≠i} (W_il + R_il) + (W_ij + R_ij) ) + (n-2) Bias(θ̂^{(i,j)}).

Observe that Bias(θ̂) = Bias(θ̂^{(i)}) = Bias(θ̂^{(j)}) = Bias(θ̂^{(i,j)}), as they are estimated using the same bandwidth h_n. Hence, as the W_ij terms are of smaller order asymptotically and the L_k terms cancel from the jackknife combination, Q_ij reduces, up to negligible terms, to a combination of

(4/((n-1)(n-2)(n-3))) Σ_{k<l} R_kl,  −(2/((n-2)(n-3))) ( Σ_{l≠i} R_il + Σ_{l≠j} R_jl ),  and  (2/(n-3)) R_ij;

the WLLN for triangular arrays and the conditional independence of the (R_kl) then yield the claimed limit. This completes the proof for the complete-data case. □

D.3. Proof of Theorem 3.

Proof. In this proof, let us abbreviate X_{I_n} = (X_ij)_{(i,j)∈I_n}. Define γ = p/(1−p). Denote

N := p n(n-1)/2,  N^{(k)} := p (n-1)(n-2)/2,  N^{(k,l)} := p (n-2)(n-3)/2.

First notice that p̂/p − 1 = O_P( { n(n-1) }^{-1/2} ). Hence N̂/N →_P 1, N̂^{(k)}/N^{(k)} →_P 1 and N̂^{(k,l)}/N^{(k,l)} →_P 1. Thus, if we define

S̃(θ) = N^{-1} Σ_{(i,j)∈I_n} h_n^{-1} Z_ij K( (x − X_ij)/h_n ) − θ,
S̃^{(k)}(θ) = (N^{(k)})^{-1} Σ_{(i,j)∈I_n^{(k)}} h_n^{-1} Z_ij K( (x − X_ij)/h_n ) − θ,
S̃^{(k,l)}(θ) = (N^{(k,l)})^{-1} Σ_{(i,j)∈I_n^{(k,l)}} h_n^{-1} Z_ij K( (x − X_ij)/h_n ) − θ,

then it holds uniformly that S̃(θ) = Ŝ(θ) + o_P(1), and similar results hold for Ŝ^{(k)}(θ) and Ŝ^{(k,l)}(θ). Hence we have, uniformly in i = 1, ..., n, that

Ṽ_i(θ) := n S̃(θ) − (n-1) S̃^{(i)}(θ) = V̂_i(θ) + o_P(1).
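The estimator S̃(θ) above simply renormalises by the (estimated) number of observed pairs, so under missing-at-random sampling the incomplete-data KDE tracks its complete-data counterpart. A small simulation sketch (with i.i.d. stand-ins for the X_ij and a Gaussian kernel; all names are ours):

```python
import math
import random

random.seed(42)
n = 60
p = 0.5                                      # missing-at-random observation probability
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
X = {ij: random.gauss(0.0, 1.0) for ij in pairs}             # stand-in dyadic data
Z = {ij: 1 if random.random() < p else 0 for ij in pairs}    # observation indicators

h = 0.3
K = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)   # Gaussian kernel

def kde(x, weights):
    """Weighted dyadic KDE at x; with 0/1 weights this is the observed-pair average."""
    num = sum(w * K((x - X[ij]) / h) / h for ij, w in weights.items())
    return num / sum(weights.values())

f_comp = kde(0.0, {ij: 1 for ij in pairs})   # complete-data estimate
f_inc = kde(0.0, Z)                          # incomplete-data estimate (N-hat normalisation)
p_hat = sum(Z.values()) / len(pairs)

assert abs(p_hat - p) < 0.1
assert abs(f_comp - 1 / math.sqrt(2 * math.pi)) < 0.15   # true density at 0
assert abs(f_inc - f_comp) < 0.12
```

The incomplete-data estimate differs from the complete-data one only through sampling noise in which pairs are observed, which is the content of S̃(θ) = Ŝ(θ) + o_P(1) above.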
Observe that we have

n^{-1/2} Σ_{i=1}^n Ṽ_i(θ) = n^{-1/2} Σ_{i=1}^n ( n S̃(θ) − (n-1) S̃^{(i)}(θ) ) = √n S̃(θ),

using the fact that n^{-1} Σ_{i=1}^n S̃^{(i)}(θ) = S̃(θ), which follows from Equation (D.2) with Z_ij K_ij,n in place of K_ij,n. Now, the same argument from the first part of the proof of Theorem 2 holds with Ṽ_i(θ) in place of V_i(θ), and thus we have

2 Σ_{i=1}^n log(1 + λ̂ Ṽ_i(θ)) = { n^{-1/2} Σ_{i=1}^n Ṽ_i(θ) }^2 / ( n^{-1} Σ_{i=1}^n Ṽ_i(θ)^2 ) + o_P(1).

Denote σ̃^2 = σ^2 + γ^{-1} κ^{-1} Ω_2(x) = n Var(θ̂_inc) + o(1), where σ^2 = 4Ω_1(x) + κ^{-1}Ω_2(x) is the same as in the proof of Theorem 2. To this end, it suffices to verify that, for

Σ_0 := σ̃^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) + 2γ^{-1}κ^{-1}Ω_2(x) },

it holds that

(nσ̃^2)^{-1/2} Σ_{i=1}^n Ṽ_i^m(θ) →_d N(0, Σ_0),  (nσ̃^2)^{-1} Σ_{i=1}^n Ṽ_i^m(θ)^2 →_P Σ_0.

Notice that we can decompose

n^{-1/2} σ̃^{-1} Σ_{i=1}^n Ṽ_i(θ) = √n σ̃^{-1} S̃(θ) = √n σ̃^{-1} { S(θ) + T(θ) },

where S(θ) is the same as in the complete dyadic data case and T(θ) = N^{-1} Σ_{(i,j)∈I_n} (Z_ij − p) K_ij,n. From the proof of Theorem 2, we have √n σ^{-1} S(θ) →_d N(0, 1). Conditional on X_{I_n}, √n σ̃^{-1} T(θ) consists of a sum of independent random variables with mean 0 and conditional variance

Var( √n σ̃^{-1} T(θ) | X_{I_n} ) = ( 4(1−p) / ( p n (n-1)^2 σ̃^2 ) ) Σ_{(i,j)∈I_n} K_ij,n^2.

Observe that, by the law of total variance, we have

Var(T(θ)) = ( 2(1−p) Ω_2(x) / ( p n(n-1) h_n ) ) (1 + o(1)).
In addition, one can verify the conditional Lyapunov condition:

{ Var(T(θ) | X_{I_n}) }^{-3/2} E[ |T(θ)|^3 | X_{I_n} ] ≲ { Var(T(θ) | X_{I_n}) }^{-3/2} p(1−p) N^{-3} Σ_{(i,j)∈I_n} K_ij,n^3.  (D.4)

Notice that if n^2 p h_n → ∞, the right-hand side of (D.4) is of order o_P(1), since its expectation is ≲ (n^2 p h_n)^{-1/2} = o(1). Thus an application of Lyapunov's CLT yields that, with probability 1 − o(1), conditionally on X_{I_n},

{ Var(T(θ) | X_{I_n}) }^{-1/2} T(θ) →_d N(0, 1),

where the convergence in distribution is with respect to the law of (Z_ij)_{(i,j)∈I_n}. The variance calculation above and the conditional convergence in distribution then imply the unconditional convergence in distribution

{ Var(T(θ)) }^{-1/2} T(θ) →_d N(0, 1).

S(θ) and T(θ) are uncorrelated, and, being both asymptotically normal with the conditioning argument above, they are asymptotically independent. Hence we have

√n σ̃^{-1} S̃(θ) = √n σ̃^{-1} { S(θ) + T(θ) } →_d N( 0, σ̃^{-2} σ^2 + σ̃^{-2} γ^{-1} κ^{-1} Ω_2(x) ) = N(0, 1).

Now, we claim that

(nσ̃^2)^{-1} Σ_{i=1}^n Ṽ_i(θ)^2 →_P σ̃^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) } + 2 σ̃^{-2} γ^{-1} κ^{-1} Ω_2(x) = Σ_0.

Define T^{(i)}(θ) = (N^{(i)})^{-1} Σ_{(k,l)∈I_n^{(i)}} (Z_kl − p) K_kl,n, the leave-one-out version of T. Note that Ṽ_i(θ) = S̃(θ) − (n-1)( S̃^{(i)}(θ) − S̃(θ) ). Then

(nσ̃^2)^{-1} Σ_{i=1}^n Ṽ_i(θ)^2 = σ̃^{-2} ( S̃(θ)^2 − 2(n-1) S̃(θ) · n^{-1} Σ_{i=1}^n { S̃^{(i)}(θ) − S̃(θ) } + ((n-1)^2/n) Σ_{i=1}^n { S̃^{(i)}(θ) − S̃(θ) }^2 ) = σ̃^{-2} ( M_1 − M_2 + M_3 ).

Observe that M_1 = o_P(1) following the uniform convergence rate from Theorem 1, and M_2 = 0 using Equation (D.2).
Similar to the corresponding calculation in the proof of Theorem 2,

σ̃^{-2} M_3 = ((n-1)^2/(n σ̃^2)) Σ_{i=1}^n { S̃^{(i)}(θ) − S̃(θ) }^2
= ((n-1)^2/(n σ̃^2)) Σ_{i=1}^n { S̃^{(i)}(θ) − n^{-1} Σ_{k=1}^n S̃^{(k)}(θ) }^2
= ((n-1)^2/(n^2 σ̃^2)) Σ_{i=1}^{n-1} Σ_{j=i+1}^n { S̃^{(i)}(θ) − S̃^{(j)}(θ) }^2
= ((n-1)^2/(n^2 σ̃^2)) Σ_{i=1}^{n-1} Σ_{j=i+1}^n { ( S^{(i)}(θ) + T^{(i)}(θ) ) − ( S^{(j)}(θ) + T^{(j)}(θ) ) }^2
= ((n-1)^2/(n^2 σ̃^2)) [ Σ_{i<j} { S^{(i)}(θ) − S^{(j)}(θ) }^2 + Σ_{i<j} { T^{(i)}(θ) − T^{(j)}(θ) }^2 + 2 Σ_{i<j} { S^{(i)}(θ) − S^{(j)}(θ) } { T^{(i)}(θ) − T^{(j)}(θ) } ]
→_P σ̃^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) } + 2 σ̃^{-2} γ^{-1} κ^{-1} Ω_2(x) = Σ_0,

where the convergence in the last line is a direct result of the WLLN for triangular arrays and the conditional independence (given X_{I_n}) between S^{(i)}(θ) and T^{(i)}(θ). The same proof strategy yields n^{-1} σ̃^{-2} Σ_{i=1}^n Ṽ_i^m(θ)^2 →_P Σ_0.

It remains to show that

(nσ̃^2)^{-1/2} Σ_{i=1}^n Ṽ_i^m(θ) →_d N(0, Σ_0).

Note that Σ_{i=1}^n V̂_i(θ̂_inc) = 0, and hence

(nσ̃^2)^{-1/2} Σ_{i=1}^n V̂_i^m(θ) = { [ n^{-1} σ̃^{-2} Σ_{i=1}^n V̂_i(θ̂_inc)^2 ] / [ n^{-1} σ̃^{-2} Σ_{i=1}^n V̂_i(θ̂_inc)^2 − n^{-1} σ̃^{-2} Σ_{i=1}^{n-1} Σ_{j=i+1}^n Q̂_ij ] }^{1/2} · (nσ̃^2)^{-1/2} Σ_{i=1}^n V̂_i(θ).
We have from above that n^{-1/2} σ̃^{-1} Σ_{i=1}^n V̂_i(θ) →_d N(0, 1), and, by the consistency of θ̂_inc, we have n^{-1} σ̃^{-2} Σ_{i=1}^n V̂_i(θ̂_inc)^2 →_P Σ_0. It suffices to show that the denominator inside the square root converges to unity asymptotically. Observe that Q̂_ij admits the same expansion as Q_ij in the proof of Theorem 2, with p^{-1} Z_kl R_kl in place of R_kl and an asymptotically negligible remainder. Therefore, by the identical distribution and conditional independence (given (U_i)_{i=1}^n) of (Z_ij R_ij)_{(i,j)∈I_n}, using the WLLN for triangular arrays following the same argument as in the proof of Theorem 2, we have

n^{-1} σ̃^{-2} Σ_{i=1}^{n-1} Σ_{j=i+1}^n Q̂_ij →_P σ̃^{-2} p^{-1} κ^{-1} Ω_2(x),

so that the denominator converges to σ̃^{-2} ( 4Ω_1(x) + κ^{-1}Ω_2(x) + γ^{-1}κ^{-1}Ω_2(x) ) = 1. This concludes the proof. □

(H. D. Chiang) Department of Economics, University of Wisconsin–Madison, William H. Sewell Social Science Building, 1180 Observatory Drive, Madison, WI 53706, USA.
Email address: [email protected]

(B. Y. Tan) Department of Economics, Vanderbilt University, VU Station B, Box
Email address: [email protected]