Empirical likelihood and uniform convergence rates for dyadic kernel density estimation
HAROLD D. CHIANG AND BING YANG TAN
Abstract.
This paper studies the asymptotic properties of and improved inference methods for kernel density estimation (KDE) for dyadic data. We first establish novel uniform convergence rates for dyadic KDE under general assumptions. As the existing analytic variance estimator is known to behave unreliably in finite samples, we propose a modified jackknife empirical likelihood procedure for inference. The proposed test statistic is self-normalised and no variance estimator is required. In addition, it is asymptotically pivotal regardless of the presence of dyadic clustering. The results are extended to cover the practically relevant case of incomplete dyadic network data. Simulations show that this jackknife empirical likelihood-based inference procedure delivers precise coverage probabilities even under modest sample sizes and with incomplete dyadic data. Finally, we illustrate the method by studying airport congestion.
Keywords: empirical likelihood, density estimation, dyadic network, exchangeability, missing data.

1. Introduction
Date: First arXiv version: 16 Oct 2020.

We benefited from useful comments and discussions with Bruce Hansen and Yuya Sasaki. We thank Yukitoshi Matsushita for sharing MATLAB simulation code. H.D. Chiang is supported by the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin–Madison with funding from the Wisconsin Alumni Research Foundation. All remaining errors are ours.

Consider a weighted undirected random graph consisting of vertices $i \in \{1, \dots, n\}$ and edges consisting of real-valued random variables $(X_{ij})_{i,j\in\{1,\dots,n\},\,i<j}$. This paper studies the asymptotic properties of, and improved inference methods for, kernel density estimation (KDE) for such dyadic data. We first establish novel uniform convergence rates for dyadic KDE under general assumptions and then propose a modified jackknife empirical likelihood (JEL) procedure for inference. The theory is further extended to the case of incomplete data under the missing at random assumption. Extensive simulation shows that this modified JEL inference procedure does not suffer from the unreliable finite sample performance of the analytic variance estimator, delivers precise coverage probabilities even under modest sample sizes, and is robust against both dyadic clustering and i.i.d. sampling for both complete and incomplete dyadic data. Finally, we illustrate the method by studying the distribution of congestion across different flight routes using data from the United States Department of Transportation (US DOT).

In an important recent paper, Graham et al. (2019) first propose and examine a nonparametric density estimator for dyadic random variables. Focusing on pointwise asymptotic behaviours, they demonstrate that asymptotic normality at a fixed design point can be established at a parametric rate with respect to the number of vertices, a unique feature of dyadic KDE. This implies that its asymptotics are robust to many choices of bandwidth rates. They also point out that degeneracy that leads to a non-Gaussian limit, such as the situation pointed out in Menzel (2020), does not occur for dyadic KDE under mild bandwidth conditions.
For inference, they propose an analytic variance estimator that corresponds to the one proposed by Fafchamps and Gubert (2007) (henceforth FG) under parametric models and establish its asymptotic validity. In their simulations, they discover that the FG variance estimator often significantly overestimates the asymptotic variance in finite samples and therefore causes overly conservative coverage and testing results. In addition, the FG variance estimator is not guaranteed to be positive-semidefinite in finite samples (Cameron and Miller, 2014, pp. 11). In overcoming these obstacles, our proposed modified JEL procedure for dyadic KDE complements the findings of Graham et al. (2019).

In practice, the presence of missing dyads is an important feature of network datasets, and a researcher working with network data often observes incomplete sets of dyads. To our knowledge, none of the existing theoretical works in the literature that focus on the asymptotics of jointly exchangeable arrays have tackled this issue. Under the missing at random assumption, we extend our theory for modified JEL to cover this practically relevant situation. The asymptotic theory for incomplete dyadic data is built upon the idea of asymptotics for incomplete U-statistics first investigated by Janson (1984).

Ever since the seminal papers of Owen (1988) and Owen (1990), empirical likelihood has been extensively studied in the literature. For a textbook treatment, see Owen (2001). Some recent developments under conventional asymptotics include Hjort, McKeague, and Van Keilegom (2009) and Bravo, Escanciano, and Van Keilegom (2020), among others.
Jackknife empirical likelihood is first proposed in the seminal work of Jing, Yuan, and Zhou (2009) for parametric U-statistics and has since been studied by Gong, Peng, and Qi (2010), Peng, Qi, and Van Keilegom (2012), Peng (2012), Wang, Peng, and Qi (2013), Zhang and Zhao (2013), Matsushita and Otsu (2018) and Chen and Tabri (2020), to list a few, for different applications. Most of the aforementioned works focus on actual U-statistics under i.i.d. sampling. Matsushita and Otsu (2020) demonstrate that JEL can not only be modified for certain unconventional asymptotics of i.i.d. observations, such as the small bandwidth asymptotics of Cattaneo, Crump, and Jansson (2013), but can also be applied to study edge probabilities for both sparse and dense dyadic networks studied in Bickel, Chen, and Levina (2011). This is achieved by incorporating the bias-correction idea of Hinkley (1978), Efron and Stein (1981) and Cattaneo, Crump, and Jansson (2014a). In a context where controlling bias and spikes (from the presence of the bandwidth) presents extra challenges, our modified JEL for dyadic KDE provides a nonparametric counterpart to their network asymptotic results.

Our results also complement the uniform convergence rate results for kernel-type estimators under different dependence settings studied in Einmahl and Mason (2000), Einmahl and Mason (2005), Dony and Mason (2008), Hansen (2008), Kristensen (2009) and so forth. In addition to the JEL literature, the asymptotics considered in this paper are also built upon the recent developments in different asymptotics of kernel density estimation and density-weighted functional estimation under i.i.d. data studied in Cattaneo et al. (2014a) and Cattaneo, Crump, and Jansson (2014b).

In a seminal work, Fafchamps and Gubert (2007) proposed a dyadic-robust variance estimator for regression models, which has since become the benchmark for dyadic data.
Related works include Frank and Snijders (1994), Snijders and Borgatti (1999), Aronow, Samii, and Assenova (2015) and Cameron and Miller (2014) in different contexts. Asymptotics for statistics of dyadic random arrays are studied by Davezies, D'Haultfoeuille, and Guyonvarch (2020) under a general empirical process setting and by Chiang, Kato, and Sasaki (2020) in high dimensions. They show that a modified pigeonhole bootstrap (cf. McCullagh (2000) and Owen (2007)) is valid under non-degeneracy. On the other hand, for two-way separately exchangeable arrays and linear test statistics, Menzel (2020) proposed a conservative bootstrap that is valid uniformly even under potentially non-Gaussian degeneracy scenarios. Despite this progress, how to conduct valid inference on nonlinear estimators of dyadic/polyadic random variables uniformly over degenerate and non-degenerate DGPs remains unknown. For dyadic KDE, this non-Gaussian degeneracy scenario is not a concern; Graham et al. (2019) pointed out that non-Gaussian degeneracy can be ruled out under mild bandwidth conditions.

1.1. Notation and Organisation. Let $|A|$ be the cardinality of a finite set $A$. Denote the uniform distribution on $(0,1)$ as $U(0,1)$. For $g : S \to \mathbb{R}$ and $\mathcal{X} \subseteq S$, denote $\|g\|_{\mathcal{X}} = \sup_{x \in \mathcal{X}} |g(x)|$. Denote $f'(\cdot) = \partial f(\cdot)/\partial x$, $f''(\cdot) = \partial^2 f(\cdot)/\partial x^2$, and for $\ell = 1, 2$, $f'_{X|U^{\ell}}(\cdot|u) = \partial f_{X|U^{\ell}}(\cdot|u)/\partial x$ and $f''_{X|U^{\ell}}(\cdot|u) = \partial^2 f_{X|U^{\ell}}(\cdot|u)/\partial x^2$. Write $a_n \lesssim b_n$ if there exists a constant $C > 0$ such that $a_n \le C b_n$. Throughout the paper, the asymptotics should be understood as taking $n \to \infty$.

The rest of this paper is organised as follows. In Section 2, we introduce the model setting and dyadic KDE. Section 3 presents the main theoretical results: uniform convergence rates and the validity of different JEL statistics for both complete and incomplete dyadic random arrays.
Section 4 contains the simulation studies, while the empirical application can be found in Section 5. A practical guideline on bandwidth choice and all the proofs are contained in the Appendix.

2. Model and estimator

Consider a weighted undirected random graph. Let $I_n = \{(i,j) : 1 \le i < j \le n\}$ and $I_\infty = \bigcup_{n=2}^{\infty} I_n$. The random graph consists of an array $(X_{ij})_{(i,j)\in I_\infty}$ of real-valued random variables that is generated by
$$X_{ij} = f(U_i, U_j, U_{\{i,j\}}), \qquad (U_i)_{i\in\mathbb{N}},\ (U_{\{i,j\}})_{i,j\in\mathbb{N}} \overset{i.i.d.}{\sim} U[0,1] \qquad (2.1)$$
for some Borel measurable map $f : [0,1]^3 \to \mathbb{R}$. Note that $U_i$ and $U_{\{\iota,\kappa\}}$ are independent for all $i \in \mathbb{N}$, $(\iota,\kappa) \in I_\infty$. While (2.1) may seem to be a structural assumption, it is implied by the following low-level condition of joint exchangeability via the Aldous-Hoover-Kallenberg representation; see e.g. Kallenberg (2006, Theorem 7.22).

Definition 1 (Joint exchangeability). An array $(X_{ij})_{(i,j)\in I_\infty}$ is called jointly exchangeable if for any permutation $\pi$ of $\mathbb{N}$, the arrays $(X_{ij})_{(i,j)\in I_\infty}$ and $(X_{\pi(i)\pi(j)})_{(i,j)\in I_\infty}$ are identically distributed.

Remark 1 (Identical distribution). Under this setting, $(X_{ij})_{(i,j)\in I_\infty}$ are identically distributed.

Suppose that the researcher observes $(X_{ij})_{(i,j)\in I_n}$ for $n \ge 2$. The object of interest is the density function $f(\cdot) = f_X(\cdot)$ of $X$. We assume such a density function exists and is well-defined, although some low-level sufficient conditions can be obtained. For a fixed design point $x \in \mathrm{supp}(X)$ of interest, the dyadic KDE for $\theta = f(x)$ is defined as
$$\hat\theta = \hat f_n(x) = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} h_n^{-1} K\Big(\frac{x - X_{ij}}{h_n}\Big) = \frac{1}{|I_n|} \sum_{(i,j)\in I_n} K_{ij,n}, \qquad (2.2)$$
where $h_n > 0$ is an $n$-dependent bandwidth, $K_{ij,n} := h_n^{-1} K((x - X_{ij})/h_n)$ and $K(\cdot)$ is a kernel function. We will be more specific about the requirements for $h_n$ and $K$ in the following section.
3. Theoretical results

3.1. Uniform convergence rates. We first study the uniform convergence rates of dyadic KDE. Let us make the following assumptions.

Assumption 1 (Uniform convergence rates). Suppose that $\mathcal{X} \subseteq \mathrm{supp}(X)$, and suppose the following are satisfied:
(i) We observe $(X_{ij})_{(i,j)\in I_n}$, a subset of $(X_{ij})_{(i,j)\in I_\infty}$ that is generated following (2.1).
(ii) $f(\cdot)$ is at least twice continuously differentiable in an open neighbourhood of $x$ and $\|f''\|_{\mathcal{X}} = O(1)$. $f_{X|U}(\cdot|U)$ is almost surely at least twice continuously differentiable in an open neighbourhood of $x$ and $\|f''_{X|U}\|_{\mathcal{X}\times[0,1]} = O(1)$.
(iii) $K$ is symmetric, right (or left) continuous, non-negative and of bounded variation, $\|K\|_\infty = O(1)$ and $\int K(u)\,du = 1$.
(iv) $h_n \to 0$ and $\log(1/h_n)/(n^{3/2} h_n) \to 0$.

Theorem 1 (Uniform convergence rates for dyadic KDE). Suppose Assumption 1 holds. Then, with probability at least $1 - o(1)$,
$$\sup_{x\in\mathcal{X}} \big|\hat f_n(x) - f(x)\big| \lesssim h_n^2 + \frac{1}{n^{1/2}}$$
if $\sup_{x\in\mathcal{X}} \mathrm{Var}\big(f_{X|U}(x|U)\big) \ge L > 0$. Otherwise, with probability at least $1 - o(1)$,
$$\sup_{x\in\mathcal{X}} \big|\hat f_n(x) - f(x)\big| \lesssim h_n^2 + \frac{1}{n} + \sqrt{\frac{\log(1/h_n)}{n^2 h_n}}.$$

Remark 2 (Convergence rates). Theorem 1 shows that in the case of degeneracy, the uniform rate coincides with the i.i.d. sampling situation with sample size $\binom{n}{2}$. The third component is the only part in which the kernel plays a role (in adding $\log(1/h_n)$). The first two components essentially have parametric rates, because they consist of $E[h_n^{-1}K((x-X_{ij})/h_n)\,|\,U_i]$ and $E[h_n^{-1}K((x-X_{ij})/h_n)\,|\,U_i,U_j]$, which are uniformly bounded under the assumptions. The multiplicative term of $1/h_n$ only creates a spike in the third component. These results agree with the observations made in Graham et al.
(2019) for the pointwise behaviours of the dyadic KDE.

3.2. Inference for complete dyadic data with modified jackknife empirical likelihood. We now introduce JEL-based inference procedures for dyadic KDE. Denote the leave-one-out and leave-two-out index sets $I_n^{(i)} = \{(\iota,\kappa) \in I_n : \iota,\kappa \ne i\}$ and $I_n^{(i,j)} = \{(\iota,\kappa) \in I_n : \iota,\kappa \notin \{i,j\}\}$. For a given $\theta$, define for each $i < j$
$$S(\theta) = \hat\theta - \theta = \binom{n}{2}^{-1} \sum_{(i,j)\in I_n} h_n^{-1} K\Big(\frac{x - X_{ij}}{h_n}\Big) - \theta,$$
$$S^{(i)}(\theta) = \hat\theta^{(i)} - \theta = \binom{n-1}{2}^{-1} \sum_{(k,l)\in I_n^{(i)}} h_n^{-1} K\Big(\frac{x - X_{kl}}{h_n}\Big) - \theta,$$
$$S^{(i,j)}(\theta) = \hat\theta^{(i,j)} - \theta = \binom{n-2}{2}^{-1} \sum_{(k,l)\in I_n^{(i,j)}} h_n^{-1} K\Big(\frac{x - X_{kl}}{h_n}\Big) - \theta.$$
For each value of $\theta$, define the pseudo true value for JEL as
$$V_i(\theta) = n S(\theta) - (n-1) S^{(i)}(\theta).$$
For modified JEL, define
$$Q_{ij} = n S(\theta) - (n-1)\big\{S^{(i)}(\theta) + S^{(j)}(\theta)\big\} + (n-2) S^{(i,j)}(\theta),$$
$$\Gamma = \frac{1}{n}\sum_{i=1}^{n} V_i(\theta)^2, \qquad \Gamma_m = \frac{1}{n}\sum_{i=1}^{n} V_i(\hat\theta)^2 - \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} Q_{ij}^2.$$
For each value of $\theta$, define the pseudo true value for modified JEL by
$$V_i^m(\theta) = V_i(\hat\theta) - \Gamma^{1/2}\Gamma_m^{-1/2}\big\{V_i(\hat\theta) - V_i(\theta)\big\}.$$
Now, we define the modified JEL function for $\theta$ by
$$\ell_m(\theta) = -2 \max_{w_1,\dots,w_n} \sum_{i=1}^{n} \log(n w_i) \quad \text{s.t.} \quad w_i \ge 0, \quad \sum_{i=1}^{n} w_i = 1, \quad \sum_{i=1}^{n} w_i V_i^m(\theta) = 0,$$
and $\ell(\theta)$ is defined analogously with $V_i^m$ replaced by $V_i$. The Lagrangian dual problems are
$$\ell(\theta) = 2 \sup_{\lambda} \sum_{i=1}^{n} \log(1 + \lambda V_i(\theta)), \qquad \ell_m(\theta) = 2 \sup_{\lambda} \sum_{i=1}^{n} \log(1 + \lambda V_i^m(\theta)).$$

Assumption 2 (JEL for complete data). Suppose that
(i) $f(x) > 0$ at $x$, an interior point of $\mathcal{X} \subseteq \mathrm{supp}(X)$.
(ii) $K$ has bounded support, $n h_n \to \infty$ and $n h_n^4 \to 0$, so that $h_n = o(n^{-1/4})$.
We say $f$ is non-degenerate at $x$ if $\mathrm{Var}\big(f_{X|U}(x|U)\big) \ge L > 0$.
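To fix ideas, the pseudo-values $V_i(\theta)$ and the Lagrangian dual of the (unmodified) JEL statistic can be computed as in the following sketch. This is illustrative code under our reading of the definitions, not the authors' implementation; the Epanechnikov kernel and the bisection solver for the dual are arbitrary choices.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1 - u ** 2) * (np.abs(u) <= 1)

def dyadic_kde(X, h, x):
    # theta_hat: average of h^{-1} K((x - X_ij)/h) over the dyads i < j
    iu = np.triu_indices(X.shape[0], k=1)
    return np.mean(epanechnikov((x - X[iu]) / h) / h)

def jackknife_pseudo_values(X, h, x, theta):
    # V_i(theta) = n S(theta) - (n-1) S^{(i)}(theta), where S = theta_hat - theta
    # and S^{(i)} leaves out every dyad involving vertex i
    n = X.shape[0]
    S = dyadic_kde(X, h, x) - theta
    V = np.empty(n)
    for i in range(n):
        keep = np.delete(np.arange(n), i)
        V[i] = n * S - (n - 1) * (dyadic_kde(X[np.ix_(keep, keep)], h, x) - theta)
    return V

def jel_stat(V, tol=1e-10):
    # l(theta) = 2 sup_lambda sum_i log(1 + lambda V_i), solved by bisection on
    # the first-order condition sum_i V_i / (1 + lambda V_i) = 0
    if V.min() >= 0 or V.max() <= 0:
        return np.inf            # 0 lies outside the convex hull of the V_i
    lo, hi = -1 / V.max() + tol, -1 / V.min() - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(V / (1 + mid * V)) > 0:
            lo = mid             # the derivative is decreasing in lambda
        else:
            hi = mid
    return 2 * np.sum(np.log(1 + 0.5 * (lo + hi) * V))
```

The modified statistic $\ell_m(\theta)$ only changes the input: form the $V_i^m(\theta)$ from the $V_i$ and pass them to `jel_stat` in place of $V_i(\theta)$.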
The following result characterises the asymptotics of JEL and modified JEL.

Theorem 2 (Wilks' theorem for modified JEL for dyadic KDE with complete data). Suppose Assumptions 1 and 2 are satisfied. Then $\ell_m(\theta) \overset{d}{\to} \chi^2(1)$. In addition, if $f$ is non-degenerate at $x$, then $\ell(\theta) \overset{d}{\to} \chi^2(1)$.

Remark 3 (Asymptotic pivotality). Theorem 2 shows that the modified JEL is pivotal regardless of whether $f$ is degenerate or not, while JEL is asymptotically pivotal only if $f$ is non-degenerate at $x$.

Theorem 2 implies that one can construct an approximate $1-\alpha$ confidence interval $R_\alpha = \{\theta : \ell_\cdot(\theta) \le c_\alpha\}$ for $\ell_\cdot \in \{\ell, \ell_m\}$, where $c_\alpha$ is such that $\lim_{n\to\infty} P(\theta \in R_\alpha) = P(\chi^2(1) \le c_\alpha) = 1 - \alpha$.

3.3. Inference for incomplete dyadic data with jackknife empirical likelihood. Let us now consider dyadic KDE with incomplete data. Suppose that we observe $X^*_{ij} = Z_{ij} X_{ij}$, where the selection random variables $Z_{ij} \overset{d}{=} \mathrm{Bernoulli}(p)$ are i.i.d. and assumed to be independent from $(X_{ij})_{(i,j)\in I_n}$ for some unknown $p = p_n \in (0,1)$, which can be either fixed or potentially converging to zero in the sample size. Define $\hat N = \hat p \binom{n}{2}$, $\hat N_1 = \hat p \binom{n-1}{2}$ and $\hat N_2 = \hat p \binom{n-2}{2}$, with $\hat p = \binom{n}{2}^{-1} \sum_{(i,j)\in I_n} Z_{ij}$ being an estimate for $p$. Now let $\mathcal{I}_n = \{(i,j) \in I_n : Z_{ij} = 1\}$, $\mathcal{I}_n^{(k)} = \{(i,j) \in I_n : Z_{ij} = 1,\ i,j \ne k\}$ and $\mathcal{I}_n^{(k,\ell)} = \{(i,j) \in I_n : Z_{ij} = 1,\ i,j \notin \{k,\ell\}\}$.
For a given $\theta$, define the JEL pseudo true value for incomplete dyadic data by
$$\hat V_i(\theta) = n \hat S(\theta) - (n-1)\hat S^{(i)}(\theta),$$
where
$$\hat S(\theta) = \hat\theta_{\mathrm{inc}} - \theta = \frac{1}{\hat N} \sum_{(i,j)\in \mathcal{I}_n} h_n^{-1} K\Big(\frac{x - X_{ij}}{h_n}\Big) - \theta, \qquad \hat S^{(i)}(\theta) = \hat\theta^{(i)}_{\mathrm{inc}} - \theta = \frac{1}{\hat N_1} \sum_{(k,l)\in \mathcal{I}_n^{(i)}} h_n^{-1} K\Big(\frac{x - X_{kl}}{h_n}\Big) - \theta.$$
In addition, for each $i < j$, define
$$\hat S^{(i,j)}(\theta) = \hat\theta^{(i,j)}_{\mathrm{inc}} - \theta = \frac{1}{\hat N_2} \sum_{(k,l)\in \mathcal{I}_n^{(i,j)}} h_n^{-1} K\Big(\frac{x - X_{kl}}{h_n}\Big) - \theta,$$
$$\hat Q_{ij} = n\hat S(\theta) - (n-1)\big\{\hat S^{(i)}(\theta) + \hat S^{(j)}(\theta)\big\} + (n-2)\hat S^{(i,j)}(\theta),$$
$$\hat\Gamma = \frac{1}{n}\sum_{i=1}^{n} \hat V_i(\theta)^2, \qquad \hat\Gamma_m = \frac{1}{n}\sum_{i=1}^{n} \hat V_i(\hat\theta_{\mathrm{inc}})^2 - \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \hat Q_{ij}^2.$$
For each $\theta$, define the modified JEL pseudo true value for incomplete dyadic data by
$$\hat V_i^m(\theta) = \hat V_i(\hat\theta_{\mathrm{inc}}) - \hat\Gamma^{1/2}\hat\Gamma_m^{-1/2}\big\{\hat V_i(\hat\theta_{\mathrm{inc}}) - \hat V_i(\theta)\big\}.$$
For incomplete dyadic data, the modified JEL function for $\theta$ can be defined by
$$\hat\ell_m(\theta) = -2 \max_{w_1,\dots,w_n} \sum_{i=1}^{n} \log(n w_i) \quad \text{s.t.} \quad w_i \ge 0, \quad \sum_{i=1}^{n} w_i = 1, \quad \sum_{i=1}^{n} w_i \hat V_i^m(\theta) = 0,$$
and $\hat\ell(\theta)$ is defined analogously with $\hat V_i^m$ replaced by $\hat V_i$. The Lagrangian dual problems are
$$\hat\ell(\theta) = 2\sup_{\lambda} \sum_{i=1}^{n} \log(1 + \lambda \hat V_i(\theta)), \qquad \hat\ell_m(\theta) = 2\sup_{\lambda} \sum_{i=1}^{n} \log(1 + \lambda \hat V_i^m(\theta)).$$

Assumption 3 (JEL for incomplete data). Suppose $Z_{ij} \overset{d}{=} \mathrm{Bernoulli}(p)$ are i.i.d. and independent from $(X_{ij})_{(i,j)\in I_n}$ for some unknown sequence $p = p_n \in (0,1)$, and $n^2 p h_n \to \infty$.

The following result is an incomplete data counterpart of Theorem 2.

Theorem 3 (Wilks' theorem for modified JEL for dyadic KDE with incomplete data).
Suppose Assumptions 1, 2 and 3 are satisfied. Then $\hat\ell_m(\theta) \overset{d}{\to} \chi^2(1)$. In addition, if $f$ is non-degenerate at $x$ or $p_n \to 0$, then $\hat\ell(\theta) \overset{d}{\to} \chi^2(1)$.

4. Simulation studies

We consider three sets of simulation data generating processes (DGPs). The first two follow $X_{ij} = \beta U_i U_j + U_{\{i,j\}}$ with $U_{\{i,j\}} \overset{i.i.d.}{\sim} N(0,1)$, $U_i$ i.i.d. taking value $1$ with probability $2/3$ and $-1$ with probability $1/3$, and $\beta \in \{0, 1\}$, where $\beta = 1$ is the sufficiently non-degenerate DGP considered in Graham et al. (2019). This has the density
$$f(x) = \frac{5}{9}\varphi(x-1) + \frac{4}{9}\varphi(x+1),$$
where $\varphi$ is the density of a standard normal random variable. $\beta = 0$ corresponds to the degenerate case where the true DGP is i.i.d. standard normal with density $f = \varphi$. We experiment with both the oracle MSE-optimal bandwidth from Graham et al. (2019) as well as the rule-of-thumb bandwidth proposed in (A.1). We consider four alternative inference methods that are theoretically robust to dyadic clustering: (i) the FG variance estimator for dyadic KDE proposed in Graham et al. (2019, Equation (22)) (FG), (ii) the leading term of FG (lead. FG), (iii) jackknife empirical likelihood (JEL) and (iv) modified jackknife empirical likelihood (mJEL). We then investigate the performance of JEL and mJEL under an incomplete dyadic DGP with only half of the data observed. Throughout the simulations, we set the design point $x = 1$ and repeat each experiment 1,000 times. Tables 1, 2 and 3 show the simulation results under these different settings.

In general, the simulation supports our theoretical findings. When the true DGP is dyadic, modified JEL-based confidence intervals enjoy close to nominal coverage rates even at moderate sample sizes. The original JEL is conservative but asymptotically valid. On the other hand, both of the analytic variance estimators are severely conservative in coverage probability at all of the simulated sample sizes. Their coverage rates are close to those in the simulation study in Graham et al. (2019), even though different bandwidths and kernels are employed.
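The simulation design above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code; the seed and sample size are arbitrary, and the distribution $P(U_i = 1) = 2/3$ is inferred from the stated mixture weights $5/9$ and $4/9$ (since $P(U_iU_j = 1) = p^2 + (1-p)^2 = 5/9$ at $p = 2/3$).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dyadic(n, beta, rng):
    # X_ij = beta * U_i * U_j + U_{ij}; U_i = +1 w.p. 2/3 and -1 w.p. 1/3,
    # so that P(U_i U_j = 1) = 5/9, matching the stated mixture weights
    U = rng.choice([1.0, -1.0], size=n, p=[2 / 3, 1 / 3])
    iu = np.triu_indices(n, k=1)
    return beta * U[iu[0]] * U[iu[1]] + rng.standard_normal(len(iu[0]))

def rule_of_thumb_bandwidth(x):
    # Silverman-type rule adapted to the dyadic sample size n(n-1)/2, as in (A.1)
    N = len(x)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(x.std(ddof=1), iqr) * N ** (-1 / 5)

def kde_at(x_dyads, h, x):
    # Epanechnikov-kernel density estimate at the design point x
    u = (x - x_dyads) / h
    return np.mean(0.75 * (1 - u ** 2) * (np.abs(u) <= 1)) / h

n = 200
X = simulate_dyadic(n, beta=1.0, rng=rng)
h = rule_of_thumb_bandwidth(X)
f_hat = kde_at(X, h, 1.0)   # target: f(1) = (5/9)phi(0) + (4/9)phi(2), about 0.25
```

With $\binom{200}{2} = 19{,}900$ dyads, the estimate at $x = 1$ should land near the true value $0.25$ up to bias and sampling noise.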
When the true DGP is i.i.d., modified JEL still demonstrates reasonable coverage probabilities throughout almost all sample sizes, while the other three methods are severely undersized. We also observe that the coverage of modified JEL is slightly over 95% when the sample size is large. This is most likely due to the fact that the rule-of-thumb bandwidth is chosen at the MSE-optimal rate, and thus the bias does not converge to zero asymptotically when the sample is i.i.d. This also supports the recommendation of slight undersmoothing from Graham et al. (2019) when the researcher is concerned about the potential absence of dyadic clustering. Finally, we see that the coverage rates of both JEL and mJEL behave similarly to the complete data situation under the incomplete dyadic data setting. Thus the methods are robust against missing at random situations.

Table 1. Coverage probabilities of 95% confidence intervals under the dyadic clustering DGP (β = 1). Kernel: Epanechnikov; bandwidth: Silverman's rule of thumb.

    n     FG     lead. FG   JEL    mJEL
    25    0.993  0.996      0.982  0.900
    50    0.993  0.996      0.980  0.931
    75    0.995  0.996      0.981  0.943
    100   0.994  0.996      0.983  0.945
    150   0.993  0.995      0.979  0.942
    200   0.993  0.993      0.976  0.946
    400   0.990  0.991      0.971  0.945
    600   0.988  0.990      0.970  0.948
    800   0.986  0.988      0.966  0.945

Table 2. Coverage probabilities of 95% confidence intervals under the i.i.d. DGP (β = 0). Kernel: Epanechnikov; bandwidth: Silverman's rule of thumb.

    n     FG     lead. FG   JEL    mJEL
    25    0.993  0.996      0.984  0.895
    50    0.996  0.997      0.992  0.944
    75    0.997  0.999      0.994  0.949
    100   0.995  0.998      0.993  0.949
    150   0.996  0.998      0.995  0.957
    200   0.995  0.997      0.997  0.957
    400   0.996  0.998      0.993  0.956
    600   0.992  0.995      0.996  0.960
    800   0.996  0.997      0.995  0.959

5. Empirical application: Airport congestion

Flight delays caused by airport congestion are a familiar experience to the flying public.
Aside from the frustration experienced by affected passengers, these delays are an important cause of fuel wastage and result in the release of criteria pollutants such as nitrogen oxides (NO$_x$), carbon monoxide (CO) and sulphur oxides (SO$_x$). As discussed in Schlenker and Walker (2016), these releases harm the health of people living near airports. To mitigate these harms, it is important to understand the distribution of congestion across routes (i.e. flights between pairs of airports).

Table 3. Coverage probabilities of 95% confidence intervals under the incomplete dyadic DGP (p = 0.5, β = 1). Kernel: Epanechnikov; bandwidth: Silverman's rule of thumb.

    n     JEL    mJEL
    25    0.984  0.911
    50    0.987  0.946
    75    0.988  0.943
    100   0.990  0.952
    150   0.988  0.948
    200   0.982  0.954
    400   0.980  0.951
    600   0.980  0.950
    800   0.978  0.954

In this empirical application, we apply the dyadic KDE and the JEL and modified JEL procedures to understand the distribution of airport congestion, as measured by taxi time, across pairs of airports in the United States. The taxi time of a flight is defined as the sum of taxi-out time (the time between the plane leaving its parking position and taking off) and taxi-in time (the time between the plane landing and parking). In this context, the dyads $X_{ij}$ and $X_{jk}$ represent flights between airport $i$ and airport $j$ and flights between airport $j$ and airport $k$ respectively; the fact that flights share a common airport $j$ induces dependence between $X_{ij}$ and $X_{jk}$.
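A toy enumeration makes this shared-vertex dependence structure concrete (an illustrative sketch with hypothetical 0-based airport labels; these helpers are not part of the paper's procedure):

```python
from itertools import combinations

def dyads(n):
    # I_n = {(i, j) : 0 <= i < j < n}, with 0-based airport labels
    return list(combinations(range(n), 2))

def dependent_pairs(n):
    # Pairs of distinct dyads sharing an airport, e.g. (i, j) and (j, k):
    # exactly the pairs whose observations are dependent under dyadic clustering
    return [(d, e) for d, e in combinations(dyads(n), 2) if set(d) & set(e)]

# With n = 4 airports there are 6 dyads and 15 dyad pairs, 12 of which share
# an airport; only the remaining 3 disjoint pairs carry no such dependence.
```

This is the covariance structure that the FG variance estimator, and the jackknife pseudo-values above, are built to accommodate.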
Therefore, this is a natural setting in which to apply dyadic KDE and modified JEL.

Data on flight taxi times are obtained from the US DOT Bureau of Transportation Statistics (BTS) Reporting Carrier On-Time Performance dataset, which compiles data on US domestic non-stop flights from all carriers that earn at least 0.5% of scheduled domestic passenger revenues. From the flight-level data for June 2020, we collapse taxi time into various summary statistics (mean, 95th percentile, maximum) for each origin-destination airport pair (that is, the summary statistics are taken over flights in both directions, so the resulting network is undirected). Noting that a missing dyad means that there were no non-stop flights between the vertices of that dyad, we apply the KDE, JEL, and modified JEL to the resulting network. For simplicity, the sample is restricted to the 100 largest airports by number of departing flights.

The dyadic KDE estimates, the point-wise JEL and point-wise modified JEL 95% confidence intervals (obtained by numerically inverting the test statistic at each design point), and histograms for the mean, 95th percentile and maximum taxi times between airport pairs in June 2020 are shown in Figures 1A-1C. In each case, consistent with the simulation results, the JEL confidence intervals are at least as large as the modified JEL confidence intervals.

For applied researchers, the results demonstrate how studying only the mean can be incomplete; unlike the mean, the 95th percentile and (particularly) maximum travel times exhibit positive skewness. Routes whose taxi times lie in the positively skewed region are disproportionately problematic, a distinction lost by looking at means only. For all these statistics, the modified JEL procedure which we propose provides relatively precise estimates even under small sample sizes (100 vertices) and a large proportion of missing dyads (72% missing).

6. Conclusion

This paper studies the asymptotic properties of dyadic KDE.
We establish uniform convergence rates under general conditions. For inference, we propose a modified jackknife empirical likelihood procedure which is valid regardless of degeneracy of the underlying DGP. We further extend the results to cover incomplete, or missing at random, dyadic data. Simulation studies show robust finite sample performance of the modified JEL inference for dyadic KDE under different settings. We illustrate the method via an application to airport congestion.

Despite our focus on density estimation, the approach we take can be extended to cover local polynomial regression, as well as the local polynomial density estimation of Cattaneo, Jansson, and Ma (2020b) or the local polynomial distributional regression of Cattaneo, Jansson, and Ma (2020a), under dyadic data. Another potential direction is to study incomplete data under unconfoundedness or other non-random missing models. Investigating the modified JEL under these assumptions provides interesting directions for future research.

References

Aronow, P. M., C. Samii, and V. A. Assenova (2015): "Cluster-robust variance estimation for dyadic data," Political Analysis, 23, 564-577.
Bickel, P. J., A. Chen, and E. Levina (2011): "The method of moments and degree distributions for network models," Annals of Statistics, 39, 2280-2301.
Bravo, F., J. C. Escanciano, and I. Van Keilegom (2020): "Two-step semiparametric empirical likelihood inference," Annals of Statistics, 48, 1-26.
Calonico, S., M. D. Cattaneo, and M. H. Farrell (2018): "On the effect of bias estimation on coverage accuracy in nonparametric inference," Journal of the American Statistical Association, 113, 767-779.
Cameron, A. C. and D. L. Miller (2014): "Robust inference for dyadic data," Unpublished manuscript, University of California-Davis.
Cattaneo, M. D., R. K. Crump, and M.
Jansson (2013): "Generalized jackknife estimators of weighted average derivatives," Journal of the American Statistical Association, 108, 1243-1256.
——— (2014a): "Bootstrapping density-weighted average derivatives," Econometric Theory, 1135-1164.
——— (2014b): "Small bandwidth asymptotics for density-weighted average derivatives," Econometric Theory, 176-200.
Cattaneo, M. D., M. Jansson, and X. Ma (2020a): "Local regression distribution estimators," Working paper.
——— (2020b): "Simple local polynomial density estimators," Journal of the American Statistical Association, forthcoming.
Chen, R. and R. V. Tabri (2020): "Jackknife empirical likelihood for inequality constraints on regular functionals," Journal of Econometrics.
Chen, X. and K. Kato (2019): "Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications," Probability Theory and Related Fields, 1-67.
Chernozhukov, V., D. Chetverikov, and K. Kato (2014): "Gaussian approximation of suprema of empirical processes," Annals of Statistics, 42, 1564-1597.
Chiang, H. D., K. Kato, and Y. Sasaki (2020): "Inference for high-dimensional exchangeable arrays," arXiv preprint arXiv:2009.05150.
Davezies, L., X. D'Haultfoeuille, and Y. Guyonvarch (2020): "Empirical process results for exchangeable arrays," Annals of Statistics, forthcoming.
de la Peña, V. and E. Giné (1999): Decoupling: From Dependence to Independence, Springer.
Dony, J. and D. M. Mason (2008): "Uniform in bandwidth consistency of conditional U-statistics," Bernoulli, 14, 1108-1133.
Efron, B. and C. Stein (1981): "The jackknife estimate of variance," Annals of Statistics, 586-596.
Einmahl, U. and D. M. Mason (2000): "An empirical process approach to the uniform consistency of kernel-type function estimators," Journal of Theoretical Probability, 13, 1-37.
——— (2005): "Uniform in bandwidth consistency of kernel-type function estimators," Annals of Statistics, 33, 1380-1403.
Fafchamps, M. and F.
Gubert (2007): "The formation of risk sharing networks," Journal of Development Economics, 83, 326-350.
Frank, O. and T. Snijders (1994): "Estimating the size of hidden populations using snowball sampling," Journal of Official Statistics, 10, 53-67.
Ghosal, S., A. Sen, and A. W. van der Vaart (2000): "Testing monotonicity of regression," Annals of Statistics, 1054-1082.
Giné, E. and R. Nickl (2016): Mathematical Foundations of Infinite-Dimensional Statistical Models, vol. 40, Cambridge University Press.
Gong, Y., L. Peng, and Y. Qi (2010): "Smoothed jackknife empirical likelihood method for ROC curve," Journal of Multivariate Analysis, 101, 1520-1531.
Graham, B. S. (2019): "Network data," Tech. rep., National Bureau of Economic Research.
Graham, B. S., F. Niu, and J. L. Powell (2019): "Kernel density estimation for undirected dyadic data," arXiv preprint arXiv:1907.13630.
Hansen, B. E. (2008): "Uniform convergence rates for kernel estimation with dependent data," Econometric Theory, 726-748.
Hinkley, D. V. (1978): "Improving the jackknife with special reference to correlation estimation," Biometrika, 65, 13-21.
Hjort, N. L., I. W. McKeague, and I. Van Keilegom (2009): "Extending the scope of empirical likelihood," Annals of Statistics, 37, 1079-1111.
Janson, S. (1984): "The asymptotic distributions of incomplete U-statistics," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 66, 495-505.
Jing, B.-Y., J. Yuan, and W. Zhou (2009): "Jackknife empirical likelihood," Journal of the American Statistical Association, 104, 1224-1232.
Kallenberg, O. (2006): Probabilistic Symmetries and Invariance Principles, Springer Science & Business Media.
Kato, K. (2019): "Lecture notes on empirical process theory," Tech. rep., Cornell University.
Kristensen, D. (2009): "Uniform convergence rates of kernel estimators with heterogeneous dependent data," Econometric Theory, 1433-1445.
Matsushita, Y. and T.
Otsu (2018): "Likelihood inference on semiparametric models: Average derivative and treatment effect," Japanese Economic Review, 69, 133-155.
——— (2020): "Jackknife empirical likelihood: small bandwidth, sparse network and high-dimension asymptotics," Biometrika, forthcoming.
McCullagh, P. (2000): "Resampling and exchangeable arrays," Bernoulli, 6, 285-301.
Menzel, K. (2020): "Bootstrap with clustering in two or more dimensions," arXiv:1703.03043.
Owen, A. (1990): "Empirical likelihood ratio confidence regions," Annals of Statistics, 90-120.
Owen, A. B. (1988): "Empirical likelihood ratio confidence intervals for a single functional," Biometrika, 75, 237-249.
——— (2001): Empirical Likelihood, CRC Press.
——— (2007): "The pigeonhole bootstrap," Annals of Applied Statistics, 1, 386-411.
Peng, L. (2012): "Approximate jackknife empirical likelihood method for estimating equations," Canadian Journal of Statistics, 40, 110-123.
Peng, L., Y. Qi, and I. Van Keilegom (2012): "Jackknife empirical likelihood method for copulas," Test, 21, 74-92.
Schlenker, W. and R. W. Walker (2016): "Airports, air pollution, and contemporaneous health," Review of Economic Studies, 83, 768-809.
Silverman, B. W. (1986): Density Estimation for Statistics and Data Analysis, vol. 26, CRC Press.
Snijders, T. A. and S. P. Borgatti (1999): "Non-parametric standard errors and tests for network statistics," Connections, 22, 161-170.
van der Vaart, A. W. and J. A. Wellner (1996): Weak Convergence and Empirical Processes, Springer.
Wang, R., L. Peng, and Y. Qi (2013): "Jackknife empirical likelihood test for equality of two high dimensional means," Statistica Sinica, 667-690.
Zhang, Z. and Y. Zhao (2013): "Empirical likelihood for linear transformation models with interval-censored failure time data," Journal of Multivariate Analysis, 116, 398-409.

Figure 1.
Distribution of summary statistics of taxi time across routes: (a) Mean; (b); (c) Maximum. Point estimates for the KDE, evaluated at each whole minute, are represented by the solid line. Point-wise 95% modified JEL confidence intervals are represented by the two dashed lines. Point-wise 95% JEL confidence intervals are represented by the two dotted lines.

Appendix A. Practical guideline for bandwidth

As shown in Graham et al. (2019), the MSE-optimal bandwidth has the expression

h_n^{MSE} = { f(w) c_2 / [ (2^{-1} f''(w) c_1)^2 n(n-1) ] }^{1/5},

where c_2 = ∫ K(u)^2 du and c_1 = ∫ u^2 K(u) du. Graham et al. (2019) pointed out that, under non-degeneracy, dyadic KDE is robust to different rates of bandwidth. Hence, the unknowns f(w) and f''(w) do not affect its asymptotic behaviour. We propose to adapt the rule of thumb of Silverman (1986, p. 48, Equation (3.31)),

h_n^S = 0.9 · min{ σ̂, IQR/1.34 } · { n(n-1) }^{-1/5},  (A.1)

where σ̂ and IQR are the sample standard deviation and the interquartile range of (X_ij)_{(i,j)∈I_n}, i.e. the difference between the 75th and the 25th percentiles, respectively. For incomplete data, to account for the effective sample size, we propose

h_n^{S,inc} = 0.9 · min{ σ̂_inc, IQR_inc/1.34 } · { p̂ · n(n-1) }^{-1/5},  (A.2)

where σ̂_inc and IQR_inc are the sample standard deviation and the interquartile range of the observed (X_ij), respectively. As discussed in Section 4, these rule-of-thumb bandwidth choices work well in our simulation studies. A formal analysis of the coverage optimality of different bandwidth choices, such as those studied in Calonico, Cattaneo, and Farrell (2018), under dyadic asymptotics would be an interesting avenue for future research.

Appendix B. Some empirical process definitions and results

We shall first review some useful definitions on covering numbers and related notions. For more detail, see van der Vaart and Wellner (1996) or Giné and Nickl (2016). Let S be a set and C a nonempty class of subsets of S. Pick any finite set {x_1, ..., x_n} of size n. We say that C picks out a subset A ⊂ {x_1, ..., x_n} if there exists C ∈ C such that A = {x_1, ..., x_n} ∩ C. Let Δ_C(x_1, ..., x_n) be the number of subsets of {x_1, ..., x_n} picked out by C. We say the class C shatters {x_1, ..., x_n} if C picks out all of its 2^n subsets. The VC index V(C) is defined as the smallest n for which no set of size n is shattered by C, i.e., with m_C(n) := max_{x_1,...,x_n} Δ_C(x_1, ..., x_n),

V(C) = inf{ n : m_C(n) < 2^n } if this set is non-empty, and V(C) = +∞ otherwise.

The class C is called a Vapnik–Chervonenkis class, or VC class for short, if V(C) < ∞. For a real function f on S, its subgraph is defined as sg(f) = {(x, t) : t < f(x)}. A function class F on S is a VC subgraph class if the collection of subgraphs of f ∈ F, sg(F) = {sg(f) : f ∈ F}, is a VC class of sets in S × R. The VC index of F, V(F), is defined as the VC index of sg(F). Let (T, d) be a pseudometric space. For ε > 0, an ε-net of T is a subset T_ε of T such that for every t ∈ T there exists t_ε ∈ T_ε with d(t, t_ε) ≤ ε.
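The rule-of-thumb bandwidths (A.1) and (A.2) of Appendix A are straightforward to implement. Below is a minimal Python sketch; the function names are ours, and we read Silverman's Equation (3.31) as the constant 0.9 with spread measure min{σ̂, IQR/1.34} and n(n−1) (respectively p̂·n(n−1)) as the effective sample size:

```python
import random
import statistics

def _scale(x):
    """Silverman-style spread measure: min{sample std. dev., IQR/1.34}."""
    xs = sorted(x)
    m = len(xs)
    q25, q75 = xs[int(0.25 * (m - 1))], xs[int(0.75 * (m - 1))]
    return min(statistics.stdev(x), (q75 - q25) / 1.34)

def h_rot(x, n):
    """(A.1): rule-of-thumb bandwidth for complete dyadic data on n vertices."""
    return 0.9 * _scale(x) * (n * (n - 1)) ** (-0.2)

def h_rot_incomplete(x_obs, n):
    """(A.2): p_hat * n * (n - 1) replaces n * (n - 1) as the effective sample size."""
    p_hat = 2 * len(x_obs) / (n * (n - 1))
    return 0.9 * _scale(x_obs) * (p_hat * n * (n - 1)) ** (-0.2)

# illustration on synthetic dyadic observations
random.seed(0)
n = 20
x = [random.gauss(0.0, 1.0) for _ in range(n * (n - 1) // 2)]
h_complete = h_rot(x, n)
h_inc = h_rot_incomplete(x, n)   # all pairs observed, so p_hat = 1
```

With every pair observed, p̂ = 1 and (A.2) reduces to (A.1); with missing pairs, the bandwidth widens to compensate for the smaller effective sample.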
The ε-covering number N(T, d, ε) of T is defined by

N(T, d, ε) = inf{ Card(T_ε) : T_ε is an ε-net of T }.

(Recall that d is only a pseudometric, i.e. d(x, y) = 0 does not imply x = y.) For a probability measure Q on a measurable space (S, S) and any r ≥ 1, denote ‖f‖_{Q,r} = (∫ |f|^r dQ)^{1/r} for f ∈ L^r(S). We say that a function G : S → R is an envelope of a class of functions G ∋ g, g : S → R, if sup_{g∈G} |g(s)| ≤ G(s) for all s ∈ S. Suppose F is a VC subgraph class with envelope F; then Theorem 2.6.7 in van der Vaart and Wellner (1996) implies that for any r ∈ [1, ∞),

sup_Q N(F, ‖·‖_{Q,r}, ε‖F‖_{Q,r}) ≤ K V(F) (16e)^{V(F)} (1/ε)^{r(V(F)-1)}

for all 0 < ε ≤ 1, where K is a universal constant and the supremum is taken over all finitely discrete probability measures.

A related concept is that of a VC type class. We say a set of functions F with envelope F is a VC type class with characteristics (A, v) if for some positive constants A and v,

sup_Q N(F, ‖·‖_{Q,2}, ε‖F‖_{Q,2}) ≤ (A/ε)^v,  (B.1)

where the supremum is taken over all finitely discrete measures on (S, S).

The following restates Lemma A.2 in Ghosal, Sen, and van der Vaart (2000), a useful result for bounding uniform covering numbers of classes of conditional expectations.

Lemma 1. Let H be a class of functions h : X × Y → R with envelope H, and let R be a fixed probability measure on Y. For a given h ∈ H, let h̄ : X → R be h̄ = ∫ h(·, y) dR(y). Set H̄ = { h̄ : h ∈ H }. Note that H̄ := ∫ H(·, y) dR(y) is an envelope of H̄. Then, for any r, s ≥ 1 and ε ∈ (0, 1),

sup_Q N(H̄, ‖·‖_{Q,r}, ε‖H̄‖_{Q,r}) ≤ sup_Q N(H, ‖·‖_{Q×R,s}, ε^r ‖H‖_{Q×R,s}).

Appendix C. Auxiliary lemmas

Lemma 2. Suppose K is symmetric, ‖K‖_∞ = O(1), n h_n^2 → ∞, and f is at least twice continuously differentiable in an open neighbourhood of x with ‖f''‖_∞ = O(1). Then max_{1≤i≤n} |V_i(θ)| = o_P(n^{1/2}).

Proof. A direct calculation from S(θ) = (2/(n(n-1))) Σ_{(i,j)∈I_n} K_ij,n − θ yields

V_i(θ) + θ = (2/(n-1)) Σ_{j≠i} K_ij,n − (2/((n-1)(n-2))) Σ_{(k,l)∈I_n : k,l≠i} K_kl,n,

so that max_{1≤i≤n} |V_i(θ)| ≤ 3‖K‖_∞ h_n^{-1} + |θ| = O(h_n^{-1}) = o(n^{1/2}), where the last step uses n h_n^2 → ∞. □

Lemma 3. Under the same assumptions as in Lemma 2, and supposing that n^{-1} Σ_{i=1}^n V_i(θ)^2 = O_P(1), we have n^{-1} Σ_{i=1}^n |V_i(θ)|^3 = o_P(n^{1/2}).

Proof. It follows from

n^{-1} Σ_{i=1}^n |V_i(θ)|^3 ≤ max_{1≤i≤n} |V_i(θ)| · n^{-1} Σ_{i=1}^n V_i(θ)^2 = o_P(n^{1/2}) · O_P(1),

where the last equality follows from Lemma 2. □

Lemma 4. Suppose the assumptions of Theorem 2 are satisfied. Then λ̂ = o_P(n^{-1/2}).

Proof of Lemma 4. This essentially follows the argument in the proof of Owen (1990, Equation (2.14)). The first-order condition yields

0 = n^{-1} Σ_{i=1}^n V_i(θ) / (1 + λ̂ V_i(θ)).

Write λ̂ = ρu, where ρ ≥ 0 and |u| = 1. Then

0 = | n^{-1} Σ_{i=1}^n V_i(θ)/(1 + ρu V_i(θ)) | ≥ ρ · [ n^{-1} Σ_{i=1}^n V_i(θ)^2 ] / [ 1 + ρ max_{1≤i≤n} |V_i(θ)| ] − |S(θ)|.

Note that |S(θ)| = O_P(n^{-1/2}) by the convergence rate from Theorem 1. Further, n^{-1} Σ_{i=1}^n V_i(θ)^2 is bounded below by a positive multiple of f(x) > 0 with probability approaching one, while max_{1≤i≤n} |V_i(θ)| = o_P(n^{1/2}) following Lemma 2. Therefore, ρ = |λ̂| = o_P(n^{-1/2}). □

Appendix D. Proofs of the main results

Throughout this appendix, for notational convenience, for any i > j let us define X_ij := X_ji.

D.1. Proof of Theorem 1.

Proof. Note that f̂_n(x) − f(x) = (E[f̂_n(x)] − f(x)) + (f̂_n(x) − E[f̂_n(x)]).
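The jackknife pseudo-values V_i(θ) = nS(θ) − (n−1)S^{(i)}(θ) used in the lemmas above average back to S(θ); see (D.2) below. The identity is purely algebraic, so it can be checked numerically with arbitrary stand-in values for the kernel evaluations K_ij,n; a minimal sketch (the names S and V are ours):

```python
import random

random.seed(0)
n = 8
theta = 0.3
# symmetric dyadic "kernel evaluations" K_ij,n for i < j; arbitrary numbers
# suffice because the identity verified below is purely algebraic
K = {(i, j): random.random() for i in range(n) for j in range(i + 1, n)}

def S(exclude=()):
    """Dyadic mean minus theta, dropping all pairs touching vertices in `exclude`."""
    vals = [v for (i, j), v in K.items() if i not in exclude and j not in exclude]
    return sum(vals) / len(vals) - theta

# jackknife pseudo-values V_i(theta) = n*S(theta) - (n-1)*S^{(i)}(theta)
V = [n * S() - (n - 1) * S(exclude=(i,)) for i in range(n)]

# identity (D.2): the pseudo-values average back to S(theta)
assert abs(sum(V) / n - S()) < 1e-12
```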
The bound for the non-stochastic component follows from the identical distribution of the X_ij and a simple calculation:

E[f̂_n(x)] = E[ h_n^{-1} K((X − x)/h_n) ] = ∫ K(u) f(x − u h_n) du = f(x) + h_n^2 B(x) + o(h_n^2),

where B(x) := 2^{-1} f''(x) ∫ u^2 K(u) du. Thus Bias(f̂_n(x)) := E[f̂_n(x) − f(x)] = h_n^2 B(x) + o(h_n^2) = O(h_n^2).

We shall now consider the stochastic component. For each n ∈ N, write G_n = { x' ↦ K((x − x')/h_n) : x ∈ R }. Then, following the bias calculation above, one has sup_{g∈G_n} E[g(X)] ≤ ‖f‖_∞ h_n + O(h_n^3) = O(h_n), and for each x ∈ R there exists a g ∈ G_n such that

f̂_n(x) = (2/(n(n-1))) Σ_{(i,j)∈I_n} h_n^{-1} g(X_ij).

Denote ḡ(·) = E[ g(X) | U_1 = · ]. The decomposition in Graham et al. (2019, Equations (9)–(11)) yields

(2/(n(n-1))) Σ_{(i,j)∈I_n} h_n^{-1} ( g(X_ij) − E[g(X)] )
= (2/n) Σ_{i=1}^n h_n^{-1} ( ḡ(U_i) − E[g(X)] )
+ (2/(n(n-1))) Σ_{(i,j)∈I_n} h_n^{-1} ( E[g(X_ij) | U_i, U_j] − ḡ(U_i) − ḡ(U_j) + E[g(X)] )
+ (2/(n(n-1))) Σ_{(i,j)∈I_n} h_n^{-1} ( g(X_ij) − E[g(X_ij) | U_i, U_j] )
= n^{-1} Σ_{i=1}^n L_i(g) + (2/(n(n-1))) Σ_{(i,j)∈I_n} W_ij(g) + (2/(n(n-1))) Σ_{(i,j)∈I_n} R_ij(g).  (D.1)

Note that each of the three terms on the right-hand side is centred. It suffices to bound each term on the right-hand side of (D.1).

We shall first obtain some preliminary results on the uniform covering numbers of the classes of functions appearing in these terms. Since K is of bounded variation, Proposition 3.6.12 in Giné and Nickl (2016) implies that G = { (u, v, w) ↦ K(t f(u, v, w) − s) : t ≥ 0, s ∈ R } ⊃ G_n is a VC type class of functions with envelope ‖K‖_∞ and characteristics (A, v) for some constants A and v.

Further, Lemma 1 implies that

G_1 = { u ↦ E[ K(t f(u, U_2, U_12) − s) | U_1 = u ] : t ≥ 0, s ∈ R },
G_2 = { u ↦ E[ K(t f(U_1, u, U_12) − s) | U_2 = u ] : t ≥ 0, s ∈ R },
G_3 = { (u, v) ↦ E[ K(t f(u, v, U_12) − s) | U_1 = u, U_2 = v ] : t ≥ 0, s ∈ R }

are of VC type with characteristics (4√A, v) and envelope ‖K‖_∞. Let H_n := {1/h_n}, a set consisting of one single (constant) function. Corollary 7 in Kato (2019) implies that for each ℓ = 1, 2, 3 and ε ∈ (0, 1),

sup_Q N(H_n · G_ℓ, ‖·‖_{Q,2}, ε h_n^{-1} ‖K‖_∞) ≤ sup_Q N(H_n, ‖·‖_{Q,2}, ε h_n^{-1}) · sup_Q N(G_ℓ, ‖·‖_{Q,2}, ε‖K‖_∞) ≲ (4√A/ε)^v,

where the first factor equals |H_n| = 1. Observe that for some M < ∞, it holds for all u ∈ [0, 1] that

E[ h_n^{-1} K((x − X_ij)/h_n) | U_i = u ] = ∫ K(s) f_{X|U}(x − h_n s | u) ds ≤ ‖f_{X|U}‖_∞ + O(h_n^2) ≤ M,

with a similar inequality holding for E[ h_n^{-1} K((x − X_ij)/h_n) | U_i = u, U_j = v ]. We shall therefore replace the envelope h_n^{-1}‖K‖_∞ by M. Notice that for any ϵ ∈ (0, 1) and ε = ϵ h_n M/‖K‖_∞, for n large enough that ε ≤ 1, it holds that

sup_Q N(H_n · G_ℓ, ‖·‖_{Q,2}, ϵM) = sup_Q N(H_n · G_ℓ, ‖·‖_{Q,2}, ε h_n^{-1}‖K‖_∞) ≲ (4√A/ε)^v.

Thus we have, by Corollary 7 in Kato (2019),

sup_Q N( h_n^{-1}(G_3 − G_1 − G_2 + E[g(X)]), ‖·‖_{Q,2}, 4ϵM ) ≲ (4√A/ε)^v.

Now, coming back to the second term of (D.1): it consists of a centred, completely degenerate U-process (see Section 3.5 in de la Peña and Giné (1999) for the definition).
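The decomposition (D.1) can be checked exactly in a toy Aldous–Hoover style model in which the latent variables take two values, so every conditional expectation is computable by enumeration. The following sketch is only illustrative (the table ftab and the transformation g are our own stand-ins); it verifies the per-pair telescoping behind (D.1) and that the L and W components are centred:

```python
import itertools

# toy model: X_ij = ftab[(U_i, U_j, U_ij)], with U_i, U_ij i.i.d. uniform on {0, 1}
# and ftab symmetric in its first two arguments
ftab = {(0, 0, 0): 0.1, (0, 0, 1): 0.4, (0, 1, 0): 0.2, (0, 1, 1): 0.9,
        (1, 0, 0): 0.2, (1, 0, 1): 0.9, (1, 1, 0): 0.5, (1, 1, 1): 0.7}
g = lambda x: x * x            # any bounded transformation plays the kernel's role

Eg = sum(g(ftab[k]) for k in ftab) / 8                        # E[g(X)]
gbar = {u: sum(g(ftab[(u, v, w)]) for v in (0, 1) for w in (0, 1)) / 4
        for u in (0, 1)}                                      # E[g(X) | U_1 = u]
guv = {(u, v): sum(g(ftab[(u, v, w)]) for w in (0, 1)) / 2
       for u in (0, 1) for v in (0, 1)}                       # E[g(X) | U_1, U_2]

# per-pair telescoping: g(X_ij) - E[g(X)] = L-part + L-part + W-part + R-part
for u, v, w in itertools.product((0, 1), repeat=3):
    L_i, L_j = gbar[u] - Eg, gbar[v] - Eg
    W = guv[(u, v)] - gbar[u] - gbar[v] + Eg
    R = g(ftab[(u, v, w)]) - guv[(u, v)]
    assert abs((g(ftab[(u, v, w)]) - Eg) - (L_i + L_j + W + R)) < 1e-12

# the L and W components are centred
assert abs(sum(gbar[u] - Eg for u in (0, 1)) / 2) < 1e-12
assert abs(sum(guv[(u, v)] - gbar[u] - gbar[v] + Eg
               for u in (0, 1) for v in (0, 1)) / 4) < 1e-12
```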
Hence, using the covering-number bounds above together with a moment inequality for centred, completely degenerate U-processes, one obtains a bound for

E sup_{g∈G_n} | (2/(n(n-1))) Σ_{(i,j)∈I_n} W_ij(g) |.

An analogous entropy argument applied to the i.i.d. centred summands (L_i(g))_{i=1}^n provides the bound for the first term of (D.1).

For the third term of (D.1), note that conditional on (U_i)_{i=1}^n, the (h_n · R_ij)_{(i,j)∈I_n} are centred and i.i.d. Thus similar entropy calculations and an application of Corollary 5.1 in Chernozhukov et al. (2014) yield

h_n^{-1} · E[ sup_{g∈G_n} | (2/(n(n-1))) Σ_{(i,j)∈I_n} h_n R_ij(g) | | (U_i)_{i=1}^n ]
≲ σ_U(g) √( v log(A‖K‖_∞/σ_U(g)) / (n^2 h_n^2) ) + v log(A‖K‖_∞/σ_U(g)) ‖K‖_∞ / (n^2 h_n^2),

where σ_U(g)^2 := sup_{g∈G_n} E[ (h_n R_ij(g))^2 | (U_i)_{i=1}^n ] = O(h_n). By Jensen's inequality and Fubini's theorem, we have

E sup_{g∈G_n} | (2/(n(n-1))) Σ_{(i,j)∈I_n} R_ij(g) | ≲ √( v log(A‖K‖_∞/h_n) / (n^2 h_n) ) + v log(A‖K‖_∞/h_n) ‖K‖_∞ / (n^2 h_n^2).

This shows the convergence rate of the stochastic component. □

D.2. Proof of Theorem 2.

Proof. In this proof, we write L_i, R_ij and W_ij for L_i(g), R_ij(g) and W_ij(g) in Equation (D.1), where g corresponds to the design point x. Also, let us denote

κ = n h_n / 2,  Ω_1(x) = Var( f_{X|U}(x | U_1) ),  Ω_2(x) = f(x) ∫ K(u)^2 du.

The first part of this proof follows a standard argument for empirical likelihood.
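That standard argument starts from the first-order condition 0 = n^{-1} Σ_i V_i(θ)/(1 + λ̂V_i(θ)), which pins down the Lagrange multiplier λ̂. Since the left-hand side is strictly decreasing in λ on the set where every weight 1 + λV_i stays positive, λ̂ can be found by bisection; a minimal numerical sketch with simulated stand-ins for the V_i(θ):

```python
import math
import random

random.seed(1)
n = 200
V = [random.gauss(0.05, 1.0) for _ in range(n)]   # stand-ins for V_i(theta)

def score(lam):
    """First-order condition: n^{-1} * sum of V_i / (1 + lam * V_i)."""
    return sum(v / (1 + lam * v) for v in V) / n

# admissible lambdas keep every weight 1 + lam * V_i strictly positive
lo = -1 / max(V) + 1e-9
hi = -1 / min(V) - 1e-9
for _ in range(200):            # score is strictly decreasing on (lo, hi)
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2

# log empirical likelihood ratio statistic of the form analysed in Theorem 2
elr = 2 * sum(math.log(1 + lam * v) for v in V)
```

No variance estimator enters this construction; in the theorem, the analogous statistic built from the modified pseudo-values is shown to be asymptotically pivotal.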
For λ̂ := argsup_λ Σ_{i=1}^n log(1 + λ V_i(θ)), the first-order condition and the algebraic fact that (1+x)^{-1} = 1 − x + x^2 (1+x)^{-1} yield

0 = n^{-1} Σ_{i=1}^n V_i(θ)/(1 + λ̂ V_i(θ)) = n^{-1} Σ_{i=1}^n V_i(θ) − n^{-1} Σ_{i=1}^n V_i(θ)^2 λ̂ + n^{-1} Σ_{i=1}^n V_i(θ)^3 λ̂^2 / (1 + λ̂ V_i(θ)).

Lemmas 3 and 4 imply that

λ̂ = Σ_{i=1}^n V_i(θ) / Σ_{i=1}^n V_i(θ)^2 + o_P(n^{-1/2}).

Thus a Taylor expansion of log(1 + λ̂ V_i(θ)) around λ̂ V_i(θ) = 0 gives

2 Σ_{i=1}^n log(1 + λ̂ V_i(θ)) = 2 Σ_{i=1}^n ( λ̂ V_i(θ) − (1/2){λ̂ V_i(θ)}^2 ) + o_P(1) = { n^{-1/2} Σ_{i=1}^n V_i(θ) }^2 / ( n^{-1} Σ_{i=1}^n V_i(θ)^2 ) + o_P(1).

Note that Var(θ̂) = 4Ω_1(x)/n + 2Ω_2(x)/(n^2 h_n) + O(n^{-2}) + O(h_n/n) following Graham et al. (2019, pp. 12). Let σ^2 = 4Ω_1(x) + κ^{-1}Ω_2(x); notice that f(x) > 0 implies σ^2 > 0. It will be shown that

(nσ^2)^{-1/2} Σ_{i=1}^n V_i(θ) →_d N(0, 1).

To see this, note that

n^{-1} Σ_{i=1}^n V_i(θ) = n^{-1} Σ_{i=1}^n ( n S(θ) − (n-1) S^{(i)}(θ) ) = S(θ),

using the fact that

n^{-1} Σ_{i=1}^n S^{(i)}(θ) = (2/(n(n-1)(n-2))) Σ_{i=1}^n ( Σ_{(k,l)∈I_n} K_kl,n − Σ_{j≠i} K_ij,n ) − θ = (2/(n(n-1))) Σ_{(k,l)∈I_n} K_kl,n − θ = S(θ).  (D.2)

Moreover, applying the decomposition (D.1) to the leave-one-out estimator gives

(n-1) S^{(i)}(θ) = Σ_{k≠i} L_k + (2/(n-2)) ( Σ_{(k,l)∈I_n} (W_kl + R_kl) − Σ_{l>i} (W_il + R_il) − Σ_{k<i} (W_ki + R_ki) ) + (n-1) Bias(θ̂^{(i)}).  (D.3)

In addition, notice that Σ_{k≠i} Σ_{l>k, l≠i} R_kl = Σ_{k=1}^{n-1} Σ_{l=k+1}^{n} R_kl − Σ_{l>i} R_il − Σ_{k<i} R_ki. Combining (D.2) and (D.3), expanding n^{-1} Σ_{i=1}^n V_i(θ)^2, and applying the WLLN for triangular arrays, the conditional independence (given (U_i)_{1≤i≤n}) of (R_ij)_{(i,j)∈I_n}, the identical distribution of (R_ij)_{(i,j)∈I_n}, and the fact that Var(Y_1) = E[(Y_i − Y_j)^2/2] for i.i.d. Y_i's, we obtain, under non-degeneracy,

(nσ^2)^{-1} Σ_{i=1}^n V_i(θ)^2 →_P σ^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) } ≠ 1,

so the self-normalised (unmodified) JEL statistic is not asymptotically pivotal.

We now consider the modified JEL. Notice that

(nσ^2)^{-1} Σ_{i=1}^n V_i^m(θ)^2 →_P σ^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) },

following a similar argument as above as well as the consistency of θ̂. It now suffices to show that

(nσ^2)^{-1/2} Σ_{i=1}^n V_i^m(θ) →_d N( 0, σ^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) } ).

As Σ_{i=1}^n V_i(θ̂) = 0, it holds that n^{-1/2} Σ_{i=1}^n V_i^m(θ) = Γ Γ_m^{-1} n^{-1/2} Σ_{i=1}^n V_i(θ). We have from above that n^{-1/2} σ^{-1} Σ_{i=1}^n V_i(θ) →_d N(0, 1), and

Γ Γ_m^{-1} = { [ n^{-1} σ^{-2} Σ_{i=1}^n V_i(θ̂)^2 ] / [ n^{-1} σ^{-2} Σ_{i=1}^n V_i(θ̂)^2 − n^{-1} σ^{-2} Σ_{i=1}^{n-1} Σ_{j=i+1}^n Q_ij ] }^{1/2}.

By the consistency of θ̂ and the convergence of n^{-1} σ^{-2} Σ_{i=1}^n V_i(θ)^2, we have n^{-1} σ^{-2} Σ_{i=1}^n V_i(θ̂)^2 →_P σ^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) }. We now verify that

(nσ^2)^{-1} Σ_{i=1}^{n-1} Σ_{j=i+1}^n Q_ij →_P σ^{-2} κ^{-1} Ω_2(x).
To see this, observe that

(n-1) S^{(i)}(θ) = Σ_{k≠i} L_k + (2/(n-2)) ( Σ_{(k,l)∈I_n} (W_kl + R_kl) − Σ_{l>i} (W_il + R_il) − Σ_{k<i} (W_ki + R_ki) ) + (n-1) Bias(θ̂^{(i)}),

(n-2) S^{(i,j)}(θ) = Σ_{k≠i,j} L_k + (2/(n-3)) ( Σ_{(k,l)∈I_n} (W_kl + R_kl) − Σ_{l≠j} (W_jl + R_jl) − Σ_{l≠i} (W_il + R_il) + (W_ij + R_ij) ) + (n-2) Bias(θ̂^{(i,j)}).

Observe that Bias(θ̂) = Bias(θ̂^{(i)}) = Bias(θ̂^{(j)}) = Bias(θ̂^{(i,j)}), as they are estimated using the same bandwidth h_n. Hence, as the W_ij terms are of smaller order asymptotically and the L_k terms cancel from the jackknife combination, Q_ij reduces, up to negligible terms, to a combination of

(4/((n-1)(n-2)(n-3))) Σ_{k<l} R_kl,  −(2/((n-2)(n-3))) ( Σ_{l≠i} R_il + Σ_{l≠j} R_jl ),  and  (2/(n-3)) R_ij;

the WLLN for triangular arrays and the conditional independence of the (R_kl) then yield the claimed limit. This completes the proof for the complete-data case. □

D.3. Proof of Theorem 3.

Proof. In this proof, let us abbreviate X_{I_n} = (X_ij)_{(i,j)∈I_n}. Define γ = p/(1−p). Denote

N := p n(n-1)/2,  N^{(k)} := p (n-1)(n-2)/2,  N^{(k,l)} := p (n-2)(n-3)/2.

First notice that p̂/p − 1 = O_P( { n(n-1) }^{-1/2} ). Hence N̂/N →_P 1, N̂^{(k)}/N^{(k)} →_P 1 and N̂^{(k,l)}/N^{(k,l)} →_P 1. Thus, if we define

S̃(θ) = N^{-1} Σ_{(i,j)∈I_n} h_n^{-1} Z_ij K( (x − X_ij)/h_n ) − θ,
S̃^{(k)}(θ) = (N^{(k)})^{-1} Σ_{(i,j)∈I_n^{(k)}} h_n^{-1} Z_ij K( (x − X_ij)/h_n ) − θ,
S̃^{(k,l)}(θ) = (N^{(k,l)})^{-1} Σ_{(i,j)∈I_n^{(k,l)}} h_n^{-1} Z_ij K( (x − X_ij)/h_n ) − θ,

then it holds uniformly that S̃(θ) = Ŝ(θ) + o_P(1), and similar results hold for Ŝ^{(k)}(θ) and Ŝ^{(k,l)}(θ). Hence we have, uniformly in i = 1, ..., n, that

Ṽ_i(θ) := n S̃(θ) − (n-1) S̃^{(i)}(θ) = V̂_i(θ) + o_P(1).
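The estimator S̃(θ) above simply renormalises by the (estimated) number of observed pairs, so under missing-at-random sampling the incomplete-data KDE tracks its complete-data counterpart. A small simulation sketch (with i.i.d. stand-ins for the X_ij and a Gaussian kernel; all names are ours):

```python
import math
import random

random.seed(42)
n = 60
p = 0.5                                      # missing-at-random observation probability
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
X = {ij: random.gauss(0.0, 1.0) for ij in pairs}             # stand-in dyadic data
Z = {ij: 1 if random.random() < p else 0 for ij in pairs}    # observation indicators

h = 0.3
K = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)   # Gaussian kernel

def kde(x, weights):
    """Weighted dyadic KDE at x; with 0/1 weights this is the observed-pair average."""
    num = sum(w * K((x - X[ij]) / h) / h for ij, w in weights.items())
    return num / sum(weights.values())

f_comp = kde(0.0, {ij: 1 for ij in pairs})   # complete-data estimate
f_inc = kde(0.0, Z)                          # incomplete-data estimate (N-hat normalisation)
p_hat = sum(Z.values()) / len(pairs)

assert abs(p_hat - p) < 0.1
assert abs(f_comp - 1 / math.sqrt(2 * math.pi)) < 0.15   # true density at 0
assert abs(f_inc - f_comp) < 0.12
```

The incomplete-data estimate differs from the complete-data one only through sampling noise in which pairs are observed, which is the content of S̃(θ) = Ŝ(θ) + o_P(1) above.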
Observe that we have

n^{-1/2} Σ_{i=1}^n Ṽ_i(θ) = n^{-1/2} Σ_{i=1}^n ( n S̃(θ) − (n-1) S̃^{(i)}(θ) ) = √n S̃(θ),

using the fact that n^{-1} Σ_{i=1}^n S̃^{(i)}(θ) = S̃(θ), which follows from Equation (D.2) with Z_ij K_ij,n in place of K_ij,n. Now, the same argument from the first part of the proof of Theorem 2 holds with Ṽ_i(θ) in place of V_i(θ), and thus we have

2 Σ_{i=1}^n log(1 + λ̂ Ṽ_i(θ)) = { n^{-1/2} Σ_{i=1}^n Ṽ_i(θ) }^2 / ( n^{-1} Σ_{i=1}^n Ṽ_i(θ)^2 ) + o_P(1).

Denote σ̃^2 = σ^2 + γ^{-1} κ^{-1} Ω_2(x) = n Var(θ̂_inc) + o(1), where σ^2 = 4Ω_1(x) + κ^{-1}Ω_2(x) is the same as in the proof of Theorem 2. To this end, it suffices to verify that, for

Σ_0 := σ̃^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) + 2γ^{-1}κ^{-1}Ω_2(x) },

it holds that

(nσ̃^2)^{-1/2} Σ_{i=1}^n Ṽ_i^m(θ) →_d N(0, Σ_0),  (nσ̃^2)^{-1} Σ_{i=1}^n Ṽ_i^m(θ)^2 →_P Σ_0.

Notice that we can decompose

n^{-1/2} σ̃^{-1} Σ_{i=1}^n Ṽ_i(θ) = √n σ̃^{-1} S̃(θ) = √n σ̃^{-1} { S(θ) + T(θ) },

where S(θ) is the same as in the complete dyadic data case and T(θ) = N^{-1} Σ_{(i,j)∈I_n} (Z_ij − p) K_ij,n. From the proof of Theorem 2, we have √n σ^{-1} S(θ) →_d N(0, 1). Conditional on X_{I_n}, √n σ̃^{-1} T(θ) consists of a sum of independent random variables with mean 0 and conditional variance

Var( √n σ̃^{-1} T(θ) | X_{I_n} ) = ( 4(1−p) / ( p n (n-1)^2 σ̃^2 ) ) Σ_{(i,j)∈I_n} K_ij,n^2.

Observe that, by the law of total variance, we have

Var(T(θ)) = ( 2(1−p) Ω_2(x) / ( p n(n-1) h_n ) ) (1 + o(1)).
In addition, one can verify the conditional Lyapunov condition:

{ Var(T(θ) | X_{I_n}) }^{-3/2} E[ |T(θ)|^3 | X_{I_n} ] ≲ { Var(T(θ) | X_{I_n}) }^{-3/2} p(1−p) N^{-3} Σ_{(i,j)∈I_n} K_ij,n^3.  (D.4)

Notice that if n^2 p h_n → ∞, the right-hand side of (D.4) is of order o_P(1), since its expectation is ≲ (n^2 p h_n)^{-1/2} = o(1). Thus an application of Lyapunov's CLT yields that, with probability 1 − o(1), conditionally on X_{I_n},

{ Var(T(θ) | X_{I_n}) }^{-1/2} T(θ) →_d N(0, 1),

where the convergence in distribution is with respect to the law of (Z_ij)_{(i,j)∈I_n}. The variance calculation above and the conditional convergence in distribution then imply the unconditional convergence in distribution

{ Var(T(θ)) }^{-1/2} T(θ) →_d N(0, 1).

S(θ) and T(θ) are uncorrelated, and, being both asymptotically normal with the conditioning argument above, they are asymptotically independent. Hence we have

√n σ̃^{-1} S̃(θ) = √n σ̃^{-1} { S(θ) + T(θ) } →_d N( 0, σ̃^{-2} σ^2 + σ̃^{-2} γ^{-1} κ^{-1} Ω_2(x) ) = N(0, 1).

Now, we claim that

(nσ̃^2)^{-1} Σ_{i=1}^n Ṽ_i(θ)^2 →_P σ̃^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) } + 2 σ̃^{-2} γ^{-1} κ^{-1} Ω_2(x) = Σ_0.

Define T^{(i)}(θ) = (N^{(i)})^{-1} Σ_{(k,l)∈I_n^{(i)}} (Z_kl − p) K_kl,n, the leave-one-out version of T. Note that Ṽ_i(θ) = S̃(θ) − (n-1)( S̃^{(i)}(θ) − S̃(θ) ). Then

(nσ̃^2)^{-1} Σ_{i=1}^n Ṽ_i(θ)^2 = σ̃^{-2} ( S̃(θ)^2 − 2(n-1) S̃(θ) · n^{-1} Σ_{i=1}^n { S̃^{(i)}(θ) − S̃(θ) } + ((n-1)^2/n) Σ_{i=1}^n { S̃^{(i)}(θ) − S̃(θ) }^2 ) = σ̃^{-2} ( M_1 − M_2 + M_3 ).

Observe that M_1 = o_P(1) following the uniform convergence rate from Theorem 1, and M_2 = 0 using Equation (D.2).
Similar to the corresponding calculation in the proof of Theorem 2,

σ̃^{-2} M_3 = ((n-1)^2/(n σ̃^2)) Σ_{i=1}^n { S̃^{(i)}(θ) − S̃(θ) }^2
= ((n-1)^2/(n σ̃^2)) Σ_{i=1}^n { S̃^{(i)}(θ) − n^{-1} Σ_{k=1}^n S̃^{(k)}(θ) }^2
= ((n-1)^2/(n^2 σ̃^2)) Σ_{i=1}^{n-1} Σ_{j=i+1}^n { S̃^{(i)}(θ) − S̃^{(j)}(θ) }^2
= ((n-1)^2/(n^2 σ̃^2)) Σ_{i=1}^{n-1} Σ_{j=i+1}^n { ( S^{(i)}(θ) + T^{(i)}(θ) ) − ( S^{(j)}(θ) + T^{(j)}(θ) ) }^2
= ((n-1)^2/(n^2 σ̃^2)) [ Σ_{i<j} { S^{(i)}(θ) − S^{(j)}(θ) }^2 + Σ_{i<j} { T^{(i)}(θ) − T^{(j)}(θ) }^2 + 2 Σ_{i<j} { S^{(i)}(θ) − S^{(j)}(θ) } { T^{(i)}(θ) − T^{(j)}(θ) } ]
→_P σ̃^{-2} { 4Ω_1(x) + 2κ^{-1}Ω_2(x) } + 2 σ̃^{-2} γ^{-1} κ^{-1} Ω_2(x) = Σ_0,

where the convergence in the last line is a direct result of the WLLN for triangular arrays and the conditional independence (given X_{I_n}) between S^{(i)}(θ) and T^{(i)}(θ). The same proof strategy yields n^{-1} σ̃^{-2} Σ_{i=1}^n Ṽ_i^m(θ)^2 →_P Σ_0.

It remains to show that

(nσ̃^2)^{-1/2} Σ_{i=1}^n Ṽ_i^m(θ) →_d N(0, Σ_0).

Note that Σ_{i=1}^n V̂_i(θ̂_inc) = 0, and hence

(nσ̃^2)^{-1/2} Σ_{i=1}^n V̂_i^m(θ) = { [ n^{-1} σ̃^{-2} Σ_{i=1}^n V̂_i(θ̂_inc)^2 ] / [ n^{-1} σ̃^{-2} Σ_{i=1}^n V̂_i(θ̂_inc)^2 − n^{-1} σ̃^{-2} Σ_{i=1}^{n-1} Σ_{j=i+1}^n Q̂_ij ] }^{1/2} · (nσ̃^2)^{-1/2} Σ_{i=1}^n V̂_i(θ).
We have from above that n^{-1/2} σ̃^{-1} Σ_{i=1}^n V̂_i(θ) →_d N(0, 1), and, by the consistency of θ̂_inc, we have n^{-1} σ̃^{-2} Σ_{i=1}^n V̂_i(θ̂_inc)^2 →_P Σ_0. It suffices to show that the denominator inside the square root converges to unity asymptotically. Observe that Q̂_ij admits the same expansion as Q_ij in the proof of Theorem 2, with p^{-1} Z_kl R_kl in place of R_kl and an asymptotically negligible remainder. Therefore, by the identical distribution and conditional independence (given (U_i)_{i=1}^n) of (Z_ij R_ij)_{(i,j)∈I_n}, using the WLLN for triangular arrays following the same argument as in the proof of Theorem 2, we have

n^{-1} σ̃^{-2} Σ_{i=1}^{n-1} Σ_{j=i+1}^n Q̂_ij →_P σ̃^{-2} p^{-1} κ^{-1} Ω_2(x),

so that the denominator converges to σ̃^{-2} ( 4Ω_1(x) + κ^{-1}Ω_2(x) + γ^{-1}κ^{-1}Ω_2(x) ) = 1. This concludes the proof. □

(H. D. Chiang) Department of Economics, University of Wisconsin–Madison, William H. Sewell Social Science Building, 1180 Observatory Drive, Madison, WI 53706, USA.
Email address: [email protected]

(B. Y. Tan) Department of Economics, Vanderbilt University, VU Station B, Box
Email address: [email protected]