Evaluating Fairness in the Presence of Spatial Autocorrelation

Abstract

In spite of its considerable practical importance, the current algorithmic fairness literature lacks technical methods that account for underlying geographic dependency when evaluating or mitigating fairness issues in spatial data. We initiate the study of spatial fairness in this paper, taking the first step towards formalizing this line of quantitative methods. Fairness considerations for spatial data are often confounded by the underlying spatial autocorrelation. We propose a hypothesis testing methodology to detect the presence and strength of this effect, and then mitigate it using a spatial filtering-based approach, in order to enable the application of existing bias detection metrics. We evaluate the proposed methodology through numerical experiments on real and synthetic datasets, demonstrating that in the presence of several types of confounding effects due to the underlying spatial structure, our testing methods perform well in maintaining low type-II errors and nominal type-I errors.



Cheryl Flynn

AT&T Labs - Research, Bedminster, NJ, USA

Subhabrata Majumdar

AT&T Chief Data Office, New York City, NY, USA

Ritwik Mitra

AT&T Chief Data Office, Bedminster, NJ, USA

Fairness in spatial distribution problems has been studied relatively less than other areas of automated decision-making. Systemic and (un)intentional biases may affect the spatial distribution of resources, impacting the accessibility of crucial private and public goods. Fairness concerns in bike-share problems are well documented: Ogilvie & Goodman (2012) noted inequities in accessing London's Barclays Cycle Hire (BCH) scheme, where male and privileged individuals tended to access it more. Colley et al. (2017) found that the augmented reality game Pokémon GO disadvantaged areas with higher minority populations. With the advent of the 5G era in telecommunications, equitable access to high-speed internet is now tied to fair spatial allocation.

We focus on spatial fairness through the lens of pre-existing, non-uniform spatial concentrations of demographic groups, which may cause sensitive demographic features to be spatially autocorrelated. Owing to this confounding effect, the deployment of a spatially based service may appear demographically biased even when the deployment was done independently of the demographic distribution.

To address this phenomenon, we propose hypothesis tests that first ascertain the presence and strength of a common spatial factor autocorrelated with both the sensitive feature and the spatially deployed service. We then use spatial filtering to control for spatial autocorrelation before testing for the presence of potential bias.

Motivating Example

Consider $n$ locations, each associated with a continuous response $Y$ and a sensitive feature $S$, and let $\mathbf{y} = (y_i)_{i=1}^n$ and $\mathbf{s} = (s_i)_{i=1}^n$ denote their observations at the respective location units. Also consider a symmetric weight matrix $W = (w_{ij}) \in \mathbb{R}^{n \times n}$, where $w_{ij}$ is a function of the distance between locations $i$ and $j$. If $\mathbf{y}$ and $\mathbf{s}$ are both positively spatially autocorrelated with respect to $W$, then $\mathbf{y}$ is more likely to be correlated with the sensitive attribute simply because of the nature of the data. This suggests that bias detection methods designed for independent data may have a higher bias detection rate when applied to spatially autocorrelated data.

As a motivating example, consider the 2018 5-year ACS Census data for Cook County, IL, USA. We construct $W$ using the inverse distances between the centroids of census tracts, and take the percentage of the population that is African American as $\mathbf{s}$. The Moran's I statistic (Moran, 1950) for $\mathbf{s}$ is significant and suggests considerable positive spatial autocorrelation.

Figure 1: Boxplots of KS statistics for uncorrelated vs. spatially autocorrelated data.
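The inverse-distance weights and Moran's I statistic described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: the function names are hypothetical, centroids are treated as planar coordinates (latitude/longitude centroids would first need a projection or great-circle distances), and significance is assessed with a one-sided permutation test, which the text does not specify.

```python
import numpy as np


def inverse_distance_weights(coords: np.ndarray) -> np.ndarray:
    """Symmetric weight matrix with w_ij = 1 / d_ij and a zero diagonal."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # 1 / inf = 0, so no self-weights
    return 1.0 / dist


def morans_i(z: np.ndarray, W: np.ndarray) -> float:
    """Moran's I = (n / S0) * (z_c' W z_c) / (z_c' z_c), where S0 is the sum of all w_ij."""
    zc = z - z.mean()
    return float(len(z) / W.sum() * (zc @ W @ zc) / (zc @ zc))


def morans_i_permutation_test(z, W, n_perm=999, seed=0):
    """Observed Moran's I and a one-sided permutation p-value for positive autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(z, W)
    # Shuffling values across locations destroys any spatial structure (the null).
    null = np.array([morans_i(rng.permutation(z), W) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)


# Hypothetical example: a 10 x 10 grid of centroids and a feature with an east-west trend.
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
W = inverse_distance_weights(coords)
s = coords[:, 0] + np.random.default_rng(1).normal(scale=0.5, size=len(coords))
I_s, p_s = morans_i_permutation_test(s, W)
print(f"Moran's I = {I_s:.3f}, permutation p-value = {p_s:.3f}")
```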

We simulate $\mathbf{x} = (I_n - \rho W)^{-1}\boldsymbol{\epsilon}$ and $\mathbf{y} = \mathbb{1}(\mathbf{x} > 0)$ with $\boldsymbol{\epsilon} \sim N(\mathbf{0}, I_n)$, and compare the false positive rates of the Kolmogorov-Smirnov (KS) statistic for $\rho = 0$ (uncorrelated) vs. $\rho = 0.95$ (strong positive spatial autocorrelation) over 100 realizations (Figure 1). At significance level $\alpha = 0.05$, the false positive rate is 0.75 for $\rho = 0.95$ but only 0.03 for $\rho = 0$. This highlights the potential impact of spatial autocorrelation on bias detection metrics. We aim to answer three questions:

(Q1) Are both $Y$ and $S$ significantly associated with the common weight matrix $W$?
(Q2) Are the strengths of these associations similar?
(Q3) Can we adjust for spatial association to measure the true degree of association between $Y$ and $S$?

Testing Common Spatial Autocorrelation

To answer Q1 and Q2, we formulate hypothesis tests based on combining the Moran's I statistics for $Y$ and $S$ (denoted by $I_y$ and $I_s$, respectively) using the Intersection-Union (IU) principle (Berger and Hsu, 1996). We use the concept of Pivotal Parametric Products, or P3 (SenGupta, 2007), to design combined test statistics that test for the presence and equivalence of the common association of $Y$ and $S$ with $W$: $T_1 = \min(|I_y|, |I_s|)$ and $T_2 = (I_y - I_s)^2$. Owing to the difficulty of combining individual tests into a composite test (Berger and Hsu, 1996), we use a non-parametric bootstrap procedure to obtain the p-values.

Adjusting for Spatial Autocorrelation

To address Q3, we use spatial filtering (Dray et al., 2006; Griffith, 2004) to learn features that capture the spatial structure in $Y$ and $S$. We then use these features to adjust the bias detection procedure. If $Y$ and $S$ are both continuous, the bias metric is computed on the residuals from regressing $Y$ and $S$ on their respective spatial features, and compared against the standard thresholds for the metric. If one or both of $Y$ and $S$ are discrete, we use the spatial features to estimate the mean parameter vector(s), and then use these estimates to simulate an empirical distribution of the bias metric under the assumption that the only bias present in the data is due to $W$. The unadjusted bias metric can then be compared against a critical value from this empirical distribution to determine whether there is additional bias in the data. Our approach is thus bias-metric agnostic, and can handle both discrete and continuous data.

Figure 2: (Left to right) Powers for $T_1$ and $T_2$, and KS test false positive rate, each plotted against $\rho$. Dotted lines indicate size $\alpha = 0.05$.

Data

We perform experiments on the Census tract and Census block group locations for the 2018 5-year ACS data in Cook County, IL. We construct the weight matrix $W$ using the inverse of the distance between the centroids of the geometries considered. We experiment with discrete and continuous spatially autocorrelated samples of $(Y, S)$ at these locations, with varying levels of spatial autocorrelation and 100 realizations.
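The spatially lagged sampling scheme $(I_n - \rho W)^{-1}\boldsymbol{\epsilon}$ used in these experiments, together with the KS false-positive experiment from the motivating example, can be sketched on a toy grid. Assumptions: the function names are hypothetical; a 10 x 10 grid of centroids stands in for the Cook County geometries; $W$ is divided by its largest eigenvalue so that $I_n - \rho W$ is invertible for $|\rho| < 1$ (the text does not state a normalization); and SciPy's two-sample KS test compares $\mathbf{s}$ between locations with $y = 1$ and $y = 0$. Rates on this grid need not match the reported 0.75 and 0.03.

```python
import numpy as np
from scipy.stats import ks_2samp


def inverse_distance_weights(coords):
    """Symmetric W with w_ij = 1 / d_ij between centroids and a zero diagonal."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # 1 / inf = 0, so no self-weights
    return 1.0 / dist


def sar_sample(W_scaled, rho, rng):
    """One draw of x = (I_n - rho * W)^{-1} eps with eps ~ N(0, I_n)."""
    n = W_scaled.shape[0]
    eps = rng.standard_normal(n)
    # Solving (I_n - rho W) x = eps avoids forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - rho * W_scaled, eps)


def ks_false_positive_rate(W, rho_s, rho_x, n_reps=100, alpha=0.05, seed=0):
    """Share of realizations in which the KS test flags y as biased with respect to s.

    s and y are generated independently, so every rejection is a false positive.
    """
    rng = np.random.default_rng(seed)
    # Assumption: divide W by its largest eigenvalue (its spectral radius, since
    # W is symmetric and non-negative) so I_n - rho * W is invertible for |rho| < 1.
    W_scaled = W / np.linalg.eigvalsh(W).max()
    rejections = 0
    for _ in range(n_reps):
        s = sar_sample(W_scaled, rho_s, rng)      # continuous sensitive feature
        y = sar_sample(W_scaled, rho_x, rng) > 0  # discrete outcome y = 1(x > 0)
        if y.all() or not y.any():
            continue  # the KS test needs locations in both outcome groups
        # Does the distribution of s differ between y = 1 and y = 0 locations?
        if ks_2samp(s[y], s[~y]).pvalue < alpha:
            rejections += 1
    return rejections / n_reps


# Hypothetical toy setting: a 10 x 10 grid instead of Cook County tracts.
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
W = inverse_distance_weights(np.column_stack([gx.ravel(), gy.ravel()]))
fpr_uncorrelated = ks_false_positive_rate(W, rho_s=0.0, rho_x=0.0)
fpr_correlated = ks_false_positive_rate(W, rho_s=0.95, rho_x=0.95)
print(f"false positive rate: rho = 0 -> {fpr_uncorrelated:.2f}, rho = 0.95 -> {fpr_correlated:.2f}")
```

Continuous samples are the raw draws; discrete samples come from thresholding a draw, as in $\mathbf{y} = \mathbb{1}(\mathbf{x} > 0)$.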

Testing for spatial associations

We consider three scenarios: (1) $Y$ and $S$ are independently associated with $W$; (2) they are associated independently and one or both are spatially autocorrelated; (3) $Y$ is associated with a non-sensitive attribute, say $X$, but not with $S$; however, $X$ and $S$ are both spatially autocorrelated with respect to $W$. Figure 2 reports results for scenario 3. We generate spatially lagged observations as $\mathbf{s} = (I_n - \rho_1 W)^{-1}\boldsymbol{\epsilon}_1$, $\mathbf{x} = (I_n - \rho_2 W)^{-1}\boldsymbol{\epsilon}_2$, $\mathbf{y} = 5\mathbf{x} + \boldsymbol{\epsilon}_3$, with $\boldsymbol{\epsilon}_1, \boldsymbol{\epsilon}_2, \boldsymbol{\epsilon}_3 \sim N(\mathbf{0}, I_n)$. We repeat this for three values of $\rho_1$ and for $\rho_2$ ranging from 0 to 0.9, using 1000 permutations to generate the null distributions for computing empirical powers. As $\rho_1$ and $\rho_2$ increase, $T_1$ detects the existence of non-zero spatial autocorrelation with higher power. For $T_2$, note that rejection implies inferring the equality of $\rho$ across $\mathbf{y}$ and $\mathbf{s}$; we are able to infer this with higher power when $\rho_1 = \rho_2$. Values at $\rho_2 = 0$ indicate that the tests are well calibrated, i.e., they maintain nominal size in the absence of any signal.

Adjusting for Spatial Autocorrelation

We generate spatially lagged observations $\mathbf{s} = (I_n - \rho_1 W)^{-1}\boldsymbol{\epsilon}_1$, $\mathbf{x} = (I_n - \rho_2 W)^{-1}\boldsymbol{\epsilon}_2$, $\mathbf{y} = \mathbb{1}(\mathbf{x} > 0)$, with $\boldsymbol{\epsilon}_1, \boldsymbol{\epsilon}_2 \sim N_n(\mathbf{0}, I_n)$ for varying $\rho_1, \rho_2 > 0$, and use the KS test at significance level $\alpha = 0.05$ to test for bias in $\mathbf{y}$ with respect to $\mathbf{s}$. As seen in the rightmost plot of Figure 2, standard bias detection procedures perform as expected when there is low to moderate positive spatial autocorrelation in the data, but have a higher than expected bias detection rate when both variables exhibit a moderate to high degree of positive spatial autocorrelation. In such situations, the false positive rate comes back below the nominal level ($\rho = 0.\ldots$).

When bias is detected even after adjusting for spatial autocorrelation, it may be necessary to consider a bias mitigation strategy. To our knowledge, mitigation strategies for spatial data have not been proposed. However, existing methods could potentially be adjusted for spatial data by including the spatial features learned during the bias detection stage. For example, an in-processing mitigation strategy such as Adversarial Debiasing (Zhang et al., 2018) may be used, with the spatial features included as non-sensitive attributes.

There may also be cases where bias results from the underlying spatial autocorrelation in the data, but it is still necessary to implement a mitigation strategy. In these cases, mitigation strategies could include expanding, shifting/rotating, or splitting targeted areas in a manner that still preserves the spatial nature of the response variable while reducing the amount of bias. In our talk, we discuss these and other mitigation strategies for spatial data as interesting areas for future research.
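The adjustment procedure described in the two "Adjusting for Spatial Autocorrelation" parts (spatial features from spatial filtering, residualization for continuous data, and a simulated critical value for discrete data) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the spatial features are taken to be the top-$k$ Moran eigenvectors of the doubly centered $W$ (one common form of eigenvector spatial filtering; the selection rule and $k = 5$ are assumptions), Pearson correlation stands in for a continuous bias metric, a clipped linear probability model stands in for the unspecified mean model in the discrete case, and all function names are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp


def moran_eigenvectors(W, k):
    """Top-k eigenvectors of the doubly centered matrix M W M, with M = I - 11'/n.

    Eigenvectors with large positive eigenvalues are broad spatial patterns
    with positive autocorrelation; they serve as the spatial features here.
    """
    n = W.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    eigvals, eigvecs = np.linalg.eigh(M @ W @ M)
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    return eigvecs[:, top]


def spatial_residuals(z, E):
    """Residuals of z after least-squares regression on an intercept and E."""
    X = np.column_stack([np.ones(len(z)), E])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return z - X @ beta


def adjusted_correlation(y, s, E):
    """Continuous Y and S: a bias metric (here Pearson correlation) on residuals."""
    return float(np.corrcoef(spatial_residuals(y, E), spatial_residuals(s, E))[0, 1])


def simulated_ks_critical_value(y, s, E, n_sim=500, alpha=0.05, seed=0):
    """Binary Y: (1 - alpha) quantile of the KS statistic when Y depends only on E.

    A clipped linear probability model on E stands in for the mean model.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(y)), E])
    beta, *_ = np.linalg.lstsq(X, y.astype(float), rcond=None)
    p_hat = np.clip(X @ beta, 0.01, 0.99)
    null_stats = []
    for _ in range(n_sim):
        y_sim = rng.random(len(y)) < p_hat  # Bernoulli(p_hat) draws, generated without reference to s
        if y_sim.all() or not y_sim.any():
            continue
        null_stats.append(ks_2samp(s[y_sim], s[~y_sim]).statistic)
    return float(np.quantile(null_stats, 1 - alpha))


# Hypothetical toy data on a 10 x 10 grid: y and s share an east-west trend,
# but y does not depend on s.
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
np.fill_diagonal(dist, np.inf)
W = 1.0 / dist
E = moran_eigenvectors(W, k=5)

rng = np.random.default_rng(1)
s = coords[:, 0] + rng.normal(size=100)
y_cont = coords[:, 0] + rng.normal(size=100)
raw = float(np.corrcoef(y_cont, s)[0, 1])
adjusted = adjusted_correlation(y_cont, s, E)
print(f"correlation: raw {raw:.2f}, after spatial filtering {adjusted:.2f}")

y_bin = y_cont > np.median(y_cont)
observed = ks_2samp(s[y_bin], s[~y_bin]).statistic
crit = simulated_ks_critical_value(y_bin, s, E)
print(f"KS statistic {observed:.2f} vs simulated critical value {crit:.2f}")
```

For simplicity the sketch filters $Y$ and $S$ with the same eigenvectors; the text regresses each variable on its own spatial features, which would mean selecting $E$ separately per variable.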

References

[Berger and Hsu 1996] R. L. Berger and J. C. Hsu. 1996. Bioequivalence trials, intersection-union tests and equivalence confidence sets. Stat. Sci., 11:283–319.

[Colley 2017] Ashley Colley et al. 2017. The geography of Pokémon GO: beneficial and problematic effects on places and movement. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 1179–1192.

[Dray et al. 2006] Stéphane Dray, Pierre Legendre, and Pedro R. Peres-Neto. 2006. Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling, 196(3):483–493.

[Griffith 2004] D. Griffith. 2004. A spatial filtering specification for the autologistic model. Environment and Planning A, 36:1791–1811.

[Moran 1950] Patrick A. P. Moran. 1950. Notes on continuous stochastic phenomena. Biometrika, 37(1/2):17–23.

[Ogilvie and Goodman 2012] Flora Ogilvie and Anna Goodman. 2012. Inequalities in usage of a public bicycle sharing scheme: socio-demographic predictors of uptake and usage of the London (UK) cycle hire scheme. Preventive Medicine, 55(1):40–45.

[SenGupta 2007] A. SenGupta. 2007. P3 approach to intersection–union testing of hypotheses. J. Stat. Plan. Inf., 137:3753–3766.

[Zhang et al. 2018] B. Zhang, B. Lemoine, and M. Mitchell. 2018. Mitigating Unwanted Biases with Adversarial Learning. In