A Statistical Analysis of Noisy Crowdsourced Weather Data
Submitted to the Annals of Applied Statistics
By Arnab Chakraborty, Soumendra Nath Lahiri and Alyson Wilson
Department of Statistics, North Carolina State University
Spatial prediction of weather elements like temperature, precipitation, and barometric pressure is generally based on satellite imagery or data collected at ground stations. None of these data provide information at a more granular or "hyper-local" resolution. On the other hand, crowdsourced weather data, which are captured by sensors installed on mobile devices and gathered by weather-related mobile apps like WeatherSignal and AccuWeather, can serve as potential data sources for analyzing environmental processes at a hyper-local resolution. However, due to the low quality of the sensors and the non-laboratory environment, the quality of the observations in crowdsourced data is compromised. This paper describes methods to improve hyper-local spatial prediction using this varying-quality, noisy crowdsourced information. We introduce a reliability metric, namely the Veracity Score (VS), to assess the quality of the crowdsourced observations using a coarser, but high-quality, reference data set. A VS-based methodology to analyze noisy spatial data is proposed and evaluated through extensive simulations. The merits of the proposed approach are illustrated through case studies analyzing crowdsourced daily average ambient temperature readings for one day in the contiguous United States.

Keywords and phrases: veracity score, geostatistics, robust kriging, hyper-local spatial prediction
1. Introduction.
In recent years there has been a proliferation of weather-related applications for mobile devices such as cellphones, iPods, and laptops. These applications not only provide service to the user but also collect and share spatial data on location, ambient temperature, barometric pressure, humidity, etc., captured by the small-scale sensors installed in the devices. Analyzing and understanding these crowdsourced data sets is becoming an area of increasing interest.

One use of the mobile sensor-generated data is to analyze and understand atmospheric processes at very fine spatial resolution. Most of the methodologies in the literature for spatial prediction of weather elements are based on global images coming from satellites or measurements taken at meteorological stations on the ground (for example, see Thornton, Running and White 1997; Florio et al. 2004). But none of these sources are dense enough for the variability of the process to be analyzed in 'hyper-local' regions, e.g., rectangular regions inside population centers with each side varying approximately between 25 and 30 miles in latitude and longitude. For instance, the ground stations are generally situated away from localities, e.g., at airports or national parks. Hence, weather-related analysis based solely on ground-station data often does not provide a correct assessment of the variation of the underlying process in the localities. However, in disaster detection, traffic management, and many defense-related activities, prediction of the process in a very localized region (hyper-local) is often more important than the global imputation of the process over a bigger region. Crowdsourced data captured by mobile sensors can serve as a potential source in these scenarios, especially in regions where the ground weather stations are sparse but the population density, and hence the density of mobile devices like cellphones, iPads, etc., is relatively high. Recently, a handful of organizations have become interested in providing cost-effective hyper-local predictions of weather using sensor-generated geographical information through weather-related mobile apps. For example, the global leader in weather information, AccuWeather, launched AccUcast in 2015 (AccuWeather 2015), a feature that allows each user to share their local weather information as captured by the built-in mobile sensors. Other applications include Sunshine (Moynihan 2015) and Dark Sky (Dalton 2016), which turn each app user into a "meteorological station" for gathering and sharing hyper-local weather information. Mobile sensor-generated weather data are already being used in traffic management, fire detection, etc. In a recent article, Sosko and Dalyot (2017) used crowdsourced mobile-sensor data in forest fire detection to densify the static geo-sensor network (SGN), which is primarily comprised of meteorological stations with high-performance sensors. Though spatial prediction of daily weather is generally based on satellite imagery or data from weather stations (Thornton, Running and White 1997, Vancutsem et al. 2010, Frei 2014), the recent advancement of weather-related mobile apps and the concurrent business interest call for a new methodology that uses these crowdsourced weather data to generate more accurate weather predictions in hyper-local regions. In this article, we consider the daily average ambient temperature process, and show that more efficient and reasonable prediction surfaces can be created in hyper-local regions with denser but noisy crowdsourced data as compared to a global prediction surface obtained from high-quality but coarser ground-station data.

1.1. WeatherSignal and NOAA ground-station data.
We analyze a static crowdsourced data set consisting of geo-coded daily average ambient temperature readings over the continental United States on April 30, 2013. These data were gathered by a cellphone application named WeatherSignal, available both for iOS and Android. In addition to providing information on current weather and forecasts, the app also gathers geographic and weather information using cellphone sensors, leading to a huge amount of crowdsourced spatial weather data from all over the globe. The WeatherSignal application is operated by an organization named OpenSignal.
Fig 1: Spatial plots of the crowdsourced and NOAA ground-station daily average ambient temperature data (°F): (a) WeatherSignal (WS) data over the USA; (b) NOAA ground-station data over the USA; (c)–(f) zoomed 'hyper-local' versions (each side of these regions varies from approximately 25 to 30 miles) of the crowdsourced WeatherSignal data (c: Brooklyn, NY; d: LA) and the NOAA station data (e: Brooklyn, NY; f: LA).
Through the research partnership program of OpenSignal, we were provided real-time (millisecond-resolution) ambient temperature readings captured by various mobile phones for the above-mentioned day. For each spatial location, we temporally aggregated the temperature readings to the daily average by taking the mean of the regionally estimated hourly temperatures throughout the day. The details of the aggregation are explained in Section A.1 in the supplementary material. After the aggregation, we have crowdsourced daily average temperature readings at 1879 spatial locations in the United States, as shown in Figure 1a. From the figure, it can be seen that the crowdsourced observations are clumped together in high-population-density regions like Detroit, Chicago, New York, and Los Angeles. In Figures 1c and 1d we show hyper-local versions of the WeatherSignal data for two nearly square hyper-local regions in Brooklyn, NY and Los Angeles, CA.

Along with the crowdsourced data from the WeatherSignal app, we also have ground-station data on the daily average ambient temperature from the National Oceanic and Atmospheric Administration (NOAA). We used the Global Historical Climate Network Daily (GHCND) data access tool to retrieve the daily ambient temperature summaries for April 30, 2013 from 2094 stations in the continental United States. We have plotted the ground-station observations in Figure 1b.
Fig 2: Empirical distribution (histograms of daily average ambient temperature, °F) of the crowdsourced average temperatures in the regions from Figure 1 for Brooklyn, NY (left) and Los Angeles, CA (right). Blue vertical lines represent the average ground-station values in the considered regions.

Comparing Figures 1a and 1b, we can see that the NOAA ground-station data provide much more spatial coverage than the crowdsourced data when the entire United States, or large parts of it such as the east coast or the midwest, are considered; hence, for global modeling or for building a global prediction surface of the ambient temperature, the ground-station data are clearly the better choice. However, for hyper-local prediction of the spatial process, we believe that crowdsourced data have the potential to capture the local behavior of the spatial process more accurately. For example, in Figures 1e and 1f we have plotted the available ground-station observations in the same square neighborhoods as the crowdsourced data in Figures 1c and 1d. In the area around Brooklyn, NY, there are approximately 90 crowdsourced observations available, whereas the number of ground-station observations is only one. Motivated by this observation, in this paper we propose a method to improve the accuracy of hyper-local predictions using the available crowdsourced information in addition to the ground-station data over a bigger surrounding region.
1.2. The challenge in analyzing crowdsourced mobile-sensor data.
The challenge in analyzing mobile sensor-generated crowdsourced data lies in the low quality, and hence poor reliability, of an unknown proportion of the data. When data are collected from mobile applications, the readings are prone to contamination for various reasons. Inaccurate observations can occur due to external factors, low-resolution sensors, or a combination of these factors. For instance, the temperature readings can be affected by battery temperature, whether the user is indoors or outdoors, the proximity of the device to a hot or cold object, the heterogeneity of the sensors used by different devices, and many other unknown processes.

To illustrate the varying quality of the observations in the WeatherSignal data, Figure 2 shows the temperature distribution for the two hyper-local regions shown in Figures 1c and 1d. The daily average temperature values in the crowdsourced data set vary from nearly 60°F to 90°F in both of the hyper-local regions on the same day.
These temperature distributions show the nature of the noise involved in the crowdsourced data. Due to the factors associated with the data collection process, a portion of the observations in the crowdsourced data are either contaminated or not representative of the ambient temperature, which is the outdoor air temperature close to the earth's surface. Such representativeness errors for weather data coming from meteorological stations have been considered previously by Lorenc (1986), Gandin (1988) and Lussana, Uboldi and Salvati (2010). Comparing the histograms with the single ground-station observation in each region, we can see that although there are large deviations, a good proportion of the crowdsourced observations are 'close' to the corresponding ground-station observations (72°F in Brooklyn and 70°F in LA), which are collected in a laboratory environment with high-quality sensors maintained to World Meteorological Organization (WMO) standards.

Building models based on the noisy crowdsourced data that ignore the reliability of the sensor-generated observations can lead to erroneous prediction. For instance, we used leave-one-out prediction of the observations in the regional block around Brooklyn (Figure 1c) using standard techniques of spatial analysis, with a reasonable mean and covariance model (discussed in Section 3.1), and the errors in the predictions ranged from -30°F to 40°F. A similar cross-validation approach has previously been used by Cressie (1993) and Lussana, Uboldi and Salvati (2010) to identify 'bad' observations. These first-stage analyses motivated us to take the quality of the observations in the WeatherSignal data into consideration. Lussana, Uboldi and Salvati (2010) proposed removing observations for which the cross-validated prediction errors exceed some threshold. But, due to the inclusion of the corrupted observations at every iteration of the leave-one-out cross-validation, the predictions are not guaranteed to be a good representation of the true value at that location. Moreover, because the leave-one-out cross-validation approach is computationally expensive, the method proposed by Lussana, Uboldi and Salvati (2010) is not readily applicable to large crowdsourced weather data coming from mobile sensors. The 'absurd' observations, i.e., observations with high gross error (Lussana, Uboldi and Salvati, 2010), can be identified using other, more scalable spatial outlier detection techniques (for example, see Chapter 1 of Cressie 1993; Harris et al. 2014) and thus can be omitted from the analysis. But, in that case, it is not straightforward how to address observations with small to moderate measurement errors. For instance, using too strict a threshold on the measurement error may lead to deletion of a significant number of observations, resulting in a complete loss of information for specific locations.

Hence, the new methodology should address the following three challenges. First, in addition to just identifying high-noise observations, a continuous assessment of the veracity of all the observations in a geostatistical setting is needed. Second, the definition of veracity should take into account the behavior of the process in the study region so that "misleading" observations can be detected. Third, the veracity assessment of the observations should be incorporated into the subsequent analysis to allow for robust inference and efficient prediction.
Though there are studies in the literature on quality assessment of crowdsourced data coming from volunteers or paid participants (for example, Allahbakhsh et al. 2013), assessment of sensor-generated data quality is not common. Sosko and Dalyot (2017) mention an elementary root-mean-squared-error approach for accuracy measurement using a reference data set from Israeli Meteorological Stations. However, neither of the above-mentioned papers provides full geostatistical inference and prediction using noisy crowdsourced data.

In this article, we make several contributions. First, we introduce a Veracity Score (VS) to measure the reliability of the crowdsourced observations on a continuous scale using a reference data set. Second, we propose a VS-based methodology to incorporate the veracity assessment into standard spatial analysis so that the effect of noisy and misleading observations is reduced, making the estimation and prediction more robust and efficient. Third, we show that using the VS-based technique in hyper-local regions with a relatively large number of crowdsourced observations can produce a more accurate and efficient prediction surface compared to the global prediction surface obtained through analysis of the ground-station data alone.

This paper is organized as follows. In Section 2, we introduce the veracity score and describe its elementary properties in a relevant geostatistical setting. Section 3 includes a brief description of the standard approach for analyzing geostatistical data, followed by a detailed description of the VS-based methodology for estimation and prediction. In Section 4, we describe simulation studies to justify the superiority of the VS-based methodology over the standard approach in the analysis of noisy crowdsourced data. In Section 5, we provide details of the analysis, estimation, and hyper-local prediction in a case study. Finally, Section 6 summarizes our effort and discusses limitations and possible future work.
2. Defining and Measuring Veracity.
In this section, we provide the intuition and motivation for veracity scoring. We denote the sample size by n. We denote the volume of a set A ⊂ ℝ² by |A|, i.e., the Lebesgue measure of A if it has nonzero volume and the cardinality of A if A is finite.

2.1. Motivation for Veracity Scoring.
To provide motivation for veracity scoring, consider a very simple yet practical example.

Example 2.1. Let Z₁, …, Z_n be independent noisy observations with E(Z_i) = μ and Var(Z_i) = σ_i² for i ∈ {1, …, n}. The usual sample mean, which is also the o.l.s. estimator of μ, is μ̂_ols = Z̄_n = n⁻¹ Σ_{i=1}^n Z_i, with E(μ̂_ols) = μ and Var(μ̂_ols) = n⁻² Σ_{i=1}^n σ_i². If we assume σ_i² = C·i^b for some constants (w.r.t. n) C, b > 0, we have

    Var(μ̂_ols) ≈ C₁(b)·n^(b−1),

for some constant (w.r.t. n) C₁(b). Instead of the generic sample mean, consider a weighted average of the observations given by μ̂ = (Σ_{i=1}^n v_i Z_i) / (Σ_{i=1}^n v_i), where the weights v_i = i^(−a) for some constant a > 0, i.e., the weights shrink as the variance of the noisy observations grows. Then, for 1 < a < (b + 1)/2,

    Var(μ̂) ≈ C₂(a, b)·n^(b−2a+1),

for some constant C₂(a, b). Clearly, if C, a and b are constants w.r.t. the sample size n, then a significant gain in efficiency (O(n^(b−2a+1)) as compared to O(n^(b−1))) can be achieved for large n by assigning lower weights to the high-variance observations.

If we can find a formulation of the veracity score that is inversely related to the observation noise variance, we can use it to reduce the effect of the noise in the inference and achieve a more accurate and efficient estimator.
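For concreteness, the following minimal Monte Carlo sketch (not part of the paper's analysis) illustrates the variance reduction in Example 2.1. The variance-growth exponent b, the weight-decay exponent a, and all other numerical values are illustrative assumptions; a is chosen in the range 1 < a < (b + 1)/2 discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 2000, 500                       # sample size and Monte Carlo replications
mu, C, b, a = 10.0, 1.0, 2.0, 1.25     # true mean, variance growth, weight decay (illustrative)

sigma2 = C * np.arange(1, n + 1) ** b          # Var(Z_i) = C * i^b
weights = np.arange(1, n + 1) ** (-a)          # v_i = i^(-a): down-weights the noisy observations

ols_est, wtd_est = np.empty(B), np.empty(B)
for rep in range(B):
    Z = mu + rng.normal(0.0, np.sqrt(sigma2))             # independent heteroskedastic observations
    ols_est[rep] = Z.mean()                               # ordinary sample mean (o.l.s.)
    wtd_est[rep] = np.sum(weights * Z) / np.sum(weights)  # weighted average

print("Monte Carlo variance of the sample mean:   %.2f" % ols_est.var())
print("Monte Carlo variance of the weighted mean: %.2f" % wtd_est.var())
```

Running the sketch shows the weighted estimator's Monte Carlo variance to be orders of magnitude smaller than that of the plain sample mean, mirroring the rate comparison above.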
2.2. Preliminaries.

Let {Z(s₁), …, Z(s_n)} be the varying-quality observations – for example, the crowdsourced data from cellphone sensors – observed at irregularly spaced locations S_n := {s₁, …, s_n} ⊂ ℝ². In addition, at spatial locations T_m := {t₁, …, t_m} ⊂ ℝ², assume that we have {Y(t₁), …, Y(t_m)}, which are high-quality, reliable observations of the spatial process – for example, measurements from the ground stations. It is common to assume (Cressie 1993, Gelfand et al. 2010) that the spatial random field of interest {Y(s) : s ∈ ℝ²} can be represented as

    Y(s) = μ(s) + ε(s),                                                        (2.1)

where μ(s) is a deterministic smooth mean function capturing the large-scale variation of the process, i.e., E(Y(s)) = μ(s). Here, ε(s) is a mean-zero, spatially correlated residual process which captures the small-scale variation over the space. For the varying-quality Z-process, we write the decomposition in Equation 2.1 as

    Z(s) = μ(s) + w(s),                                                        (2.2)

where w(s) is the aggregated noise associated with the observation Z(s). For example, suppose the varying-quality observations arise from an additive-multiplicative noise model,

    Z(s_i) = ε_i^M Y(s_i) + ε_i^A,                                             (2.3)

where ε_i^M and ε_i^A, for i ∈ {1, …, n}, are the random variables associated with the multiplicative and additive noise in the observation Z(s_i). Then the associated w-process has the form w(s_i) = μ(s_i)(ε_i^M − 1) + ε_i^M ε(s_i) + ε_i^A. If there is no multiplicative component ε_i^M in the contamination, then w(s_i) = ε(s_i) + ε_i^A. In the next subsection, we define a score to assess the quality or reliability of the observation Z(s_i), namely the veracity score.
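As an illustration of the additive-multiplicative contamination in Equation 2.3, the sketch below perturbs a vector of "true" values with multiplicative and additive noise on a randomly chosen subset of sites. The distributions used (a multiplicative factor symmetric around 1 and Gaussian additive noise), the function name, and all default parameter values are assumptions made for the illustration only.

```python
import numpy as np

def contaminate(y, frac_noisy=0.2, alpha_m=2.0, sigma_a=5.0, rng=None):
    """Apply the additive-multiplicative noise model Z = eps_M * Y + eps_A
    to a random fraction of the observations; the rest stay uncontaminated."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    eps_m = np.ones(n)                       # eps_M = 1 (no multiplicative noise) by default
    eps_a = np.zeros(n)                      # eps_A = 0 (no additive noise) by default
    noisy = rng.random(n) < frac_noisy       # sites outside the "good" index set
    eps_m[noisy] = 2.0 * rng.beta(alpha_m, alpha_m, noisy.sum())  # symmetric around 1, in [0, 2]
    eps_a[noisy] = rng.normal(0.0, sigma_a, noisy.sum())
    return eps_m * y + eps_a

# usage: contaminate a smooth "temperature" profile
y_true = 70.0 + 5.0 * np.sin(np.linspace(0.0, 3.0, 200))
z_obs = contaminate(y_true, rng=np.random.default_rng(1))
```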
2.3. Veracity Score: Formulation and Properties.
A good measure of veracity should not only identify "absurd" observations, but also provide a score for each observation on a continuous scale, so that the effect of the "bad" observations can be reduced automatically, making inference robust against the low-quality observations. Our goal is to formulate a continuous scoring procedure to measure the veracity of the observations in two different scenarios. The first scenario assumes that a reference data set containing high-quality but low-density observations in the regions of concern is available. The second scenario assumes that we do not have any high-quality reference information available.
2.3.1. Veracity Score with Reference Data.
Consider a hyper-local regional block like those in Figure 1c or 1d, and denote it by R ⊂ ℝ². The observation vector with locations inside R is given as Z := (Z(s₁), …, Z(s_n))′. Consider R to be the region of interest for analyzing the varying-quality observations Z. Consider another regional block D such that R ⊂ D ⊂ ℝ² and |R| ≪ |D|. Let the reference data vector with locations inside D be denoted as Y := (Y(t₁), …, Y(t_m))′. The reference data Y are high-quality and hence a reliable representation of the spatial process of interest, but they have low data coverage in the hyper-local region of interest R. So, to get a reasonable sample size for the reference data, we need to consider the larger region D. We denote a δ-neighborhood around a spatial point s ∈ ℝ² as B_δ(s), with B_δ(s) := (s − δ, s + δ] for some δ ∈ ℝ₊, where the subtraction and addition are component-wise.

Define the VS of the observation Z(s_i) as

    V(s_i) = φ( |Z(s_i) − ξ(s_i)| / (α + D(ξ_i)) ),                            (2.4)

where φ : ℝ₊ ∪ {0} → ℝ₊ ∪ {0} is some non-increasing function such that sup_x φ(x) < ∞. We call φ(·) the veracity function, with α ∈ ℝ₊ a regularity parameter. By ξ(s_i) we denote a reasonable benchmark for the target process at s_i, and ξ_i := (ξ(s_{i1}), …, ξ(s_{i,n(i)}))′, where {s_{i1}, …, s_{i,n(i)}} is the set of observation locations in the small δ-neighborhood B_δ(s_i). Finally, D(x) denotes a robust measure of dispersion of the observations in the vector x. Clearly, the VS is computed by evaluating the φ-function at the scaled deviation |Z(s_i) − ξ(s_i)| / (α + D(ξ_i)), and due to the non-increasing property of φ(·), a high deviation yields a low VS and a low deviation yields a high VS.

Now consider the benchmark value, ξ(s), for the target at location s. If we had high-quality observations of the Y-process from the reference data at the varying-quality data sites {s₁, …, s_n}, then the obvious choice would be ξ(s_i) = Y(s_i). In practice, as we see in Figures 1c to 1f, the locations of the ground-station measurements (reference data) and the crowdsourced data (varying-quality observations) almost always differ significantly. Hence, to define the benchmark at location s_i, we propose to compute a kriging surface, {(s, Ŷ(s)) : s ∈ D}, of the Y-process using the observation vector Y. Then, we define ξ(s_i) as

    ξ(s_i) = Ŷ(s_i) + (1 − ν) C(Z_i − Ŷ_i),                                    (2.5)

where Z_i := (Z(s_{i1}), …, Z(s_{i,n(i)}))′ and Ŷ_i := (Ŷ(s_{i1}), …, Ŷ(s_{i,n(i)}))′. Here, C(x) is a robust measure of central tendency of the values in the vector x, and ν ∈ [0, 1] is a mixing parameter that we discuss in detail later.

If we have a reasonable benchmark, ξ(s_i), for the spatial process of interest at the location s_i, the definition of the VS in Equation 2.4 is a transformed measure of the scaled deviation of the observation Z(s_i) from the benchmark value. In the definition of the VS, the measure of dispersion, D(ξ_i), in the denominator takes the variability in the δ-neighborhood into account. For example, in the analysis of ambient temperature, the variation in a small neighborhood in the mountains is likely to be higher than in an area close to sea level. Hence, the statistic |Z(s_i) − ξ(s_i)| / (α + D(ξ_i)) measures the deviation of the observation from its benchmark relative to the local variability. In the following sections, we use the interquartile range (i.e., D(x) = IQR(x)) as the robust measure of dispersion in Equation 2.4 and the sample median (i.e., C(x) = Q₂(x), where Q_j is the j-th sample quartile) as the robust measure of central tendency in Equation 2.5. There are other robust choices as well, but we use the sample-quantile-based statistics because they are familiar to practitioners and easy to interpret. Also, these choices are theoretically justified, as the sample quantiles are asymptotically consistent under dependence (Ghosh 1971, Sun and Lahiri 2006). The parameter α determines the baseline of the deviation: for lower values of α we penalize more, and for higher values we allow a larger deviation from the benchmark. We call α the baseline deviation of the VS; its unit is the same as that of the process of interest, which makes the VS unit-free.

We require the veracity function φ to have the following properties:

1. φ(·) is a non-increasing function with bounded range, φ(x) ≤ φ(0) < ∞.
2. φ(x) ↓ 0 as x → ∞.

With this formulation, lower values of the VS correspond to low-quality or less reliable observations, and high values of the VS correspond to better-quality observations. We use φ(x) = exp(−x) for our analysis in the subsequent sections. The advantage of this function is that the VS lies naturally in [0, 1].

It remains to discuss the mixing parameter ν in the definition of the VS. Under the assumption that the estimated mean process μ̂(s) is smooth and the kriged-residual process ε̂(s) is a spatially correlated, second-order stationary, mean-zero process, for a small enough δ > 0 we can write Q₂(Ŷ_i) ≈ Ŷ(s_i), as the variation of the kriged process Ŷ(s) inside the δ-neighborhood is negligible. Hence, we can approximately rewrite the benchmark as

    ξ(s_i) ≈ ν Ŷ(s_i) + (1 − ν) Q₂(Z_i).

Here, to get a reasonable approximation of the spatial process at location s_i, instead of just using the estimated value Ŷ(s_i) from the high-quality reference data over a bigger surrounding region, we want to leverage the available varying-quality observations in the hyper-local region. We propose to use a mixture of an approximation of the spatial process coming from the reference data over the bigger region D, i.e., Ŷ(s_i), and a robust local estimate coming from the varying-quality observations in the small δ-neighborhood B_δ(s_i) around the location of interest s_i, i.e., Q₂(Z_i). Due to the smooth mean and the spatially correlated residual process, the spatial observations in a "small" neighborhood are likely to behave "similarly." Therefore, it is sensible to use a robust estimate of the central tendency of the varying-quality observations in that small neighborhood as the locally estimated approximation of the spatial process at s_i. The mixing parameter ν decides the weight of mixing between the estimated process from the reference data and the local approximation from the varying-quality observations. The optimal ν balances the error in estimation from the reference data and the error in the approximation of the spatial process using the sample median in the δ-neighborhood.
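A minimal sketch of the veracity score with reference data (Equations 2.4–2.5), using φ(x) = exp(−x), the sample median, and the IQR as above. The function name, the rectangular neighborhood search, and the default values of delta, alpha, and nu are illustrative assumptions, and y_hat is assumed to hold a kriged reference surface already evaluated at the crowdsourced sites.

```python
import numpy as np

def veracity_scores(coords, z, y_hat, delta=0.05, alpha=1.0, nu=0.5):
    """Veracity scores with reference data:
        xi(s_i) = Yhat(s_i) + (1 - nu) * median(Z_j - Yhat_j) over B_delta(s_i)   (Eq. 2.5)
        V(s_i)  = exp(-|Z(s_i) - xi(s_i)| / (alpha + IQR of xi over B_delta(s_i)))  (Eq. 2.4)
    Scores are NaN (undefined) when a neighborhood has fewer than three points."""
    n = len(z)
    nbhd = [np.all(np.abs(coords - coords[i]) <= delta, axis=1) for i in range(n)]

    # first pass: benchmark xi(s_i) for every site
    xi = np.full(n, np.nan)
    for i in range(n):
        if nbhd[i].sum() > 2:
            xi[i] = y_hat[i] + (1.0 - nu) * np.median(z[nbhd[i]] - y_hat[nbhd[i]])

    # second pass: scaled deviation from the benchmark, phi(x) = exp(-x)
    scores = np.full(n, np.nan)
    for i in range(n):
        if nbhd[i].sum() > 2 and np.isfinite(xi[i]):
            xi_local = xi[nbhd[i]]
            xi_local = xi_local[np.isfinite(xi_local)]
            iqr = np.subtract(*np.percentile(xi_local, [75, 25]))
            scores[i] = np.exp(-np.abs(z[i] - xi[i]) / (alpha + iqr))
    return scores
```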
2.3.2. Veracity Score without Reference Data.

We propose a similar definition of the VS when no high-quality reference observations are available. In this scenario our definition of the VS is

    V(s_i) = φ( |Z(s_i) − C(Z_i)| / (α + D(Z_i)) ).                            (2.6)

The idea behind the definition in Equation 2.6 is similar to that in Section 2.3.1. As we do not have information from a high-quality reference data set, we use only the locally estimated central tendency as the proxy for the target, with the local variation in the denominator accounting for the regional variability. Note that the definition of the VS in Equation 2.4 approximately equals the VS in Equation 2.6 if we take ν = 0.

The formulations of the VS, both with and without reference data, depend on δ, a positive scalar equal to half the side length of the neighborhood B_δ(s_i) used to estimate the center and dispersion locally. The choice of δ should be such that the δ-neighborhood B_δ(s_i) is small compared to the region of interest R, but at the same time large enough to provide a sufficient sample size for a good assessment of the quality of the observations. For the formulation of the VS to be well defined, we need the number of points in the δ-neighborhood, n(i), to be larger than 2 for each i ∈ {1, 2, …, n}. If we do not have enough data points to compute the measure of dispersion for an observation, we say that the VS is undefined for that observation.

A similar approach of comparing the observations with a benchmark value has been used to detect outliers in the literature (e.g., see Chapter 1 of Cressie 1993; Papritz 2018a). Lussana, Uboldi and Salvati (2010) proposed a benchmark obtained through leave-one-out cross-validated prediction using the noisy observations. But, as mentioned in Section 1.2, due to the presence of some absurd noise in the training data of the cross-validation, the benchmarks obtained by this technique might themselves be corrupted and hence are not necessarily robust. We prefer quantile-based local summaries as benchmarks due to their scalability and computational ease, their appeal to practitioners, and their robustness and asymptotic efficiency (see Sen 1968) compared to the other choices discussed previously.
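When no reference data are available, Equation 2.6 replaces the benchmark by the local median of the noisy observations and the dispersion by their local IQR. A compact sketch, under the same illustrative conventions (and default values) as the previous one:

```python
import numpy as np

def veracity_scores_no_ref(coords, z, delta=0.05, alpha=1.0):
    """Veracity scores without reference data (Equation 2.6):
    V(s_i) = exp(-|Z(s_i) - median(Z_nbhd)| / (alpha + IQR(Z_nbhd)))."""
    n = len(z)
    scores = np.full(n, np.nan)
    for i in range(n):
        in_nbhd = np.all(np.abs(coords - coords[i]) <= delta, axis=1)
        if in_nbhd.sum() <= 2:
            continue                              # VS undefined for sparse neighborhoods
        z_local = z[in_nbhd]
        iqr = np.subtract(*np.percentile(z_local, [75, 25]))
        scores[i] = np.exp(-np.abs(z[i] - np.median(z_local)) / (alpha + iqr))
    return scores
```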
3. Veracity Score Methods.
Before going to the VS-based version of the spatial analysis, we briefly describe the standard approach to geostatistical analysis.
3.1. Review of Standard Analysis of Spatial Data.
For this section, we use the model specified in Equations 2.1 and 2.2 as well as the notation stated in Section 2.2. In geostatistics, the smooth deterministic mean process {μ(·)} is often modeled under a spatial regression framework where the mean function is assumed to have a linear form, μ(s) = x(s)′β, where x(·) = (x₁(·), …, x_p(·))′ is a p-dimensional deterministic vector process of known covariates and β denotes the unknown regression parameter vector. To make inference feasible from only one replication of the process over the space, some stationarity assumption on the second-order structure of the residual process {ε(s)} is required. One of the most commonly used assumptions is that {ε(s)} is an intrinsically stationary process with an admissible parametric variogram function 2γ(h; θ) = Var{ε(s) − ε(s + h)}, where θ is the covariance parameter of interest.

For now, the description of the analysis is given without taking the noisy nature of the observations into account, so {w(s)} is assumed to be identically equal to {ε(s)}. Since the covariance parameter is unknown, the standard analysis starts with the estimation of the regression parameters in the linear mean model using ordinary least squares (o.l.s.): β̂_ols = (X′X)⁻¹X′Z, where X := (x(s₁), …, x(s_n))′. Next, the de-trended observations, i.e., ε̂ = Z − Xβ̂_ols, are used to estimate the covariance parameter θ using least-squares-based variogram model fitting (Cressie, 1993) based on some generic nonparametric semivariogram estimator (denoted by γ̂(h)) – e.g., the classical or method-of-moments semivariogram estimator proposed by Matheron (1962). For example, the weighted least squares (w.l.s.) estimator of θ is given as

    θ̂_wls = argmin_θ Σ_{j=1}^{k} w_j { γ̂(h_j) − γ(h_j; θ) }²,                  (3.1)

where w_j is the weight corresponding to lag h_j and {h₁, …, h_k} is the set of discrete lags at which the nonparametric semivariogram γ̂(·) has been computed. For details of variogram model fitting, see Cressie (1993) and Gelfand et al. (2010). The Matérn family is a popular choice for the parametric class of admissible variograms as it provides a rich class to choose from (Haskard, 2007). A comprehensive list of parametric variogram models can be found in Cressie (1993) and Gneiting (2013).

Once the covariance structure is estimated, one can try to improve the mean parameter estimates using the estimated generalized least squares (e.g.l.s.) estimator, given by β̂_egls = (X′Σ̂⁻¹X)⁻¹X′Σ̂⁻¹Z, where Σ̂ is the estimated variance of ε = (ε(s₁), …, ε(s_n))′. However, this introduces additional variability due to the use of estimated covariance parameters in the mean estimator, and it is not necessarily more efficient than the o.l.s. estimator.

The most commonly used method to predict the process at new locations is to predict the ε-process at the given locations by the best linear unbiased predictor (BLUP) given the observed residual vector ε̂, also known as the ordinary kriging predictor (Cressie 1993, p. 122). The standard predictor of Y(s) is

    Ŷ_std(s) = x(s)′β̂_ols + ε̂_ok(s),                                           (3.2)

where ε̂_ok(s) is the ordinary kriging predictor of ε(s).

The standard approach for estimation and prediction just described is not reliable for analyzing noisy spatial observations, as both the least-squares-based mean parameter estimators (Huber and Ronchetti 2009) and the method-of-moments empirical semivariogram estimator are highly sensitive to noise in the data (Cressie and Douglas 1980). In the following sections, we propose a way to incorporate the VS into the analysis to make the inference and prediction robust against the noise in the data.
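The standard pipeline of this subsection can be sketched in a few lines: o.l.s. estimation of the trend, the method-of-moments (Matheron) empirical semivariogram of the de-trended residuals, and a weighted-least-squares fit of a parametric model as in Equation 3.1. The exponential variogram used here (rather than the Matérn), the binning rule, and the weights are simplifying assumptions of the sketch.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def fit_standard(coords, z, X, n_bins=15):
    # o.l.s. trend estimate and de-trended residuals
    beta_ols, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta_ols

    # Matheron (method-of-moments) empirical semivariogram on binned lags
    d = pdist(coords)
    sq = 0.5 * pdist(resid[:, None], metric="sqeuclidean")   # semivariogram cloud
    bins = np.linspace(0.0, d.max() / 2.0, n_bins + 1)
    which = np.digitize(d, bins) - 1
    lags, gamma_hat, counts = [], [], []
    for u in range(n_bins):
        mask = which == u
        if mask.sum() > 0:
            lags.append(d[mask].mean())
            gamma_hat.append(sq[mask].mean())
            counts.append(mask.sum())
    lags, gamma_hat, counts = map(np.array, (lags, gamma_hat, counts))

    # weighted least squares fit of an exponential variogram model (Equation 3.1)
    def gamma_model(h, theta):                   # theta = (nugget, partial sill, range)
        t = np.abs(theta)
        return t[0] + t[1] * (1.0 - np.exp(-h / t[2]))

    def wls(theta):
        g = gamma_model(lags, theta)
        return np.sum(counts * (gamma_hat - g) ** 2 / np.maximum(g, 1e-8) ** 2)

    theta0 = np.array([0.1, resid.var(), lags.mean()])
    theta_hat = np.abs(minimize(wls, theta0, method="Nelder-Mead").x)
    return beta_ols, theta_hat, (lags, gamma_hat, counts)
```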
3.2. Veracity score-based estimation of the mean function.

In the standard approach, as described in Section 3.1, the regression parameter vector β is estimated using the o.l.s. method. For our approach, instead of the simple squared-error loss, motivated by Example 2.1 we propose to minimize a weighted version of the loss function with the veracity scores as the corresponding weights. The VS-based estimator of the mean parameter β is given as

    β̂_vs = argmin_β Σ_{i=1}^{n} V(s_i) L( Z(s_i), x(s_i)′β ).                   (3.3)

For least-squares-based estimators, we have L(y, u) = (y − u)², the squared-error loss function. The locally estimated veracity scores lessen the effects of "absurd" observations in the objective function and thus make the estimation of the mean function less sensitive to the noise. The VS-based approach is adaptive to the quality of the observations and thus lessens the impact of outliers in the data. To make the estimation more robust to contamination, one can use any robust loss function instead of the squared-error loss in Equation 3.3. We have used an MM-type estimator with a linear-quadratic-quadratic ψ-function for the robust regression, as discussed in Koller and Stahel (2011). The advantage of this estimator is that, in addition to penalizing high residuals less, the parameters associated with the ψ-function can be tuned to improve the asymptotic efficiency of the estimators. The corresponding optimization in Equation 3.3 can be executed using Iteratively Re-weighted Least Squares (IRLS), as discussed in Todorov and Filzmoser (2009).

The assessment of goodness of fit for the estimated linear model is essential. The usual multiple R² is not reasonable to use, as the loss function is different from ordinary least squares. Inspired by the pseudo-R² coined by Willet and Singer (1988), we propose a variant of the coefficient of determination for VS-based regression as

    R² = 1 − [ Σ_{i=1}^{n} V(s_i) L( Z(s_i), x(s_i)′β̂_vs ) ] / [ Σ_{i=1}^{n} V(s_i) L( Z(s_i), Z̄ ) ],

where Z̄ = n⁻¹ Σ_i Z(s_i). The idea behind this measure is that, instead of using the squared-error loss to compute the total sum of squares and the residual sum of squares, the proposed R² uses the robust loss function to measure the total variability in the data (i.e., Σ_{i=1}^{n} V(s_i) L(Z(s_i), Z̄)) and the variability that is not explained by the model (i.e., Σ_{i=1}^{n} V(s_i) L(Z(s_i), x(s_i)′β̂_vs)). Although we do not provide any theoretical justification, it appears from exploratory analysis with synthetic data and simulations that this R² may provide an overly optimistic assessment of the goodness of fit of the model when Huber's loss function or MM-type estimation is used.
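For the squared-error loss, the VS-weighted fit of Equation 3.3 reduces to weighted least squares with the veracity scores as weights; the paper's MM-type robust loss (fitted by IRLS) is replaced here by that simple quadratic loss for brevity, so this is a sketch rather than the estimator actually used. The VS-based pseudo-R² from the same subsection is computed alongside.

```python
import numpy as np

def vs_weighted_fit(X, z, vs):
    """Minimise sum_i V(s_i) * (Z(s_i) - x(s_i)'beta)^2 (Equation 3.3, squared-error loss)."""
    w = np.sqrt(vs)
    beta_vs, *_ = np.linalg.lstsq(X * w[:, None], z * w, rcond=None)

    # VS-weighted pseudo R^2: 1 - (weighted residual loss) / (weighted total loss)
    fitted_loss = np.sum(vs * (z - X @ beta_vs) ** 2)
    total_loss = np.sum(vs * (z - z.mean()) ** 2)
    r2 = 1.0 - fitted_loss / total_loss
    return beta_vs, r2
```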
3.3. Veracity score-based estimation of the covariance structure.

To explore the second-order structure of the spatial process, we analyze the residuals obtained by de-trending the observations, ε̂_vs(s_i) = Z(s_i) − x(s_i)′β̂_vs for i ∈ {1, 2, …, n}. When conducting analysis with varying-quality geostatistical data, even after robust estimation of the regression parameters, a portion of the residuals are affected by the presence of measurement error in the data, and direct analysis of these residuals can result in misleading and inefficient estimation of the covariance structure. To reduce the noise in the observed residuals, we propose a VS-based modification of the residuals using local smoothing prior to estimation of the covariance parameters. When we have high-quality reference data, we define the VS-based smoothed version of the residuals as

    ε̃(s_i) = V(s_i)^q ε̂_vs(s_i) + (1 − V(s_i)^q) Q₂( ξ_i − X_i β̂_vs ),          (3.4)

where X_i := (x(s_{i1}), …, x(s_{i,n(i)}))′ is the n(i) × p matrix of the covariates corresponding to the observations in B_δ(s_i). Here, q is the parameter regulating the degree of smoothing: for instance, q = 0 implies no smoothing, and q = 1 implies a convex combination of the locally corrected residual and the observed residual. As shown in Figure S4-(a) in the supplementary material, the parameter q plays the role of thresholding – for higher q, only observed residuals with high VS receive significant weight in the VS-based smoothing, whereas for smaller q the formulation in Equation 3.4 puts significant weight even on observed residuals with low VS, thus reducing the degree of smoothing.

If we do not have reference data available, then the analogous smoothed version of the residuals is given by

    ε̃(s_i) = V(s_i)^q ε̂_vs(s_i) + (1 − V(s_i)^q) Q₂( ε̂_i ),                      (3.5)

where ε̂_i = (ε̂_vs(s_{i1}), …, ε̂_vs(s_{i,n(i)}))′. Again note that the definition in Equation 3.4 approximately simplifies to the one in Equation 3.5 if ν = 0.

For poor-quality observations, when V(s_i) is small, the effect of the observed value of the residual ε̂_vs(s_i) is scaled down by V(s_i)^q (as V(s_i) ∈ (0, 1)), and the remaining weight 1 − V(s_i)^q shifts toward the robust local summary in Equations 3.4 and 3.5. The effect of VS-based smoothing is illustrated on a synthetic data set in Section B.2 and Figure S.3 in the supplementary material.

We propose to use variogram model fitting with the VS-based smoothed version of the residuals, {ε̃(s_i)}_{i=1}^{n}, to estimate the covariance parameter θ robustly. First, a generic nonparametric semivariogram is evaluated at discrete lags using the robust semivariogram estimator proposed by Cressie and Douglas (1980):

    γ̂_vs(h_u) = { |N(H_u)|⁻¹ Σ_{(s_i, s_j) ∈ N(H_u)} |ε̃(s_i) − ε̃(s_j)|^(1/2) }⁴ / { 2 ( 0.457 + 0.494/|N(H_u)| ) },   for u ∈ {1, …, K},          (3.6)

where N(H_u) = {(s_i, s_j) : s_i − s_j ∈ H_u}. The H_u are small lag classes or bins (see p. 34, Gelfand et al. 2010), often called tolerance regions (see p. 70, Cressie 1993), and they form a partition of size K of the lag-space H = {s − s′ : s, s′ ∈ R}. The candidate lag for the tolerance region H_u is denoted by h_u, which is often taken to be the mean of the observed lags in the bin or the centroid of the class H_u.

The parameters are estimated using the method of weighted least squares as

    θ̂_vs = argmin_θ Q_wls(θ) = argmin_θ Σ_{u=1}^{K} [ |N(H_u)| / {γ(h_u; θ)}² ] { γ̂_vs(h_u) − γ(h_u; θ) }²,          (3.7)

where γ(·; θ) is some pre-specified parametric admissible semivariogram model, as discussed in Section 3.1. Other robust empirical variogram estimators (for example, Genton 1998, Lark 2000) can also be used instead of the one proposed by Cressie and Douglas (1980), as given in Equation 3.6. Genton (1998) showed that the robustness properties of the empirical semivariogram proposed by Cressie and Douglas (1980) are not sufficient in the presence of "absurd" outliers in the data. But, due to the VS-based smoothing in the first stage of the covariance estimation, the very large measurement errors have already been addressed, and hence using Cressie and Douglas (1980)'s version of the robust variogram estimator is reasonable here.
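A sketch of the two covariance-estimation steps of this subsection: the VS-based smoothing of the residuals (Equation 3.5, the no-reference-data version) followed by the robust empirical semivariogram of Equation 3.6. The neighborhood handling, the bin construction, and the default parameter values mirror the earlier sketches and are assumptions of this illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

def vs_smooth_residuals(coords, resid, vs, delta=0.05, q=1.0):
    """VS-based smoothing (Equation 3.5): shrink low-veracity residuals
    towards the local median of the residuals in B_delta(s_i)."""
    smoothed = resid.copy()
    for i in range(len(resid)):
        in_nbhd = np.all(np.abs(coords - coords[i]) <= delta, axis=1)
        if in_nbhd.sum() > 2:
            w = vs[i] ** q
            smoothed[i] = w * resid[i] + (1.0 - w) * np.median(resid[in_nbhd])
    return smoothed

def robust_semivariogram(coords, resid_tilde, n_bins=15):
    """Robust (Cressie-Hawkins-type) empirical semivariogram of Equation 3.6."""
    d = pdist(coords)
    root_abs_diff = np.sqrt(pdist(resid_tilde[:, None], metric="cityblock"))  # |diff|^(1/2)
    bins = np.linspace(0.0, d.max() / 2.0, n_bins + 1)
    which = np.digitize(d, bins) - 1
    lags, gammas, counts = [], [], []
    for u in range(n_bins):
        mask = which == u
        m = mask.sum()
        if m > 0:
            lags.append(d[mask].mean())
            gammas.append(root_abs_diff[mask].mean() ** 4 / (2.0 * (0.457 + 0.494 / m)))
            counts.append(m)
    return np.array(lags), np.array(gammas), np.array(counts)
```

The binned output can then be fed to a weighted-least-squares variogram fit exactly as in the standard-analysis sketch, replacing the Matheron estimates by these robust ones (Equation 3.7).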
3.4. Veracity score-based spatial prediction.

Often the aim of a spatial analysis of geostatistical data is to predict the process at locations of interest or to create a prediction surface over a region of interest. To predict the ε-process at a new location s, we can use ordinary kriging with the VS-based smoothed residuals ε̃ = (ε̃(s₁), …, ε̃(s_n))′ as

    ε̃(s) = { γ + 1 (1 − 1′Γ⁻¹γ) / (1′Γ⁻¹1) }′ Γ⁻¹ ε̃,                             (3.8)

where γ = (γ(s − s₁; θ̂_vs), …, γ(s − s_n; θ̂_vs))′, (Γ)_{ij} = γ(s_i − s_j; θ̂_vs), and 1 denotes the n-vector of ones (see Chapter 3, Cressie 1993). The residual kriging variance, which quantifies the prediction uncertainty, can be estimated as

    Var̂(ε̃(s)) = σ̂²(s) = γ′Γ⁻¹γ − (1′Γ⁻¹γ − 1)² / (1′Γ⁻¹1).

Finally, we predict the process at s using the modified version of Equation 3.2 as

    Ŷ_vs(s) = x(s)′β̂_vs + ε̃(s).                                                  (3.9)

In Equation 3.9, both the mean and covariance parameters have been robustly estimated using the VS-based procedures. The smoothing parameter q for the VS-based smoothing of the residuals can be chosen using cross-validation.

There are other robust kriging approaches available in the literature, for example, Künsch et al. (2011) and Papritz (2018b). Both of these techniques require distributional assumptions on the ε-process. Moreover, it is not straightforward to determine how to reduce the effects of observations that are not noisy but represent some other spatial process. For example, if in a local region most of the crowdsourced ambient temperatures are captured in indoor settings, applying the robust procedures directly may lead to misleading estimation of the model parameters and hence bad prediction of the outdoor ambient temperature. On the other hand, the VS-based technique can use a benchmark value, possibly obtained from a high-quality but low-density reference data set, to reduce the effects of the 'misleading' observations and thus estimate and predict the process of interest efficiently. Theoretical or numerical comparison of other robust kriging methodologies with the VS-based technique in the case of no available reference data is beyond the scope of this article.
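The prediction step of Equations 3.8–3.9 is ordinary kriging applied to the VS-smoothed residuals with a robustly fitted variogram, plus the VS-based trend. The sketch below solves the kriging system directly for one new location and uses the exponential variogram from the earlier sketch; the model choice, function names, and the assumption that theta_vs holds (nugget, partial sill, range) are illustrative.

```python
import numpy as np

def gamma_exp(h, theta):
    """Exponential semivariogram; theta = (nugget, partial sill, range) -- illustrative model."""
    return np.where(h > 0.0, theta[0] + theta[1] * (1.0 - np.exp(-h / theta[2])), 0.0)

def krige_vs(coords, resid_tilde, beta_vs, theta_vs, s_new, x_new):
    """Ordinary kriging of the smoothed residuals (Equation 3.8) and the
    VS-based prediction of the response (Equation 3.9) at one new location."""
    n = len(resid_tilde)
    Gamma = gamma_exp(np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2), theta_vs)
    gamma0 = gamma_exp(np.linalg.norm(coords - s_new, axis=1), theta_vs)
    one = np.ones(n)

    Gi_g = np.linalg.solve(Gamma, gamma0)
    Gi_1 = np.linalg.solve(Gamma, one)
    m = (one @ Gi_g - 1.0) / (one @ Gi_1)   # Lagrange multiplier of the ordinary kriging system
    lam = Gi_g - m * Gi_1                   # kriging weights (they sum to one)

    eps_pred = lam @ resid_tilde            # Equation 3.8
    krig_var = lam @ gamma0 + m             # kriging variance in variogram form
    y_pred = x_new @ beta_vs + eps_pred     # Equation 3.9
    return y_pred, eps_pred, krig_var
```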
4. Simulation Study.
Our simulation study aims to justify the superiority of the VS-based estimation and prediction methods over the standard approach for analyzing noisy geostatistical data. We consider two scenarios: in the first, no reference data are available; in the second, a coarser but better-quality reference data set is present.
4.1. Without Reference Data.
We take the sampling region for the varying-quality observations to be R ≡ R_n := [0, λ_n]², where {λ_n}_n is a sequence of positive real numbers determining the size of the sampling region. We assume that the varying-quality observations {Z(s₁), …, Z(s_n)} come from the additive-multiplicative noise model given in Equation 2.3. To generate the "true" process for simulation purposes, we use the following spatial linear model:

    Y(s_i) = β₀ + (β_x, β_y)′ s_i + β_h h(s_i) + ε(s_i),                           (4.1)

where β := (β₀, β_x, β_y, β_h)′ is the vector of regression parameters, h(s) is the altitude at location s, and {ε(s)} is a second-order stationary, spatially correlated process.

To define the altitude function over the sampling region, we use the deterministic function h(s) = H₁ Σ_{j=1}^{H₀} w_h(j) f(s; μ_j, Σ_j) + H₂, where f(·; μ, Σ) denotes the bivariate normal density with mean μ and covariance matrix Σ, and {(μ_j, Σ_j) : j ∈ {1, …, H₀}} is a fixed set of vectors and matrices. The residual vector (ε(s₁), …, ε(s_n))′ is sampled from a second-order stationary, mean-zero Gaussian process with isotropic Matérn covariance given by

    C(d; θ) = σ_ε² (2^(1−κ) / Γ(κ)) (√(2κ) d / ρ)^κ K_κ(√(2κ) d / ρ) + τ² 1(d = 0),   (4.2)

where Γ is the gamma function and K_κ is the modified Bessel function of the second kind with order κ (Abramowitz and Stegun 1972). The covariance parameter vector of interest is θ = (τ², σ_ε², ρ, κ)′, where τ² is the nugget effect and σ_ε², ρ, κ are the partial sill, range, and smoothness parameters, respectively (Haskard 2007, Gelfand et al. 2010).

To generate noise for the varying-quality observations, we use the following model for the additive and multiplicative components, denoted by ε^A := (ε₁^A, …, ε_n^A)′ and ε^M := (ε₁^M, …, ε_n^M)′ respectively:

    ε_i^M ~ Δ(1) if i ∈ G_n, and ε_i^M ~ 2·Beta(α_M, α_M) otherwise;
    ε_i^A ~ Δ(0) if i ∈ G_n, and ε_i^A ~ N(0, σ_A²) otherwise,                       (4.3)

where Δ(x) denotes the degenerate distribution with point mass at −∞ < x < ∞, the variance corresponding to the multiplicative component is σ_M² = 1/(2α_M + 1), G_n ⊂ {1, …, n} is a subset of indices, and σ_M, σ_A are positive constants. With this model, if i ∈ G_n, there is no noise associated with the observation, i.e., Z(s_i) = Y(s_i). If i ∉ G_n, then Z(s_i) = ε_i^M Y(s_i) + ε_i^A, where ε_i^M and ε_i^A have positive variance. Also, {ε_i^M}_{i=1}^n, {ε_i^A}_{i=1}^n and {ε(s_i)}_{i=1}^n are taken to be independent of each other. We further assume that the proportion of "good" observations is a constant (w.r.t. n), denoted by q_e, i.e., |G_n|/n ≈ q_e, so that 1 − q_e is the proportion of noisy observations in the data. This model is inspired by the crowdsourced data analysis scenario where only a proportion of the observations are "bad." The choice of the multiplicative error distribution in Equation 4.3 restricts its realizations to [0, 2] and also ensures that the multiplicative errors are symmetric around 1.

We set β₀ = 55 and the nugget τ² = 0; the remaining components of β and θ are held at fixed values throughout the simulations. To investigate the robustness of the VS with increasing noise in the data, we consider three contamination models specified by the following parameters: (a) σ_A = 5, α_M = 2, q_e = 0.95; (b) σ_A = 50, with α_M and q_e intermediate between their values in (a) and (c); and (c) σ_A = 100, α_M = 0.05, q_e = 0.8.
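A sketch of this data-generating mechanism under the stated assumptions: a Gaussian random field with Matérn covariance (Equation 4.2) simulated via a Cholesky factorization, a linear trend in the coordinates and altitude (Equation 4.1), and the additive-multiplicative contamination of Equation 4.3. The smooth stand-in for the altitude surface and all default numerical values are placeholders, not the settings used in the paper.

```python
import numpy as np
from scipy.special import kv, gamma as gamma_fn

def matern_cov(d, sigma2, rho, kappa, nugget=0.0):
    """Matern covariance (Equation 4.2); the nugget is added where d == 0."""
    scaled = np.sqrt(2.0 * kappa) * d / rho
    safe = np.where(scaled > 0.0, scaled, 1.0)
    body = sigma2 * (2.0 ** (1.0 - kappa) / gamma_fn(kappa)) * safe ** kappa * kv(kappa, safe)
    return np.where(scaled > 0.0, body, sigma2) + nugget * (d == 0.0)

def simulate_noisy_field(n=500, lam=10.0, beta=(55.0, 0.5, -1.0, -0.1),
                         sigma2=3.0, rho=0.5, kappa=1.0,
                         q_e=0.95, alpha_m=2.0, sigma_a=5.0, seed=0):
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.0, lam, size=(n, 2))                  # sites in R_n = [0, lam]^2
    h = np.exp(-((s - lam / 2.0) ** 2).sum(axis=1) / lam)   # smooth stand-in for the altitude surface
    X = np.column_stack([np.ones(n), s, h])

    d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=2)
    C = matern_cov(d, sigma2, rho, kappa) + 1e-8 * np.eye(n)  # jitter for a stable Cholesky factor
    eps = np.linalg.cholesky(C) @ rng.standard_normal(n)
    y = X @ np.asarray(beta) + eps                            # Equation 4.1

    good = rng.random(n) < q_e                                # the "good" index set G_n
    eps_m = np.where(good, 1.0, 2.0 * rng.beta(alpha_m, alpha_m, n))
    eps_a = np.where(good, 0.0, rng.normal(0.0, sigma_a, n))
    z = eps_m * y + eps_a                                     # Equation 4.3
    return s, X, y, z, good
```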
As we go from model (a) to (c), the noise in the data increases both in extent and magnitude. For example, with model (a), the variance of a noisy observation at location s is 0.2 (x(s)′β)² + 28.6, and the proportion of such observations is 5%; with model (c), the same variance is roughly 0.91 (x(s)′β)² + 10005. The choices of the baseline deviation α and the smoothing parameter q are discussed in Section C.1 in the supplementary material.

In Figure 3 we show boxplots of the VS-based estimator β̂_vs and the standard estimator β̂_ols for the four regression parameters, based on B = 200 simulations with n = 500 samples. The VS-based technique shows more robustness towards the added noise in the observations. As we move from noise model (a) to (c), the efficiency of the o.l.s. estimator is heavily compromised, whereas the spread of the VS-based estimates hardly increases. Section C.2 in the supplementary material contains additional simulation results for regression parameter estimation, including boxplots of the estimates for n = 100. Turning to the covariance structure, Table 1 reports the bias (with standard deviations in parentheses) of the estimates of the sill and range parameters. The estimates of the sill (σ_ε² + τ², the total variance of the residual process) obtained by the VS-based methodology are more accurate by large margins than those from standard variogram estimation. As the sample size increases, both the bias and the standard deviation of the VS-based estimators shrink towards 0 under all the considered noise models.
Fig 3: Performance of the VS-based and standard regression parameter estimators for analyzing varying-quality observations (sample size n = 500) without reference data: boxplots of the estimates of β₀, β_x, β_y, and β_h under noise models (a)–(c), with the true values marked.
Table 1 clearly establishes the efficiency of VS-based covariance estimation as compared to the standard approach when some of the observations are corrupted. For a fixed n, as we move from noise model (a) to noise model (c), the increase in bias and standard error of the VS-based sill parameter estimator is noticeable, though the magnitude of the increase is much smaller than for the standard method of estimation.

Next, we evaluate the VS-based spatial prediction using a 4⌈λ_n⌉ × ⌈λ_n⌉ grid over the sampling region R, as shown in Figure 4a. We make predictions at these grid points using both the VS-based and the standard approach, and evaluate the predictions and kriging by the following metrics:

    RMSPE = sqrt( n_g⁻¹ Σ_{s*} ( Ŷ_vs(s*) − Y(s*) )² );    ResRMSPE = sqrt( n_g⁻¹ Σ_{s*} ( ε̃(s*) − ε(s*) )² ),

where the sum Σ_{s*} runs over the grid points and n_g denotes their number. We define the performance metrics for the standard methods analogously. The Root-Mean-Squared-Prediction-Error (RMSPE) measures the average prediction error over the selected grid, and the Residual-Root-Mean-Squared-Prediction-Error (ResRMSPE) evaluates the accuracy and efficiency of the kriging of the spatially correlated residual process {ε(s)} on the selected grid. By Av.RMSPE we denote B⁻¹ Σ_b RMSPE(b), where RMSPE(b) is the prediction error in the b-th simulation iteration; Av.ResRMSPE is defined similarly.
Table 1: Performance of the VS-based methodology and standard approach in estimating covariance parameters on varying-quality observations.

Noise Model   n     bias.sill.VS     bias.sill.Std           bias.range.VS    bias.range.Std
(a)           100   -0.313 (3.31)    3837.513 (9867.28)      -0.296 (0.13)    6.671 (16.1)
              500   0.23 (1.16)      623.629 (1644.56)       -0.114 (0.06)    3.778 (9.91)
              3000  0.344 (0.62)     36.098 (82.01)          -0.026 (0.05)    0.307 (3.2)
(b)           100   7.657 (8.13)     17545.465 (58680)       -0.357 (0.08)    69.945 (454.78)
              500   1.747 (1.52)     5135.181 (14207.51)     -0.158 (0.06)    3.711 (11.05)
              3000  0.48 (0.96)      1108.544 (3515.95)      -0.06 (0.05)     8.377 (52.63)
(c)           100   32.774 (9.51)    6606.713 (27599.4)      -0.39 (0.03)     130.833 (463.05)
              500   15.352 (6.23)    21915.533 (63507.44)    -0.241 (0.05)    6.222 (47.09)
              3000  2.933 (1.14)     5289.832 (12192.3)      -0.111 (0.04)    4.624 (19.92)
(a) Varying-quality observations (CS = crowdsourced) and the grid to validate prediction. (b) Varying-quality hyper-local observations (CS) with reference data (GS = ground-station). (Plotted variable: Z(s).)
Fig 4: Example sampling points for the simulations.

Table 2 summarizes the results, which show that the VS-based predictions are much better than the standard analysis in almost all the cases. As we go from model (a) to model (c), the prediction accuracy deteriorates for both the VS-based and the standard approach, with a much higher impact on the latter. However, in terms of residual kriging efficiency, the VS-based methodology is highly robust as compared to ordinary kriging using the residuals obtained from o.l.s.

4.2.
With Reference Data.
In this subsection, we consider a situation that is more similar to our case study. In addition to the $n$ varying-quality observations in the hyper-local region $R = [0, \lambda_n]^2$, we have $m$ high-quality observations available over a larger region $D = [0, \Lambda_m]^2$. One example of the sampling points is shown in Figure 4b. Our goal is to predict the process within the hyper-local region $R$ using the varying-quality observations. We again use a $4\lceil\lambda_n\rceil \times \lceil\lambda_n\rceil$ grid over the hyper-local region of interest $R$ to evaluate the predictions.
                            VS                               Std. App.
Noise Model   n      Av.RMSPE         Av.ResRMSPE     Av.RMSPE          Av.ResRMSPE
(a)           100    5.29 (4.04)      0.703 (0.22)    8.61 (15.37)      3.637 (1.82)
              500    4.046 (1.03)     0.281 (0.03)    4.826 (8.89)      4.416 (1.33)
              3000   3.927 (1.07)     0.141 (0.02)    3.228 (1.37)      5.306 (0.48)
(b)           100    9.67 (6.11)      1.796 (0.77)    37.38 (92.72)     14.717 (6.99)
              500    8.478 (5.04)     0.358 (0.07)    28.911 (75.28)    14.267 (7.56)
              3000   5.196 (3)        0.15 (0.02)     20.546 (33.01)    14.902 (8.07)
(c)           100    21.071 (11.29)   5.833 (1.39)    98.585 (206.89)   38.74 (19.03)
              500    26.325 (14.35)   1.376 (1.44)    66.6 (152.04)     36.354 (20.06)
              3000   13.722 (6.55)    0.23 (0.04)     94.429 (193.5)    31.606 (23.25)
Table 2: Prediction performance of the VS-based methodology and the standard approach on varying-quality observations without any reference data.

In addition to the predictions obtained by the VS-based and standard methodology on the varying-quality observations, we also consider the global predictions obtained by using only the reference data on the larger region, as shown in Figure 4b. For these simulations, we have considered the sample sizes for the varying-quality observations to be equal to 50,
100, and 500, because the hyper-local regions in our case studies do not contain a very ‘large’ number of crowdsourced observations (not more than 300). For the reference data, we have taken the sample size to be m = 100.

In Table 3, we first compare the performance of the VS-based and standard predictions using hyper-local noisy data, based on RMSPE both at the response level (Av.RMSPE) and at the residual level (Av.ResRMSPE). Clearly, we can see that the VS-based predictions are uniformly better than the standard ones in all the considered cases.

                          VS                              Std. App.                                       Ref. Only
Noise Model   n     Av.RMSPE         Av.ResRMSPE    Av.RMSPE                 Av.ResRMSPE            Av.RMSPE       Av.ResRMSPE
(a)           50    12.26 (12.71)    7.084 (5.08)   1740.696 (9518.03)       1745.244 (9604.81)     9.711 (8.54)   9.017 (7.46)
              100   10.877 (11.61)   6.104 (5.24)   230.117 (918.91)         224.56 (934.89)
              500   8.787 (8.05)     6.287 (6.47)   358.694 (1976.29)        352.86 (1975.49)
(b)           50    12.933 (13.1)    8.206 (6.76)   52829.372 (662485.91)    52946.917 (664727.7)
              100   9.907 (10.81)    6.439 (5)      115.222 (923.6)          387.071 (915.98)
              500   9.005 (8.66)     6.72 (5.06)    26.31 (19.18)            217.784 (15.88)
(c)           50    12.33 (18.51)    8.7 (16.61)    10198.908 (85831.41)     9740.72 (85082.65)
              100   10.131 (10.93)   7.093 (5.04)   155.796 (126.08)         412.788 (31.68)
              500   9.786 (8.45)     6.402 (5.24)   239.728 (29.49)          27.335 (8.35)
Table 3: Performance of hyper-local predictions using the VS-based methodology, the standard approach, and global predictions using reference data only. For these simulations we used reference data with sample size m = 100.

Next, we compare the VS-based predictions using hyper-local noisy data with the predictions obtained by implementing the standard methodology on the high-quality reference data over a bigger region; we refer to the latter as ‘Ref. Only’. From Table 3 we see that, at the response level (i.e., comparing Av.RMSPE), under all noise models the performance of the VS-based predictor using varying-quality observations is similar to or slightly worse than the ‘Ref. Only’ predictor when the numbers of hyper-local noisy observations and high-quality reference observations are comparable (i.e., when both n and m equal 100). When we have a larger sample size (n = 500) in the hyper-local region, we see a small gain in prediction efficiency in terms of Av.RMSPE. However, if we consider the residual kriging performance, i.e., the ResRMSPE, the VS-based technique outperforms the ‘Ref. Only’ kriging in all the cases, even when we have only n = 50 varying-quality observations. As the kriging is more efficient when we have observations closer to the locations of interest, the varying-quality hyper-local observations, together with the robust VS-based methodology, improve the efficiency of the spatial prediction as compared to the corresponding ‘Ref. Only’ version. Additional details regarding the simulation results, e.g., the parameters of the models and the choices of the regularity parameters, are reported in Section C.1 of the supplementary material.
5. Case Study: Spatial Analysis of WeatherSignal Data.
In this section, we analyze the WeatherSignal data described in Section 1.1 using the VS-based methodology (Section 3). Our goal for this noisy crowdsourced data set is to perform structure exploration and then prediction of the daily average ambient temperature process in hyper-local regions of interest.

5.1.
Building Hyper-Local Prediction Surfaces.
Here we describe the VS-based analysis of the crowdsourced WeatherSignal data using the NOAA ground-station data as reference. We first select a hyper-local region, as denoted by $R$ in Section 2.3.1, around Los Angeles, CA, as shown in Figure 5d. The analysis starts by defining a region large enough to have sufficient NOAA ground-station observations to build a reasonable global prediction surface around the region of interest. In Figure 5b, we plot the $m = 310$ ground-station observations in California. Using the standard approach on the NOAA ground-station data, as described in Section 3.1, we build a prediction surface for California and plot it in Figure 5c. The model we use to estimate the mean is given by
$$\mu(s) = \beta_0 + \beta_x s_x + \beta_y s_y + \beta_{xy} s_x s_y + \beta_h h(s), \qquad (5.1)$$
where $s := (s_x, s_y)'$ and $h(s)$ denotes the elevation of the point $s$. The mean model explains 79% (adjusted $R^2$) of the variability in the ground-station ambient temperatures in California.
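A minimal sketch of fitting a mean model of the form (5.1) by ordinary least squares is given below; the synthetic coordinates, elevations, and temperatures are placeholders for the NOAA ground-station data, not the data analyzed here.

import numpy as np

rng = np.random.default_rng(1)
m = 310                                  # number of ground stations, as in the text
sx = rng.uniform(-124.4, -114.1, m)      # longitude
sy = rng.uniform(32.5, 42.0, m)          # latitude
h = rng.uniform(0.0, 2500.0, m)          # elevation in meters
temp = 75.0 - 0.006 * h + 0.5 * (sy - 37.0) + rng.normal(0.0, 2.0, m)  # synthetic response

# Design matrix for mu(s) = b0 + bx*sx + by*sy + bxy*sx*sy + bh*h(s), as in (5.1)
X = np.column_stack([np.ones(m), sx, sy, sx * sy, h])
beta_hat, *_ = np.linalg.lstsq(X, temp, rcond=None)

resid = temp - X @ beta_hat              # residuals passed on to the variogram step
r2 = 1.0 - resid.var() / temp.var()
print("estimated coefficients:", np.round(beta_hat, 4))
print("R-squared:", round(float(r2), 3))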
(Figure 5 panels; color scale: Temp (deg F). Panel titles: (a) Crowdsourced Data: West Coast; (b) NOAA Gd.St. Data: California; (c) Pred. Surface: NOAA data only; (d) LA Crowdsourced Data; (e) LA NOAA Data.)
Fig 5: (a) Crowdsourced observations in CA; (b) Available ground-station observations; (c) Prediction surface using the standard approach on the ground-station data; (d) Crowdsourced observations in a hyper-local region around Los Angeles; (e) Ground-station observations in a hyper-local region around Los Angeles.
We then fit a Matérn covariance to the observed residuals from the mean model estimation. Details of the variogram estimation are given in Table 4 and Figure 6. We then use standard kriging methodology with the estimated mean and covariance model to create the prediction surface $\{(s, \hat{Y}(s)) : s \in D\}$, as shown in Figure 5c.

Parameters          Estimates
partial sill (σ²)   13.78
range (ρ)           0.36
nugget (τ²)         7.95
smoothness (κ)      2.45

Table 4: Estimated Matérn parameters.

Fig 6: Variogram estimation (empirical and fitted semivariance against distance, California NOAA data).
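For readers who wish to trace the fitted variogram curve, the following is a minimal sketch that evaluates a Matérn semivariogram at a few distances using the point estimates from Table 4; the particular Matérn parameterization used below is an assumption and may differ from the exact form used in our estimation.

import numpy as np
from scipy.special import gamma, kv

def matern_semivariogram(d, sigma2, rho, kappa, tau2):
    # gamma(d) = tau2 + sigma2 - C(d), with the Matern covariance
    # C(d) = sigma2 * 2^(1-kappa) / Gamma(kappa) * (d/rho)^kappa * K_kappa(d/rho).
    d = np.atleast_1d(np.asarray(d, dtype=float))
    cov = np.full_like(d, sigma2)                  # C(0) = sigma2
    pos = d > 0
    u = d[pos] / rho
    cov[pos] = sigma2 * (2.0 ** (1 - kappa) / gamma(kappa)) * u ** kappa * kv(kappa, u)
    gam = tau2 + sigma2 - cov
    gam[~pos] = 0.0                                # the nugget appears only for d > 0
    return gam

# Point estimates from Table 4
sigma2, rho, kappa, tau2 = 13.78, 0.36, 2.45, 7.95
dists = np.array([0.1, 0.5, 1.0, 2.0])
for d, g in zip(dists, matern_semivariogram(dists, sigma2, rho, kappa, tau2)):
    print(f"gamma({d:.1f}) = {g:.2f}")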
As we can see in Figure 5a, the spatial coverage of the crowdsourced data does not support a global prediction surface over California or even the coast of California. However, if we consider the 25 ×
25 mile region ($R$) in LA, as shown in Figure 5d, the density of crowdsourced data is much higher as compared to the single available ground-station observation (Figure 5e).
Fig 7:
Mixing function (a) and the histogram of the veracity scores (b) for the crowdsourced observations in Los Angeles.

While there is only one ground station available at Los Angeles International Airport, the number of crowdsourced observations, $\{Z(s_1), \ldots, Z(s_n)\}$, in $R$ is $n = 80$. The next part of the analysis examines whether we can leverage the additional crowdsourced information through the VS-based methodology. We want to explore whether we can create a more reasonable and efficient prediction surface $\{(s, \hat{Y}_{vs}(s)) : s \in R\}$ over the region $R$ in Los Angeles, as compared to the surface obtained from the analysis of the ground-station data only, $\{(s, \hat{Y}(s)) : s \in R\}$.

The VS-based analysis starts by computing the veracity score of the crowdsourced observations using the definition in Equation 2.4. We set the baseline deviation $\alpha = 3$. In an ideal scenario, when the corresponding $\delta$-neighborhood has very little variation and $\mathrm{IQR}(\xi_i) \approx$
0, an observation with a 3°F deviation from the corresponding benchmark value has a VS approximately equal to $\exp(-1) \approx 0.37$, while an observation with a larger deviation from the benchmark receives a correspondingly smaller VS. We choose $\delta = 0.08$ in the units of latitude and longitude.
To choose a suitable mixing parameter $\nu$, we use the function
$$\nu(s_i) = 1 - \exp\left(-\frac{1}{(1 - R^2)\sqrt{n(i)}}\right),$$
where $R^2$ is the adjusted R-squared for the estimation of the mean surface using the NOAA ground-station data only and $n(i)$ is the number of crowdsourced data points in the $\delta$-neighborhood. As Figure 7a shows, this function is increasing in $R^2$ and decreasing in $n(i)$; $\nu(s_i) = 1$ if $R^2 = 1$ and $\nu(s_i) = 0$ if $n(i) = \infty$. With this formulation, the mixing parameter takes into account both the goodness of fit for the ground-station data and the number of crowdsourced observations used for the local approximation of the target value. Using the specified parameters, we compute the VS for the crowdsourced observations in $R$ and plot their empirical distribution in Figure 7b.
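A minimal numerical sketch of this mixing function, assuming the closed form displayed above, is given below; the $R^2$ values and neighborhood sizes are illustrative only.

import numpy as np

def mixing_parameter(r2, n_i):
    # nu(s_i) = 1 - exp(-1 / ((1 - R^2) * sqrt(n(i)))), as displayed above:
    # equals 1 when R^2 = 1 and tends to 0 as n(i) grows.
    r2 = np.asarray(r2, dtype=float)
    n_i = np.asarray(n_i, dtype=float)
    with np.errstate(divide="ignore"):
        expo = 1.0 / ((1.0 - r2) * np.sqrt(n_i))   # -> inf when R^2 = 1
    return 1.0 - np.exp(-expo)

# Behaviour matching Figure 7a: increasing in R^2, decreasing in n(i)
for n_i in (5, 10, 50, 100):
    vals = mixing_parameter([0.5, 0.79, 0.95, 1.0], n_i)
    print(f"n_i = {n_i:3d}:", np.round(vals, 3))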
Fig 8:
Histograms of the observed residuals (a) and the VS-based smoothed residuals (b), and the VS-based variogram fitting (c), for the optimal choice of $q$.

We next estimate the mean and covariance of the process. For robust estimation of the mean function, we use the weighted MM-type estimator, as discussed in Section 3.2, with the VS of the observations as the corresponding weights. Once the regression parameters are estimated, for a given smoothing parameter $q$ in Equation 3.4, we use the VS-based smoothing technique to reduce the effects of noise in the residual process, as discussed in Section 3.3. Using the smoothed residuals, we estimate the covariance parameters and use the estimates to create a prediction surface using VS-based kriging, as discussed in Section 3.4.

To make an optimal choice for $q$, we use the reference data. For a pre-specified set of values of $q$ (up to 3),
the covariance estimation and kriging are executed at the ground-station locations that are inside the hyper-local region $R$, and the $q$ that minimizes the mean squared error of prediction at the stations is chosen to be optimal. In the analysis for the hyper-local region around Los Angeles, there is only one station available, so we instead use the set of crowdsourced points with VS greater than or equal to a fixed cutoff as a test set and choose the $q$ that minimizes
$$\frac{1}{n_*}\sum_{j}\left(Z(s_j) - \hat{Y}^{(-j)}_{vs}(s_j)\right)^2,$$
where $\hat{Y}^{(-j)}_{vs}(s_j)$ is the predicted value at $s_j$ obtained using $\{Z(s_1), \ldots, Z(s_{j-1}), Z(s_{j+1}), \ldots, Z(s_n)\}$ as the training data and the sum is over the test data set, whose cardinality is denoted by $n_*$ (a small code sketch of this selection step is given below).

In Figures 8a and 8b, we plot the histograms of the observed residuals from the VS-based robust regression and of the residuals after the VS-based smoothing. The VS-based smoothing clearly reduces the spread of the residual values by smoothing out the large errors. In Figure 8c, we show the robust variogram fitting of the VS-based smoothed residuals for the optimal choice of the smoothing parameter $q$. Predictions over the region $R$ are then obtained using Equation 3.2.

In Figure 9, we plot the hyper-local prediction surfaces obtained by the standard analysis with the NOAA ground-station data only, as well as the one obtained by implementing the VS-based technique on the crowdsourced observations with the ground-station data as the reference.
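The sketch below illustrates the hold-out selection of $q$ described above. The VS-based fit-and-predict step (robust mean estimation, VS-smoothing with parameter $q$, covariance estimation, and kriging) is abstracted behind a placeholder and mocked by inverse-distance weighting purely so the sketch runs; the function names, the candidate grid, and the synthetic data are illustrative scaffolding rather than part of the methodology.

import numpy as np

def fit_predict_vs(train_idx, test_idx, z, coords, q):
    # Placeholder for the VS-based fit-and-predict step; mocked here by
    # inverse-distance weighting so the example executes (q only perturbs
    # the mock weights, it is NOT the actual smoothing parameter's role).
    d = np.linalg.norm(coords[test_idx][:, None, :] - coords[train_idx][None, :, :], axis=2)
    w = 1.0 / (d + q)
    w /= w.sum(axis=1, keepdims=True)
    return w @ z[train_idx]

def choose_q(z, coords, test_idx, q_grid):
    # Pick q minimizing the mean squared leave-one-out prediction error
    # over the designated high-VS test points.
    n = len(z)
    scores = []
    for q in q_grid:
        errs = []
        for j in test_idx:
            train = np.setdiff1d(np.arange(n), [j])
            pred = fit_predict_vs(train, np.array([j]), z, coords, q)[0]
            errs.append((z[j] - pred) ** 2)
        scores.append(np.mean(errs))
    return q_grid[int(np.argmin(scores))], scores

rng = np.random.default_rng(2)
coords = rng.uniform(0, 1, (30, 2))
z = 70 + 3 * coords[:, 0] + rng.normal(0, 1, 30)
test_idx = np.arange(0, 30, 5)            # stand-in for the high-VS points
q_grid = np.linspace(0.1, 3.0, 10)
q_opt, _ = choose_q(z, coords, test_idx, q_grid)
print("selected q:", round(float(q_opt), 2))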
(Figure 9 panels: (a) Pred. Surface w NOAA data only: LA; (b) VS-based Pred. Surface: LA; (c) Kriging Var w NOAA data only: LA; (d) VS-based Kriging Var: LA; (e) % Increase in VS-based Mar. of Err.: LA. Color scales: Temp (deg F), Krig. var., % Inc. in M.E.)
Fig 9: (a) Hyper-local version of the same surface as in Figure 5c; (b) Prediction surface obtained by the VS-based technique on the crowdsourced data in Los Angeles; (c) Residual kriging variance for the predictions using NOAA data only; (d) Residual kriging variance for the VS-based predictions with crowdsourced data; (e) the % increase in the margin of error for the VS-based predictions as compared to the predictions with NOAA data.

Clearly, the prediction surface obtained from the standard analysis of the ground-station data (Figure 9a) is too smooth to capture the local variability accurately. The prediction surface obtained by the VS-based analysis on the crowdsourced data shows more variation across space. To highlight the advantage of having crowdsourced observations, we compare the residual kriging variance surfaces in Figures 9c and 9d. It is evident from Figure 9d that the VS-based kriging variance is much smaller as compared to the global kriging using only the ground-station data, especially at locations that are close to the crowdsourced observations.

In addition, we illustrate the gain in efficiency by plotting the percentage increase in the margin of error (at 95% confidence) for the VS-based predictions from the hyper-local crowdsourced information as compared to the global prediction using ground-station data only, i.e., $100 \times \left(\mathrm{M.E.}(\hat{Y}_{vs}(s)) - \mathrm{M.E.}(\hat{Y}(s))\right) / \mathrm{M.E.}(\hat{Y}(s))$, where M.E. denotes the ‘margin of error’ (half of the length of the prediction interval) to predict the target response $Y(s)$. To compute the margin of error, we use ad hoc confidence intervals for the residual kriging predictor with $\pm 1.96$
as the corresponding quantiles, and then add the margin of error of the mean ($1.96 \times \mathrm{s.e.}(x(s)'\hat{\beta}_{vs})$) and the margin of error of the residual kriging predictor ($1.96 \times \sqrt{\mathrm{Krig.Var.}(\tilde{\epsilon}(s))}$). The margin of error for the standard predictor is computed similarly. A more theoretically justifiable interval could be obtained through a spatial re-sampling technique, as discussed in Lahiri (2003), but that requires further research and is beyond the scope of this study. In Figure 9e, for most of the locations where the predictions have been carried out, there is a decrease in the margin of error for the VS-based predictions as compared to the global predictions using ground-station data only. At the locations that are close to the crowdsourced observations, the VS-based prediction technique achieves up to a 50% gain in efficiency.

The disadvantage of VS-based hyper-local analysis is that the model is estimated very regionally, and hence extrapolation of the estimated mean model outside the sample space is likely to give misleading and inefficient predictions. For example, in Figure 9b there are locations with elevations of more than 500 meters, while the maximum elevation in the crowdsourced sample is 350 meters. This leads to poor predictions (e.g., ambient temperature less than 50°F) at some locations, as can be seen in Figure 9e. Note that, although the efficiency of the VS-based predictions falls short in those regions, the residual kriging variance (Figures 9c and 9d) for the VS-based kriging predictor is still less than that of the global kriging with NOAA data only. So, the loss in efficiency of the VS-based predictions is solely due to the extrapolation of the hyper-locally estimated mean function at points outside the covariate sample space.

We conduct a similar analysis for another hyper-local region close to Brooklyn, NY, and plot the results in Figure 10. The prediction surface in Figure 10c is obtained by using the standard methodology on 120 ground-station observations over the east coast, and the surface in Figure 10d is generated through the VS-based hyper-local analysis of the crowdsourced observations in Figure 10b. Comparing these two prediction surfaces, we again see that the regional variation is prominent for the prediction surface obtained from the VS-based hyper-local analysis, whereas the global analysis generates a surface that is too smooth to accurately capture local variations. In Figure 10f, the advantage of having crowdsourced data for hyper-local prediction of the process is visible, as the kriging variance of the VS-based methodology is much smaller compared to Figure 10e, especially at locations close to the crowdsourced observations. In Figure 10g, we see up to a 33% gain in margin of error by implementing the VS-based methodology on the crowdsourced data at locations close to the crowdsourced observations. Similar to the previous analysis of the Los Angeles data, the advantage of the VS-based hyper-local predictions is lost if predictions are attempted at locations too far from the crowdsourced observations or at locations with elevations outside the range of the crowdsourced sample.

In addition to the VS-based hyper-local analysis, we have also conducted the analysis for both of the hyper-local regions, in Los Angeles and Brooklyn, with the standard approach, without considering the veracity of the crowdsourced observations, and then compared the predictions with the global prediction surface obtained using reference data only.
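To make the margin-of-error comparison used in Figures 9e, 10g, and 11 concrete, the following is a minimal sketch of the percentage-change computation defined above; the standard errors and kriging variances are synthetic placeholders rather than values from the analysis.

import numpy as np

def margin_of_error(se_mean, krig_var, z=1.96):
    # Ad hoc 95% margin of error: z * s.e. of the estimated mean plus
    # z * sqrt(residual kriging variance), as described in the text.
    return z * se_mean + z * np.sqrt(krig_var)

rng = np.random.default_rng(3)
n_loc = 5
# Synthetic placeholders for the two predictors at a few prediction locations
se_vs, kv_vs = rng.uniform(0.2, 0.5, n_loc), rng.uniform(1.0, 3.0, n_loc)
se_ref, kv_ref = rng.uniform(0.4, 0.8, n_loc), rng.uniform(4.0, 9.0, n_loc)

me_vs = margin_of_error(se_vs, kv_vs)
me_ref = margin_of_error(se_ref, kv_ref)
pct_increase = 100 * (me_vs - me_ref) / me_ref   # negative values mean the VS-based
                                                 # intervals are tighter
print(np.round(pct_increase, 1))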
In addition to the VS-based hyper-local analysis, we have also conducted the analysis for both hyper-local regions, in Los Angeles and Brooklyn, using the standard approach without considering the veracity of the crowdsourced observations, and then compared the predictions with the global prediction surface obtained using the reference data only. Comparing the plots in Figure 11 with Figures 9e and 10g, we can see that, in both Los Angeles and Brooklyn, the margins of error for the predictions using the standard approach are larger at all locations as compared to the global predictions using ground-station data.
Fig 10: (a) Ground-station observations in the selected hyper-local region; (b) Crowdsourced observations in the same region; (c) Prediction surface obtained by standard analysis of NOAA ground-station data; (d) Prediction surface obtained by the VS-based technique on the crowdsourced data; (e) Residual kriging variance for predictions using NOAA data only; (f) Residual kriging variance for the predictions using the crowdsourced data; (g) Percent increase in the margin of error for the VS-based predictions compared to the predictions with NOAA data.
Fig 11: The increase in the margin of error for the standard approach in hyper-local regions in Los Angeles (left) and Brooklyn (right).

In Brooklyn, even at the locations around the crowdsourced observations, with reference to the global prediction using ground-station data, the margin of error of the standard predictions using the crowdsourced observations has increased by at least 120%, whereas, as we have mentioned already, the VS-based methodology has achieved a decrease in the margin of error of up to 33% (Figure 10g). Clearly, no gain from the 'hyper-local' analysis is achieved, as compared to the 'global' prediction from the ground-station data, unless the robust VS-based methodology is employed on the varying-quality crowdsourced data.

5.2. Validation at the ground-stations.
The goal of the analysis in this section is to validate the predictions obtained by hyper-local analysis of crowdsourced data using the VS-based methodology. To do so, we have selected a set of 14 ground-stations that satisfy the following criteria: (1) there are at least 30 crowdsourced data points available nearby, with at least 20 observations having a VS greater than or equal to 0.4; (2) the elevation of those stations is not too far from the range of the local crowdsourced samples. We have conducted 14 hyper-local analyses, as described in Section 5.1, to explore the hyper-local structure of the ambient temperature, and then predicted at the selected ground-station locations to validate the VS-based predictions. We omitted these 14 stations beforehand so that they are not used in defining the 'benchmark' values at the crowdsourced data locations when computing VS; this way the validation data have no effect on the training phase of the predictions. We have also conducted the same hyper-local analyses using the standard technique, without taking the quality of the observations into account. The results are compiled in Table 5. The advantage of using the VS-based technique as compared to the standard methodology is clear from the results. The RMSPE of the VS-based predictor for these 14 ground-stations is 3.71, while for the standard approach it is 4.54. More importantly, the average margin of error (at 95% confidence) for the standard predictor is 13.61, while for the VS-based methodology it is 6.28. Relative to the standard methodology, on average, the VS-based technique has achieved an approximately 54% gain in the efficiency of the predictions.
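The station-selection criteria above can be made concrete with a small sketch (illustrative only, not the authors' code). The field names `lat`, `lon`, `elev`, and `vs`, as well as the neighborhood radius, are assumptions introduced for the example.

```python
import numpy as np

def select_validation_stations(stations, crowd, radius_deg=0.25,
                               min_points=30, min_good=20, vs_cutoff=0.4):
    """Return indices of ground-stations with (1) at least `min_points`
    crowdsourced observations within `radius_deg` in lat/lon, of which at
    least `min_good` have VS >= `vs_cutoff`, and (2) an elevation inside the
    elevation range of those nearby crowdsourced observations.

    `stations` and `crowd` are dicts of equal-length numpy arrays with keys
    'lat', 'lon', 'elev' (and 'vs' for `crowd`); all names are illustrative."""
    keep = []
    for i in range(len(stations["lat"])):
        d = np.hypot(crowd["lat"] - stations["lat"][i],
                     crowd["lon"] - stations["lon"][i])
        nearby = d <= radius_deg
        if nearby.sum() < min_points:
            continue
        if (crowd["vs"][nearby] >= vs_cutoff).sum() < min_good:
            continue
        lo, hi = crowd["elev"][nearby].min(), crowd["elev"][nearby].max()
        if lo <= stations["elev"][i] <= hi:  # elevation not outside the local range
            keep.append(i)
    return keep
```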
STATION NAME                                       Target Temp.  Pred.Temp.VS  VS.ME  Pred.Temp.Std  Std.ME
CHICAGO OHARE INTERNATIONAL AIRPORT IL US                76          76.01      6.22      76.75        8.74
WASHINGTON DULLES INTERNATIONAL AIRPORT VA US            79          82.80      5.13      75.33       18.35
WASHINGTON REAGAN NATIONAL AIRPORT VA US                 80          81.95      7.77      75.52       31.64
MIAMI INTERNATIONAL AIRPORT FL US                        79          77.81      0.50      78.57        1.23
LITTLE TUJUNGA CALIFORNIA CA US                          68          64.78      6.01      63.74        7.41
LOS ANGELES INTERNATIONAL AIRPORT CA US                  68          68.91      3.31      67.87        4.26
BEVERLY HILLS CALIFORNIA CA US                           70          67.94      6.27      68.12        7.54
TOLEDO EXPRESS AIRPORT OH US                             75          79.45      5.72      79.39        8.18
DETROIT METROPOLITAN AIRPORT MI US                       76          78.79      6.66      80.74        9.73
MINNEAPOLIS ST PAUL INTERNATIONAL AIRPORT MN US          70          77.03      6.67      76.97       10.96
CARLOS AVERY MINNESOTA MN US                             69          73.33     11.46      74.65       19.05
JFK INTERNATIONAL AIRPORT NY US                          72          78.24      3.06      80.82        3.82
ISLIP LI MACARTHUR AIRPORT NY US                         74          75.10      6.29      75.61        8.79
AUSTIN BERGSTROM INTERNATIONAL AIRPORT TX US             81          78.87      2.24      86.50       10.19

Table 5: Predictions using both the VS-based and standard approaches at the ground-stations with crowdsourced observations in proximity.
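As a quick illustration of how the validation summaries quoted above (RMSPE and average margin of error) follow from Table-5-style columns, here is a short sketch using the first three rows of the table; it is illustrative only.

```python
import numpy as np

def rmspe(target, predicted):
    """Root mean squared prediction error over the validation stations."""
    target, predicted = np.asarray(target, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((target - predicted) ** 2))

# First three rows of Table 5 as an example:
target   = [76, 79, 80]
pred_vs  = [76.01, 82.80, 81.95]
me_vs    = [6.22, 5.13, 7.77]
pred_std = [76.75, 75.33, 75.52]
me_std   = [8.74, 18.35, 31.64]

print("RMSPE (VS-based):", rmspe(target, pred_vs))
print("RMSPE (standard):", rmspe(target, pred_std))
print("Average M.E. (VS-based, standard):", np.mean(me_vs), np.mean(me_std))
```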
6. Summary and Conclusions.
In this paper, we have introduced the veracity score to assess the quality of observations in a geostatistical setting. The VS is defined by comparing the varying-quality observations with a benchmark. We used the ground-station data as our reference to define the benchmark values in the case studies. A similar scoring approach to assess the veracity of the observations can be used in other contexts as well. We have also discussed the case where no other reference information is available and proposed a version of the VS that uses a locally and robustly estimated measure of center as the benchmark. A robust approach for modeling varying-quality spatial data using the VS has been proposed and evaluated. We have illustrated the VS-based methodology on a crowdsourced data set coming from the mobile app WeatherSignal, using NOAA ground-station data as the reference. Both the simulation studies in Section 4 and the case studies in Section 5.1 show the advantages of the VS-based methodology over the standard geostatistical approach when dealing with noisy spatial data. In addition, by implementing the VS-based methodology on the varying-quality local crowdsourced data, we can achieve more accurate and efficient hyper-local predictions as compared to the global prediction obtained from the analysis of ground-station data only.

In the analysis of crowdsourced data using the VS-based methodology, the model is estimated using observations in a hyper-local region. Predicting at more distant locations or with covariates outside the range of the sample may provide misleading predictions, as we have seen for some of the locations in Figures 9b and 10d. The mean and covariance models used to explore the structure of the average temperature process are quite simple, yet reasonable and effective for hyper-local analysis of ambient temperature. More complex models, such as nonlinear regression models (Frei, 2014) and anisotropic covariances (Haskard, 2007), can be incorporated in the VS-based technique to increase the flexibility of the analysis. The VS-based kriging automatically reduces the impact of the corrupted observations and thus does not require removing the outliers manually (e.g., see Frei 2014), which is often not feasible when dealing with large crowdsourced spatial data. In addition, as the veracity of the observations is measured non-parametrically using 'local' summaries, the proposed VS-based kriging does not require any distributional assumption (e.g., Gaussianity; see Lussana, Uboldi and Salvati 2010) on the underlying spatial process or the noise associated with it. The analysis presented in this paper shows that the systematic incorporation of the VS in the geostatistical analysis helps us capture the local variability of the ambient temperature field by considering crowdsourced data in hyper-local regions. The VS-based kriging decreases the margin of prediction error by up to 50% as compared to the global predictions from ground-station data only. On the other hand, if the same analysis is carried out on the noisy crowdsourced data with standard kriging, there is no gain in efficiency. In fact, there are locations, even close to the crowdsourced observations, where the margin of prediction error from the standard methods is more than 80% higher than that of the corresponding global predictions.
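For intuition only, the following sketch shows one plausible form of a veracity-type score built from a locally and robustly estimated benchmark (a neighborhood median with an MAD scale). It is an assumption-laden illustration, not the definition used in this paper; the radius, the MAD scaling, and the Gaussian-shaped decay are all choices made for the example.

```python
import numpy as np

def local_veracity_scores(lat, lon, values, radius_deg=0.1):
    """Illustrative veracity-type score in [0, 1]: compare each observation
    with a robust local benchmark (median of neighbors, including the point
    itself) and map large standardized deviations to scores near 0."""
    lat, lon, values = map(lambda a: np.asarray(a, float), (lat, lon, values))
    scores = np.empty(len(values))
    for i in range(len(values)):
        d = np.hypot(lat - lat[i], lon - lon[i])
        nbr = values[d <= radius_deg]
        center = np.median(nbr)
        spread = 1.4826 * np.median(np.abs(nbr - center)) + 1e-8  # MAD scale
        z = abs(values[i] - center) / spread
        scores[i] = np.exp(-0.5 * z ** 2)  # smooth decay toward 0 for outliers
    return scores
```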
There are several interesting future directions for this work. First, we have not provided a theoretical justification for the superiority of the VS-based methodology as compared to the standard approach in the analysis of noisy spatial data. Inspired by the simulations executed in this work, we believe that under a suitable spatial asymptotic framework (e.g., mixed-increasing domain; Hall and Patil 1994; Lahiri, Lee and Cressie 2002) and a fairly general non-stationary noise model (e.g., the additive-multiplicative model defined in Equation 2.3), we can theoretically justify the robustness and efficiency of the VS-based methodology (for details, see Chakraborty and Lahiri 2019). Second, the methodology discussed in this article can be systematically extended to develop a more sophisticated VS-based kriging technique that incorporates both the ground-station data and the crowdsourced data for spatial prediction. Third, a spatio-temporal VS and corresponding methods for real-time crowdsourced data can be developed by considering neighborhoods in both space and time.
Acknowledgements.
This material is based upon work supported in whole or in part with funding from the Laboratory for Analytic Sciences (LAS). This research is also partially funded by National Science Foundation (NSF) grant DMS-1613192. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the LAS and/or any agency or entity of the United States Government. Special thanks are extended to the OpenSignal team that was in charge of the academic partnership program in 2015 for making the data available to the authors. The authors also thank the Editor, the Associate Editor, and three anonymous referees for a number of thoughtful comments that significantly improved the paper.
SUPPLEMENTARY MATERIAL

Supplement A: Supplementary Material for Spatial Analysis of Noisy Crowdsourced Mobile Data (). This file contains additional details on data preprocessing, the simulations, and the case study. It contains additional plots, tables, and discussions to support our claims and findings in the main article.
References.

Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing. Dover Publications, New York.
AccuWeather (2015). AccuWeather launches AccUcast, providing exclusive crowdsourced weather feature worldwide. Accessed: 2019-01-30.
Allahbakhsh, M., Benatallah, B., Ignjatovic, A., Motahari-Nezhad, H. R., Bertino, E. and Dustdar, S. (2013). Quality Control in Crowdsourcing Systems: Issues and Directions. IEEE Internet Computing.
Chakraborty, A. and Lahiri, S. N. (2019). On Statistical Properties of A Veracity Scoring Method for Spatial Data. Submitted to Journal of the Royal Statistical Society: Series B.
Cressie, N. (1993). Statistics for Spatial Data. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc.
Cressie, N. and Hawkins, D. M. (1980). Robust estimation of the variogram: I. Journal of the International Association for Mathematical Geology.
Dalton, A. (2016). Dark Sky's hyperlocal weather app is now available on the web. Accessed: 2019-01-30.
Florio, E. N., Lele, S. R., Chang, Y. C., Sterner, R. and Glass, G. E. (2004). Integrating AVHRR satellite data and NOAA ground observations to predict surface air temperature: a statistical approach. International Journal of Remote Sensing.
Frei, C. (2014). Interpolation of temperature in a mountainous region using nonlinear profiles and non-Euclidean distances. International Journal of Climatology.
Gandin, L. S. (1988). Complex quality control of meteorological observations. Monthly Weather Review.
Gelfand, A. E., Diggle, P. J., Fuentes, M. and Guttorp, P. (2010). Handbook of Spatial Statistics. Chapman & Hall/CRC Handbooks of Modern Statistical Methods, CRC Press.
Genton, M. G. (1998). Highly robust variogram estimation. Mathematical Geology.
Ghosh, J. K. (1971). A New Proof of the Bahadur Representation of Quantiles and an Application. Annals of Mathematical Statistics.
Gneiting, T. (2013). Strictly and non-strictly positive definite functions on spheres. Bernoulli.
Hall, P. and Patil, P. (1994). Properties of nonparametric estimators of autocovariance for stationary random fields. Probability Theory and Related Fields.
Harris, P., Brunsdon, C., Charlton, M., Juggins, S. and Clarke, A. (2014). Multivariate spatial outlier detection using robust geographically weighted methods. Mathematical Geosciences.
Haskard, K. A. (2007). An anisotropic Matérn spatial covariance model: REML estimation and properties. PhD thesis, University of Adelaide.
Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics. Wiley Series in Probability and Statistics, John Wiley & Sons, Inc.
Koller, M. and Stahel, W. A. (2011). Sharpening Wald-type inference in robust regression for small samples. Computational Statistics & Data Analysis.
Künsch, H. R., Papritz, A., Schwierz, C. and Stahel, A. W. (2011). Robust estimation of the external drift and the variogram of spatial data. ISI 58th World Statistics Congress of the International Statistical Institute, Dublin, Ireland, Aug 21-26.
Lahiri, S. N. (2003). Resampling Methods for Dependent Data. Springer, New York.
Lahiri, S. N., Lee, Y. and Cressie, N. (2002). On asymptotic distribution and asymptotic efficiency of least squares estimators of spatial variogram parameters. Journal of Statistical Planning and Inference.
Lark, R. M. (2000). A comparison of some robust estimators of the variogram for use in soil survey. European Journal of Soil Science.
Lorenc, A. C. (1986). Analysis methods for numerical weather prediction. Quarterly Journal of the Royal Meteorological Society.
Lussana, C., Uboldi, F. and Salvati, M. R. (2010). A spatial consistency test for surface observations from mesoscale meteorological networks. Quarterly Journal of the Royal Meteorological Society.
Matheron, G. (1962). Traité de géostatistique appliquée, Tome I. Paris: Technip.
Moynihan, T. (2015). Clever app turns everyone into a roving weather reporter. Accessed: 2019-01-30.
Papritz, A. (2018a). Tutorial and Manual for Geostatistical Analyses with the R package georob. https://cran.r-project.org/web/packages/georob/vignettes/georob_vignette.pdf. Accessed: 2019-02-12.
Papritz, A. (2018b). georob: Robust geostatistical analysis of spatial data. R package version 0.3-7.
Sen, P. K. (1968). Asymptotic normality of sample quantiles of m-dependent processes. Annals of Mathematical Statistics.
Sosko, S. and Dalyot, S. (2017). Crowdsourcing User-Generated Mobile Sensor Weather Data for Densifying Static Geosensor Networks. ISPRS International Journal of Geo-Information.
Sun, S. and Lahiri, S. N. (2006). Bootstrapping the sample quantile of a weakly dependent sequence. Sankhya: The Indian Journal of Statistics (2003-2007).
Thornton, P. E., Running, S. W. and White, M. A. (1997). Generating surfaces of daily meteorological variables over large regions of complex terrain. Journal of Hydrology.
Todorov, V. and Filzmoser, P. (2009). An object-oriented framework for robust multivariate analysis. Journal of Statistical Software.
Vancutsem, C., Ceccato, P., Dinku, T. and Connor, S. J. (2010). Evaluation of MODIS land surface temperature data to estimate air temperature in different ecosystems over Africa. Remote Sensing of Environment.
Willett, J. B. and Singer, J. D. (1988). Another cautionary note about R²: Its use in weighted least-squares regression analysis. The American Statistician.

5109 SAS Hall
2311 Stinson Dr.
Raleigh, NC 27695-8203
E-mail: