Prototyping Low-Cost Automatic Weather Stations for Natural Disaster Monitoring
Gabriel Francisco Lorençon Ribeiro Bernardes, Rogério Ishibashi, André Aparecido de Souza Ivo, Valério Rosset, and Bruno Yuji Lino Kimura *
Received: 01 December 2020; Accepted: 21 January 2021; Published: to be available
Institute of Science and Technology of the Federal University of São Paulo (UNIFESP), São José dos Campos, SP, CEP 12247-014, Brazil; [email protected] (G.F.L.R.B.); [email protected] (V.R.)
Brazilian National Center for Monitoring and Early Warnings of Natural Disasters (Cemaden), São José dos Campos, SP, CEP 12247-016, Brazil; [email protected] (R.I.); [email protected] (A.I.)
* Correspondence: [email protected]
Abstract:
Weather events put human lives at risk, especially where people live in areas susceptible to natural disasters. Weather monitoring is a pivotal task that is accomplished in vulnerable areas with the support of reliable weather stations. Such stations are front-end equipment typically mounted on a fixed mast structure with a set of digital and magnetic weather sensors connected to a datalogger. While remote sensing from a number of stations is paramount, the cost of professional weather instruments is extremely high. This imposes a challenge for large-scale deployment and maintenance of weather stations for broad natural disaster monitoring. To address this problem, in this paper, we validate the hypothesis that a Low-Cost Automatic Weather Station system (LCAWS) entirely developed from commercial-off-the-shelf and open-source IoT technologies is able to provide data as reliable as a Professional Weather Station (PWS) of reference for natural disaster monitoring. To achieve data reliability, we propose an intelligent sensor calibration method to correct weather parameters. From the experimental results of a 30-day uninterrupted observation period, we show that the results of the calibrated LCAWS sensors have no statistically significant differences from the PWS’s results. Together with the Brazilian National Center for Monitoring and Early Warning of Natural Disasters (Cemaden), LCAWS has opened new opportunities towards reducing the maintenance cost of its weather observational network.
Keywords: low-cost automatic weather station; natural disaster; intelligent sensor calibration; internet of things
1. Introduction
Weather monitoring is a crucial task in different domains of applications, e.g., high-precision agriculture [1], military missions [2], outdoor entertainment and recreation [3], industrial production, and logistics. One of the most critical applications is natural disaster monitoring. Climate change has intensified the occurrence of natural disasters around the world [4]. More intense weather events have been experienced in the last few decades, such as higher and lower temperatures, intense rains, strong winds in tropical cyclones, and intensified droughts [5]. People might be exposed to the consequences of extreme weather events, e.g., flash flooding in underground drainage galleries, landslides on slopes, river overflow, soil and coastal erosion, and the infrastructural collapse of houses and buildings. Besides human losses, climate-related natural disaster events affect the local economy by destroying productive capital, supply chains, and housing stock [6,7].
To warn people under imminent risk of natural disasters, weather data remotely collected from weather stations play a key role. Although crowdsourcing and citizen science based approaches [8–10] for disaster management can obtain useful information from volunteers, the main source of accurate and reliable weather data still comes from radar and weather stations. In order to support risk management and reduce the impacts of natural disasters in Brazil, Cemaden is the government institution responsible for monitoring and providing early warnings of vulnerable areas in the whole country. At Cemaden, weather stations are the main monitoring components in a decision-making pipeline for early warnings of natural disasters. This means that, while the weather stations themselves cannot directly monitor natural disasters, the role of such stations is essentially collecting reliable weather parameters to feed forecasting models running on Cemaden’s back-end servers. As such, the forecasting models are the entities responsible for predicting disasters and then triggering early warnings.
Accepted for publication in MDPI Sensors Journal.
To provide effective and efficient large-scale weather monitoring, Cemaden has an observational network composed of more than five thousand Data Collection Platforms (DCP) deployed in almost a thousand municipalities within the Brazilian territory. A DCP is implemented by a Professional Weather Station (PWS) which consists of a set of one or more sensors to measure weather events, e.g., rainfall precipitation, wind speed, wind direction, air temperature, relative humidity, and atmospheric pressure. Although a PWS provides precision and data reliability, its cost can reach tens of thousands of dollars. This implies that the deployment and maintenance of PWSs are extremely costly for large-scale weather monitoring such as Cemaden’s observational network.
As an initiative to address cost reduction, in this paper, we present a low-cost automatic weather station system, LCAWS, which we developed from commercial-off-the-shelf and open-source IoT technologies. The key requirement of LCAWS is providing low-cost weather instrumentation that allows precision and data reliability equivalent to the PWSs employed by Cemaden to monitor natural disasters across the Brazilian territory. By reducing the costs, it is possible to execute maintenance in short periods and expand the network coverage, allowing redundancy for failed equipment.
The main contributions of this work are the following:
1. Design and implementation. We describe a detailed design of an LCAWS, including: the interaction of the embedded electronics and sensor dynamics; the data processing to produce weather parameters with the main equations; and the system software architecture with the main algorithms at the automatic weather station (client node) and the cloud services (server node).
2. Weather station validation. We present a consistent experimental methodology to validate LCAWS, which includes three main steps. First, the weather station is deployed together with a PWS of reference. Second, data are acquired at the same frequency over a one-month continuous sampling period, covering measurements of atmospheric pressure, air temperature, relative humidity, rain precipitation, wind speed, and wind direction. Third, the data are analyzed to compare the weather parameters produced by the stations using different performance indicators.
3. Intelligent sensor calibration and data correction. We apply a robust methodology for calibrating the sensors in order to correct the weather parameters by means of linear and machine learning regression models. In this methodology, for each weather sensor, we select the best models from a broad set of candidate regression models. To do so, for each candidate model, we ran experiments exhaustively with different randomized train-validation and test datasets by using a k-fold cross-validation based machine learning pipeline. When applied to correct the LCAWS weather parameters, the best models improved the coefficient of determination (R²) from 0.93 to 0.99 for digital sensors, and from 0.31 to 0.97 for magnetic sensors. In other words, this means a root mean squared error (RMSE) reduction of up to 70% for digital sensors and up to 80% for magnetic sensors. With this intelligent approach, there were no significant differences between the weather parameters produced by the LCAWS and the PWS, i.e., the t-test p-value was greater than 0.05.
4. A good candidate solution to reduce costs in natural disaster monitoring. Using a robust methodology, we showed that an LCAWS of a few hundred dollars has the potential to provide weather data as reliable as a PWS applied for natural disaster monitoring. With a reliable LCAWS prototype validated, new opportunities have been presented together with Cemaden to expand its national weather observational network and reduce the maintenance cost of its DCPs.
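The link between the R² and RMSE figures quoted in contribution 3 can be made explicit with a short derivation. This is only a sketch under an assumption that the text does not state: R² is taken as 1 − SSE/SST, computed against the same PWS reference series before and after correction, so that SST and n are unchanged. The endpoint values are used purely for illustration and need not belong to a single sensor.

```latex
\begin{align*}
R^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
\mathrm{RMSE} = \sqrt{\frac{\mathrm{SSE}}{n}}
\;\Longrightarrow\;
\mathrm{RMSE} = \sqrt{\frac{\mathrm{SST}}{n}\,\bigl(1 - R^2\bigr)} \\[4pt]
\frac{\mathrm{RMSE}_{\text{after}}}{\mathrm{RMSE}_{\text{before}}}
  &= \sqrt{\frac{1 - R^2_{\text{after}}}{1 - R^2_{\text{before}}}} \\[4pt]
\text{digital sensors: } &\sqrt{\frac{1 - 0.99}{1 - 0.93}} \approx 0.38
  \quad (\text{RMSE about } 62\% \text{ lower}) \\
\text{magnetic sensors: } &\sqrt{\frac{1 - 0.97}{1 - 0.31}} \approx 0.21
  \quad (\text{RMSE about } 79\% \text{ lower})
\end{align*}
```

Under this assumption, the ratios are of the same order as the reductions of up to 70% and up to 80% reported above.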
In line with the hardware architecture proposed in this paper, some Arduino-based low-cost prototypes especially designed for environmental monitoring are also present in the literature [11–14]. Among them, the ground weather stations developed by Sabharwal et al. [11] and Saini et al. [12] are based on the combination of Arduino Uno and ZigBee technologies. Whereas Lockridge et al. [13] designed a sonde for monitoring marine environments, Strigaro et al. [14] demonstrated that low-cost weather stations based on COTS IoT devices are accessible solutions able to produce data of appropriate quality for natural resource and risk management.
Benghanem [15] proposed a wireless data acquisition system (WDAS) and a low-cost weather station, whose hardware architecture was based on a PIC (Peripheral Interface Controller) 16F877 microcontroller and a communication module using RF Monolithics TX5002 and RX5002. Likewise, Tenzin et al. [16] developed a low-cost weather station for a smart agriculture application based on a PIC24FJ64 microcontroller and an XBee module, comparing results with a commercial and more costly station.
Shaout et al. [17] present a low-cost weather prototype for measuring wind speed, wind direction, and temperature. These parameters are used in combination with a neural network in order to estimate a dressing index, i.e., the amount of clothing a person should wear under certain conditions. The hardware architecture consists of a Freescale HCS12 family’s Dragon12-Plus2 and an MC9S12DG256 microcontroller (16-bit CPU, 256 KB flash memory, 12 KB RAM, 4 KB EEPROM). Kodali and Mandal [18] explore the capabilities of the ESP8266 (4 MB RAM, 128 KB ROM) to prototype a very low cost weather station that acquires data from temperature, pressure, light, and raindrop sensors.
Different from these related works, in our study, we focus on the context of a natural disaster monitoring system, which imposes stronger operational requirements and demands high accuracy of the gathered data. We validate an LCAWS by comparing results with a PWS of reference for natural disaster monitoring, applying a consistent experimental methodology in order to fulfill the weather monitoring requirements of Cemaden, a national monitoring center. In order to improve the accuracy of collected weather data, we propose and implement an intelligent calibration system based on linear and machine learning regression models, which is a contribution that does not appear in related works.
The remainder of this paper is organized as follows: Section 2 presents Cemaden’s remarks on the problem we addressed in this paper. Section 3 describes our LCAWS prototype. Section 4 describes the weather data processing for LCAWS. Section 5 describes the weather station validation. Section 6 presents the intelligent sensor calibration and data correction approach we proposed. Finally, Section 7 brings our main conclusions, as well as the LCAWS’s limitations to be addressed in future work.
2. The Brazilian National Center for Monitoring and Early Warning of Natural Disasters
Cemaden is a national center created in 2011 under the Ministry of Science, Technology, and Innovation of the Brazilian government. Besides providing uninterrupted natural disaster monitoring and early warnings, Cemaden also carries out research and technology innovation to contribute to its monitoring systems in order to reduce the number of victims living in risk areas around the country [19]. Currently, Cemaden monitors 958 municipalities across the Brazilian territory, mostly for geological (landslides) and hydrological (floods) events often caused by intense or sparse rain events [20]. To reach its goals, Cemaden maintains an observational network, currently consisting of 5857 DCPs of different kinds and nine weather radars. This network covers metropolitan areas subject to natural disasters while gathering meteorological data in real time. Cemaden also has numerous partnership agreements to provide and receive weather information from other observational networks (radars and DCPs), expanding its network for better coverage and redundancy. The coverage map in Figure 1 shows the national territory with the priority municipalities overlaid by the distribution of the observational systems.
Figure 1.
Distribution of Cemaden’s observational network.
Every Cemaden DCP is a Professional Weather Station (PWS) equipped with a tipping bucket rain gauge and able to accommodate additional sensors for specific groups of natural disaster monitoring. Such groups are divided into five DCP/PWS categories: Pluviometric, Hydrological, Geotechnical, Agrometeorological, and Acqua.
A pluviometric DCP is used for monitoring natural disasters in general, which are often triggered by rain, and it is usually composed of only a rain gauge. A hydrological DCP is used for flood monitoring and comprises a rain gauge, a level radar for river level measurement, and a photography camera. A geotechnical DCP consists of a rain gauge and six soil humidity sensors (up to 3 m) to be used together with a robotic total station to help in landslide monitoring. An agrometeorological DCP is used for drought monitoring in the semiarid region of Brazil. It consists of a rain gauge, a thermo-hygrometer for air temperature and relative humidity, an anemometer for wind speed, a vane for wind direction, four soil temperature and humidity sensors, a pyranometer, and a net radiometer for solar radiation measurements. An acqua DCP is also used for drought monitoring in the Brazilian semiarid region and comprises a rain gauge and two superficial soil humidity sensors.
In spite of its importance for governmental decision-making, the expansion and maintenance of almost six thousand weather instruments in a territory of 8,516,000 km² impose financial and technological challenges, as well as logistics and communication difficulties and vulnerabilities.
Based on historical information and maintenance reports on Cemaden’s DCP network, it was possible to identify the most frequent vulnerabilities that cause service interruption, classify them into types, and determine their impact from the percentage of affected devices. Table 1 presents an analysis of vulnerabilities based on a 2017 survey.
Table 1.
Frequent vulnerabilities found in Cemaden’s DCP network.
Vulnerability | Type | Affected Devices
Rain gauge clogging | Physical | 45.60%
Theft, vandalism, and damage | Human | 5.37%
Battery | Technological | 4.49%
Communication failure | Technological | 12.05%
No internet coverage | Technological | –
Rain gauge clogging is a physical vulnerability that accounts for the greatest number of occurrences: 35,185 in 2017, affecting 2671 devices. In most cases, funnel clogging is caused by dirt, falling leaves, and small insects that interrupt water flow through the siphon of the tipping bucket rain gauge.
Theft, vandalism, and damage are human-type vulnerabilities that affected around 5.37% of devices in 2017. In particular, many goats are raised for milk and meat in the semiarid region. These animals have a peculiar diet, and it is not unusual for them to gnaw the DCP’s cables, sensors, and even the holding box.
Regarding the technological vulnerabilities, the most impactful ones are battery maintenance, communication failures, and the absence of internet coverage. A battery is a commodity item that has a short life cycle and needs to be replaced every two years. Thus, it is important to have a good schedule for battery replacement in maintenance campaigns. Communication failures caused by data transmission problems were registered in 706 devices, with a total of 18,750 cases in 2017. Currently, DCP communication is based on GSM/GPRS technologies that use the mobile telecommunication infrastructure. Thus, there are at least two factors that could degrade signal quality and cause a failure: long distances between the telecommunication tower and the DCP, and obstructions between them, e.g., mountains, hills, and buildings; even strong rain can attenuate radio signals. No internet coverage is an extension of communication failures.
This technological vulnerability represents a limitation to expanding or rearranging the observational network, as the telecommunication towers typically cover the most populated areas and the coastal zone of the country. It means that there are difficulties in installing instruments in rural areas (e.g., to monitor drought), on mountain tops, and even in other basins where rain could contribute to a natural disaster in a risk area. Currently, Cemaden has not deployed any DCPs in areas with no telecommunication coverage.
The aforementioned vulnerabilities can directly affect the weather condition analysis at Cemaden’s Situational Room and, as a consequence, affect the quality of the warning messages sent to Civil Defenses. In order to mitigate the impacts of these vulnerabilities, an effective preventive maintenance schedule with short intervals is a pivotal measure. However, Cemaden is a governmental center with very limited financial and human resources. Currently, it is possible to execute a device’s maintenance only every eighteen months.
In this context, one of its greatest challenges in mitigating the impacts of vulnerabilities in its observational network is to execute maintenance within appropriate periods (less than six months) using the same human resources and available budget. From the vulnerability analysis, we observed that the vulnerabilities are not linked to the robustness of the hardware, but to the applied technology. Cemaden’s DCPs use expensive imported components (e.g., sensors, power regulators, solar panels, AC chargers, batteries, dataloggers, network interface controllers, and enclosures, among others), which makes maintenance expensive, either preventively or for replacing failed components. Indeed, low-cost stations would be exposed to the same vulnerabilities in the field. However, an LCAWS with similar functionalities and measurement accuracy should be a good low-cost alternative to a PWS, allowing maintenance at shorter intervals and also an expansion of the network coverage, providing redundancy for failed equipment.
Maintenance also involves other preventive actions, such as the installation of fences around the most exposed equipment, partnerships with Civil Defenses for maintenance, informative boards, etc. Communication failures and the absence of internet coverage are vulnerabilities that also require efforts from third parties such as mobile network operators. To improve connectivity beyond the access provided by network operators, e.g., by deploying an ad-hoc network to connect DCPs and Cemaden’s servers, the DCPs have to be updated. However, besides the need for acquiring new components (i.e., incurring the cost of imported components), improving communication capacity would be limited to the DCP manufacturer’s solutions.
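The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated above (5857 DCPs, the 2017 device and occurrence counts, and the 18-month and 6-month maintenance intervals). The function names are hypothetical, and the visits-per-month estimate rests on a simplifying assumption that is not made in the text: exactly one visit per DCP per cycle, spread evenly over time.

```python
TOTAL_DCPS = 5857  # size of the observational network stated in this section

# 2017 device and occurrence counts quoted in the text
affected_devices = {"rain gauge clogging": 2671, "communication failure": 706}
occurrences = {"rain gauge clogging": 35_185, "communication failure": 18_750}


def share_percent(devices, total=TOTAL_DCPS):
    """Percentage of the network affected, rounded as in Table 1."""
    return round(100.0 * devices / total, 2)


def visits_per_month(total_devices, cycle_months):
    """Average monthly visits, assuming one visit per device per cycle."""
    return total_devices / cycle_months


for name, devices in affected_devices.items():
    per_device = occurrences[name] / devices
    print(f"{name}: {share_percent(devices)}% of DCPs, "
          f"{per_device:.1f} occurrences per affected device")

current = visits_per_month(TOTAL_DCPS, 18)  # current interval
target = visits_per_month(TOTAL_DCPS, 6)    # upper bound of the desired interval
print(f"visits per month: {current:.0f} now, {target:.0f} at a 6-month cycle")
```

Under this simplification, moving from an 18-month to a 6-month cycle triples the number of maintenance visits, which illustrates why the per-visit cost of components matters for a fixed budget.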
As discussed above, cost reduction in the maintenance process for reliable weather monitoring is a key challenge for Cemaden, given its limited financial and human resources. In this context, prototyping Low-Cost Automatic Weather Stations (LCAWS) can represent a great opportunity to reduce the maintenance cost composition, since it uses easily found Commercial-Off-The-Shelf (COTS) components, representing just a fraction of the values of actual DCP/PWS components. Besides a significant cost reduction in extreme scales (≈
3. The Prototype of a Low-Cost Automatic Weather Station
The LCAWS presented in this paper is composed of six main components mounted on a structure of vertical and horizontal axes, as seen in Figure 2a and described in Table 2. The weather measurements are provided by magnetic and digital sensors. The magnetic sensors are the wind vane, cup anemometer, and tipping bucket rain gauge, which are implemented by components 1, 2, and 3, respectively. The sensors for measuring atmospheric pressure, air temperature, and relative humidity are implemented by a single digital module that is housed in component 5. Component 4 is a photovoltaic solar panel responsible for charging a 12 V 6 Ah lead-acid battery, which powers the weather station. The battery is housed in component 6, the datalogger, which is responsible for acquiring, processing, storing, and transmitting the weather measurements. In the next subsections, we present details on the magnetic and digital sensors, the datalogger components, as well as the software architecture of the system.
(a) Weather station components. (b) Components inside datalogger.
Figure 2.
Components of the Low-Cost Automatic Weather Station (LCAWS).
Table 2.
LCAWS components.
Component | Description
1 Wind vane | Wind vane magnetic sensor (W.R.F Import. Com. Eletroeletrônicos, LTDA, Colombo, Brazil) (reed switch) of 45° resolution (N, NE, E, SE, S, SW, W, NW).
2 Anemometer | Wind speed magnetic sensor (W.R.F Import. Com. Eletroeletrônicos, LTDA, Colombo, Brazil) (reed switch) of one pulse per full rotation; 75 mm diameter aluminum cups, 147 mm radius to cup center.
3 Rain gauge | Tipping bucket rain gauge with magnetic sensor (W.R.F Import. Com. Eletroeletrônicos, LTDA, Colombo, Brazil) (reed switch) of 0.25 mm precipitation per pulse; 150 mm diameter collector.
4 Solar panel | Yingli YL055P-17b (Yingli Green Energy do Brasil, S.A., Sao Paulo, Brazil) 55 W peak power solar panel used to charge the battery.
5 Sensor housing | Bosch BME280 (Bosch Sensortec, Reutlingen, Germany) combined digital sensors of air temperature, atmospheric pressure, and relative humidity.
6 Datalogger |
6.1 GPS receiver | u-blox Neo-6M GPS receiver module (U-blox AG, Thalwil, Switzerland).
6.2 Charge controller | Generic 12 V/24 V 10 A charge controller.
6.3 Battery | 12 V 6 Ah AGM lead-acid battery.
6.4 Terminal block | Generic terminal block for cable connections.
6.5 Internal sensors | Bosch BME280.
6.6 Storage | 2 GB MicroSD card.
6.7 Processing & Telemetry | Arduino Mega 2560 (Funduino GmbH, Nordhorn, Germany); Epalsite GPRS Shield V1.0 (SIM900, SimCom Wireless Solutions Co., Ltd, Shanghai, China).
6.8 Voltage regulator | DC-DC step-down voltage regulator.
The wind vane, cup anemometer, and tipping bucket rain gauge are magnetic sensors. Each consists of one or more reed switches and a permanent magnet that is attached to a particular mechanical device. Reed switches are devices that open or close an electrical circuit when influenced by a magnetic field. All reed switches in these specific sensors are of the normally open type, i.e., they remain open by default until a nearby magnetic field causes them to close.
Each mechanical device has a specialized structure designed to interact physically with the environment so that movement of its parts causes a permanent magnet to influence one or more reed switches. By causing a reed switch to allow electricity to pass, the mechanical device’s movement generates electric signals that can consistently and reliably be interpreted as a specific weather phenomenon.
Each weather device produces particular movement dynamics. Air flow causes the rotary part of the wind vane to rotate on a vertical axis until it points in the direction from which the wind is blowing. The permanent magnet fixed to the rotary part moves on top of eight reed switches, equally spaced by 45° and fixed to the static part of the wind vane, so that only one reed switch is closed at any given time. Each of the reed switches is in series with a resistor of a distinct value, varying uniformly from 10,000 Ω to 80,000 Ω. When a particular reed switch closes, the resistance of the overall circuit is uniquely related to a particular wind direction. The overall circuit is in series with a fixed reference resistor of 4700 Ω, forming a voltage divider. The wind direction is thus associated with the voltage measured in the wind vane circuit.
The mobile component of the anemometer is composed of a permanent magnet and three 75 mm diameter aluminum cups mounted on horizontal arms. Air flow makes this mobile part rotate on a vertical axis, and once per full rotation the permanent magnet triggers a reed switch that is fixed to the static part. These rotations generate electric pulses that, along with a timestamp associated with each pulse, can be used to determine wind speed.
The tipping bucket rain gauge consists of a 150 mm diameter funnel where rainwater collects and drips down to a tipping bucket. The tipping bucket assembly is the only moving part and stores water in one of two buckets. When one bucket accumulates a specific volume of water, it tips over, emptying itself and exposing the other one. Once that other bucket is full, it tips over and the process repeats. A permanent magnet placed between the two buckets triggers one of two reed switches, creating a pulse that is interpreted as a precise amount of precipitation that has occurred since the last pulse.
Component 5 is a sensor housing that accommodates and protects three digital sensors: air temperature, atmospheric pressure, and relative humidity. These sensors are combined into a single Bosch BME280 [21] module. While providing low power consumption (3.6 µA at 1 Hz measurement), high accuracy, and resolution, the BME280 module encapsulates the sensors in an electronic component of small dimensions (11.5 × 15 mm).
Different settings of oversampling and filtering allow tailoring of data rate, noise reduction, response time, and energy consumption. An infinite impulse response (IIR) filter can remove short fluctuations in data sampling. The small dimensions and versatile features allow sensor implementation in small devices such as handsets, GPS modules, and watches, in different applications, e.g., home automation, indoor navigation, fitness, and GPS refinement. In contrast, in this paper, we apply the BME280 to implement an automatic weather station.
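The readout principle of the magnetic sensors described above can be illustrated numerically. The sketch below is not the station firmware: it assumes a 5 V supply and a 10-bit ADC reading the voltage across the 4700 Ω reference resistor, and it assumes a particular assignment of the 10 kΩ to 80 kΩ resistors to compass directions (10 kΩ for N, 20 kΩ for NE, and so on). None of these wiring details is given in the text. Only the resistor range, the reference resistor, the eight directions, and the 0.25 mm per pulse of the rain gauge come from the description above.

```python
V_SUPPLY = 5.0   # assumed supply voltage (V)
R_REF = 4_700.0  # reference resistor from the text (ohms)
ADC_MAX = 1023   # assumed 10-bit ADC full scale

DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
# Assumed mapping: 10 kOhm -> N, 20 kOhm -> NE, ..., 80 kOhm -> NW
R_DIR = {d: 10_000.0 * (i + 1) for i, d in enumerate(DIRECTIONS)}

MM_PER_TIP = 0.25  # rain gauge resolution from Table 2 (mm per pulse)


def divider_voltage(r_dir):
    """Voltage across the reference resistor for a given direction resistor."""
    return V_SUPPLY * R_REF / (R_REF + r_dir)


def expected_adc(r_dir):
    """Ideal ADC count for a given direction resistor."""
    return round(divider_voltage(r_dir) / V_SUPPLY * ADC_MAX)


def decode_direction(adc_count):
    """Pick the direction whose ideal ADC count is closest to the reading."""
    return min(DIRECTIONS, key=lambda d: abs(expected_adc(R_DIR[d]) - adc_count))


def rainfall_mm(tips):
    """Accumulated precipitation for a number of bucket tips."""
    return tips * MM_PER_TIP


for d in DIRECTIONS:
    print(d, f"{divider_voltage(R_DIR[d]):.3f} V", expected_adc(R_DIR[d]))
print(decode_direction(330), rainfall_mm(8))
```

With this assumed wiring, the expected counts for neighbouring directions move closer together as the resistance grows, so nearest-value decoding is a simple way to tolerate small reading errors.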
Component 6 is the datalogger, a key instrument that contains all the circuitry responsible for sensor data acquisition, processing, and storage, for telemetry, and for powering the weather station. The datalogger is composed of eight main internal components, as seen in Figure 2b:
1. GPS receiver. Besides being useful for time synchronization and localization, the GPS receiver provides a source of data for further investigations, e.g., the correlation between the signal-to-noise ratio (SNR) from different satellites and the weather measurements.
2. Charge controller. It is a device that regulates the highly variable voltage and current supplied by a solar panel and charges a battery following a charge curve. The charge controller used in our prototype is a generic 10 A max 12/24 V pulse width modulation (PWM) controller.
3. Battery. The weather station is powered by a 12 V 6 Ah lead-acid absorbed glass mat (AGM) motorcycle battery.
4. Terminal block. A 10-pin terminal block (component 6.4) is used to facilitate maintenance and secure stable electrical connections between sensors and the microcontroller.
5. Internal weather sensors. A Bosch BME280 sensor module is installed inside the datalogger housing to monitor internal temperature and humidity.
6. Storage. Data are stored on a 2 GB microSD card.
7. Processing and telemetry. The Arduino Mega 2560 is the microcontroller used to acquire, process, store, and transmit data from sensors. Coupled with a SIM900 GPRS shield, it sends the weather data automatically to a remote cloud server.
8. Voltage regulator. An LM2596S DC-DC step-down voltage regulator chip, which converts the 10–14 V input from the battery to a fixed 7 V output for the Arduino.
The LCAWS system software is based on a client–server architecture, as shown in Figure 3. The automatic weather station (AWS) is composed of a client program and routines deployed on the datalogger. On the server side, the cloud services (CS) are provided by programs and routines for processing and data corrections deployed on a cloud. The rationale of the AWS and CS operations is summarized in Algorithms 1 and 2, respectively.
Figure 3.
LCAWS system software architecture.
The client program is implemented with the C++ Arduino API [22] in order to run on the microcontroller deployed on the weather station. The weather sensors are gathered into a pool of sensors $S$: atmospheric pressure (AP), air temperature (AT), relative humidity (RH), rain gauge (RG), wind speed (WS), and wind direction (WD). The client program runs continuously and collects the sensor data once a minute, hence there is a standby time $t_s$ between each data acquisition cycle. The generic procedure get_sensordata() reads the sensors and returns the following 7-tuple:
$$\tau = \{\mathrm{AP}, \mathrm{AT}, \mathrm{RH}, \mathrm{RG}, \mathrm{WS}, \mathrm{WD}, t_{ts}\} \qquad (1)$$
where $t_{ts}$ is the UTC timestamp of each data acquisition cycle on the pool of sensors. Inside the get_sensordata() procedure, the read_sensor() procedure reads out the value available for each sensor $s$.
Algorithm 1:
AWS operation.
Input: N_τ = 10, tuples τ per file.
       t_s = 1 min, a sampling standby.
       server = IP-port, address info.
program client(N_τ, t_s, server):
    init_modules()
    S = init_sensors()
    while true do
        f = create_file()
        foreach N_τ do
            τ = get_sensordata(S)
            write_file(f, τ)
            standby(t_s)
        end
        send_to(server, f)
        close(f)
    end
procedure get_sensordata(S):
    τ = null
    foreach s ∈ S do
        v = read_sensor(s)
        τ = cat(τ, v)
    end
    t_ts = get_timestamp()
    τ = cat(τ, t_ts)
    return τ
Algorithm 2:
CS operation.
Input: DB_L = null, an initialized τ database.
       DB_L = null, an initialized τ database.
       t, UTC starting time for data processing.
       i = 1 hour, time interval from t.
program server():
    while true do
        f = create_file()
        recv_from(client, f)
        append(DB_L, f)
        close(f)
    end
program process_data(DB_L, DB_L, i, g):
    while not end_of(DB_L) do
        d = subset(DB_L, t ≤ t_ts < t + i)
        AP = summary(d_AP)    /* Equation (3) */
        AT = summary(d_AT)    /* Equation (3) */
        RH = summary(d_RH)    /* Equation (3) */
        RG = summary(d_RG)    /* Equation (4) */
        WS = summary(d_WS)    /* Equation (10) */
        WD = summary(d_WD)    /* Equation (13) */
        τ = cat(AP, AT, RH, RG, WS, WD)
        append(DB_L, τ)
        t = t + i
    end
A file $f$ is created to store a dataset of $N_\tau =$
10 tuples 𝜏 in a cache on the local file system.The file 𝑓 is sent to the server via TCP/IP over the mobile operator network under configurabledata frequencies. For the sake of real-time weather monitoring, the periodic communicationcan be accomplished at the rate of the data acquisition from the sensors, i.e., the sendingof sensor data tuple every minute cycle as soon as it is measured. This sampling frequencyshould be enough, since it is higher than the one configured in Cemaden’s DCPs on the field,which is one sample every 10 min to monitor natural disasters. If the internet connection iscompletely interrupted, the data continue being buffered in the local files and is transmittedwhen the internet connection is reestablished. If a connection is unstable and packet dropoccurs, the TCP connection between the AWS client and the CS server is responsible forproviding the reliable transmission required to consistently send the weather data cachedin local files. In our proof-of-concept prototype for automatic data acquisition, the datatransmission is implemented with a GPRS shield connected to the Arduino microcontroller.Nevertheless, once the LCAWS design is a COTS-based modular architecture, there is flexibilityto apply other up to date deployable long-range IoT data transmission technologies [23,24],e.g., LoRA [25], SigFox [26], and NB-IoT [27].On the remote side, cloud services are provided from a server program and a programfor data processing. The server is implemented in Python and runs continuously in orderto receive the file 𝑓 and append 𝑁 𝜏 tuples of raw data values into a local database DB .Asynchronously, process_data is another program implemented in R Language, whichprocesses weather measurements in raw values and produces the corresponding weatherparameters. To do so, a subset of raw data 𝑑 in DB 𝐿 is selected regarding 𝑡 ts , which is themeasurement UTC timestamp. 
Since the weather parameters are usually summarized in statistics per hour, the dataset $d$ considers an interval $[t, t + i - 1]$, where $t$ is a given UTC starting time, and $i = 60$ min is the dataset time interval. For each 1 h subset $d$, we determine the weather parameter for each sensor $s \in S$ from the procedure summary(). As highlighted in Algorithm 2, such a procedure applies equations to give the parameters across the different types of weather data. These equations are discussed in the next section.

As a service CS at the server side, we have the intelligent sensor calibration programs. The processed LCAWS weather data stored in $\mathrm{DB}_L$ and the ones in $\mathrm{DB}_P$ are merged into a general database DB. The database $\mathrm{DB}_P$ stores the corresponding weather parameters obtained from a professional weather station of reference. As such, a dataset $D_{sj} \in \mathrm{DB}$ of a sensor $s \in S$ consists of the weather parameters of both low-cost and professional weather stations, regarding a period of observation $j$. Given $D_{sj}$ and a set $L$ of candidate machine learning-based regression techniques, the procedure learning_pipeline is responsible for constructing the learner model $l^{*}$ to fit $D_{sj}$ through a $k$-fold cross-validation learning pipeline. The resulting learners $l^{*}$ (i.e., the fitted regression models) are stored into a set $O$ together with other machine learning output material. When a number of learner models are available in $O$, the procedure data_correction takes the best one $l^{*}$ for a target sensor $s$ in order to correct a given dataset $D_{sk}$ of an observation period $k$ from the LCAWS database $\mathrm{DB}_L$. The resulting dataset $D^{*}_{sk}$ containing the corrected weather parameters is thus stored in a final database $\mathrm{DB}^{*}_L$.

The proposed intelligent sensor calibration and data correction is expected to be a continuous process. This means that, once we have a set of fitted models in $O$, the regression models can be applied to correct data as soon as new weather parameters are available in $\mathrm{DB}_L$. Meanwhile, the set $O$ should not be a permanent repository of regression models, so the procedure learning_pipeline has to be run whenever new ground-truth values are available in $\mathrm{DB}_P$. In Section 6, we describe details of the implementation and methodology of the proposed calibration and data corrections, and discuss the results we obtained from exhaustive experiments.
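The hourly selection step of process_data can be sketched as follows. This is a minimal illustration in Python (the paper states that process_data itself is written in R); the `(timestamp, value)` tuple layout and the names `hourly_windows` and `summarize_mean` are assumptions made for the example, not part of the LCAWS code.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def hourly_windows(rows, t0, interval=timedelta(hours=1)):
    """Yield (window_start, subset) pairs with t <= t_ts < t + interval.

    rows: list of (t_ts, value) tuples sorted by UTC timestamp
          (hypothetical layout, chosen for this sketch).
    """
    if not rows:
        return
    t = t0
    last_ts = rows[-1][0]
    idx = 0
    while t <= last_ts:
        subset = []
        # Consume every row that falls before the end of the current window.
        while idx < len(rows) and rows[idx][0] < t + interval:
            if rows[idx][0] >= t:  # rows older than t0 are skipped
                subset.append(rows[idx])
            idx += 1
        yield t, subset
        t += interval


def summarize_mean(subset):
    """Mean of the values in one window, in the spirit of Equation (3)."""
    return mean(v for _, v in subset) if subset else None
```

With one sample per minute, each window holds 60 tuples, and a window with no data yields `None` rather than a fabricated value.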
4. Weather Sensor Data Processing
In this section, we describe the equations we implemented throughout the procedure summary() to process the different sensor data and obtain the desired weather parameters.
Atmospheric pressure (AP), air temperature (AT), and relative humidity (RH) are measured by the Bosch BME280 digital sensor kept in the sensor housing. The BME280 allows for accomplishing a period of measurement with configurable oversampling. After such a measurement, an infinite impulse response (IIR) filter can be applied to AP and AT values in order to increase the output signal resolution from 16 to 20 bits while reducing bandwidth and removing short-term fluctuations. RH measurement does not fluctuate rapidly, thus it requires no filtering. The BME280's IIR is given by a low-pass filter:

$$x^{*}_{i} = \frac{x^{*}_{i-1} \times (2^{k} - 1) + x_{i}}{2^{k}}, \qquad (2)$$

where $x_i$ is the observed measurement from the ADC output data; $x^{*}_{i-1}$ is the previous filtered measurement; and $2^{k}$ is a filter coefficient with a factor $k \in [0, 4]$.

The LCAWS oversampling setting is 16× measurements with the filter disabled (i.e., $k = 0$). The hourly parameter is then the mean of the measurements:

$$\bar{x}^{*} = \frac{1}{N} \sum_{i=1}^{N} x^{*}_{i}, \qquad (3)$$

where $N$ is the sample size of measurements obtained in a period of 1 h observation; and $x^{*}_{i}$ is the $i$-th measurement filtered with Equation (2).

The LCAWS pluviometer is a tipping bucket rain gauge, thus it cumulatively counts the number of magnetic pulses generated whenever the bucket tips due to rainfall. From the number of pulses, the corresponding rainfall precipitation in millimeters per hour is:

$$\mathrm{RG} = \sum_{i=1}^{N} f_{\mathrm{d}}(p_i) \times \nu, \qquad (4)$$

where $N$ is the sample size of measurements obtained in 1 h observation; $p_i$ is the number of pulse clicks accumulated in that measurement; $\nu$ is the amount of precipitation measured by each pulse generated by the tipping of the LCAWS bucket, which is $\nu = $ […]; and $f_{\mathrm{d}}(x)$ is the lagged difference between two consecutive values, $i$ and $i + 1$, of $x$. The lagged differences are given by:

$$f_{\mathrm{d}}(x_i) = \begin{cases} x_{i+1} + x_{\max} - x_i, & \text{if } (x_{i+1} - x_i) < 0 \\ x_{i+1} - x_i, & \text{otherwise,} \end{cases} \qquad (5)$$

where $x_{\max}$ is the maximum unsigned integer value that $x_i$ can store. The LCAWS client data collector module assumes a 16-bit integer on the Arduino Mega microcontroller, i.e., $x_{\max} = 2^{16} - 1$.

The raw measurement obtained from the LCAWS anemometer cumulatively counts the number of revolutions of its cup-mounted arm. To convert such data into wind speed in meters per second, $w$, the data processing is accomplished as follows:

$$w = C \times R, \qquad (6)$$

where $C = \pi \times o$ is the circumference of a revolution for a propeller of diameter $o$; and $R$ is the expected number of revolutions of the propeller observed in a second.
The LCAWS anemometer provides revolutions of circumference $C = $ […] for $o = $ […]. $R$ provides the expected number of pulses per second, which is given by:

$$R = \frac{f_{\mathrm{d}}(r)}{f_{\mathrm{d}}(t)} \times s, \qquad (7)$$

where $f_{\mathrm{d}}(x)$ is given by Equation (5); $r$ is the set of cumulative pulses observed in a 1 h period; $t$ is the set of cumulative milliseconds passed since the LCAWS datalogger started running; and $s$ is the time unit in seconds, i.e., $s = $ […].

The wind speeds and directions must be mapped into the Cartesian plane as two components, $x$ and $y$. The means of the vector components, $\bar{x}$ and $\bar{y}$, are given by:

$$\bar{x} = \frac{1}{N} \sum_{i}^{N} \left[ -w_i \times \sin\left( \frac{2\pi \times \theta_i}{360} \right) \right], \qquad (8)$$

$$\bar{y} = \frac{1}{N} \sum_{i}^{N} \left[ -w_i \times \cos\left( \frac{2\pi \times \theta_i}{360} \right) \right], \qquad (9)$$

where $N$ is the sample size of measurements obtained with the weather station in 1 h observation; $w_i$ is the wind speed in m/s; and $\theta_i$ is the wind direction in degrees. The wind speed $w_i$ and LCAWS direction $\theta_i$ are given by Equations (6) and (12), respectively.

From the means of the vector components, the mean wind speed is determined from the resultant vector average:

$$\mathrm{WS} = \sqrt{\bar{x}^{2} + \bar{y}^{2}}, \qquad (10)$$

where $\bar{x}$ and $\bar{y}$ are the component averages obtained with Equations (8) and (9), respectively.

The LCAWS wind vane operates through eight reed switches fixed to the chassis, which are switched on by a permanent magnet fixed to the rotary part of the vane. This gives the wind vane a resolution of $\theta_{\mathrm{res}} = 45°$, corresponding to the cardinal and intercardinal directions north, northeast, east, southeast, south, southwest, west, and northwest.

In the magnetic sensor device, the wind direction corresponds to the current voltage of the vane voltage divider in the continuous interval from 0 V to 5 V. The ADC provides such a voltage in 10-bit integer values, which can be losslessly reduced to the space of an 8-bit integer. Thus, the value domain is within $V = [0, v_{\max}]$, where $v_{\max} = 2^{8} - 1$. Then, a voltage $v \in V$ is mapped into one out of nine discrete vane positions $p \in P$. A mapping function $f_{\mathrm{p}} : V \to P$ is given by:

$$f_{\mathrm{p}}(v) = \left\lfloor R_{\mathrm{ref}} \times \left( \frac{v_{\max}}{v} - 1 \right) \right\rceil, \qquad (11)$$

where $R_{\mathrm{ref}} = $ […] Ω is the reference resistor.

If $1 \le p \le 8$, then the vane position $p$ can be mapped into a corresponding angle $\theta \in \Theta$, where $\Theta = \{0, 45, \cdots, 315\}$ is a set of angles spaced by $\theta_{\mathrm{res}}$ degrees according to the eight expected directions. A mapping function, $f_{\theta} : P \to \Theta$, is given by:

$$f_{\theta}(p) = \begin{cases} \theta_0, & \text{if } p = 0 \\ \theta_{\mathrm{res}} \times (p - 1), & \text{otherwise,} \end{cases} \qquad (12)$$

where $\theta_0 = $ […]° is the calibration of a particular position of the vane when $f_{\mathrm{p}}(v) = 0$, in which the circuit is spliced so that the ADC cannot provide a valid voltage; and $\theta_{\mathrm{res}} = 45°$ is the angular space between two adjacent directions.

Having determined $\theta$, the mean wind direction, WD, of a period of observation is determined by:

$$\mathrm{WD} = \arctan\left( \frac{\bar{x}}{\bar{y}} \right) \times \frac{360}{2\pi} + 180, \qquad (13)$$

where $\bar{x}$ and $\bar{y}$ are the means of the vector wind components obtained from Equations (8) and (9), respectively.
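The counter handling of Equations (4) and (5) and the vector averaging of Equations (8) to (10) and (13) can be sketched in Python as below. This is an illustrative sketch, not the LCAWS implementation: the function names are hypothetical, the bucket resolution is passed in as the parameter `mm_per_tip` rather than assumed, and `atan2` is used in place of the plain arctangent of Equation (13) so that the quadrant is preserved.

```python
import math

X_MAX_16BIT = 2**16 - 1  # x_max for a 16-bit unsigned counter


def lagged_diffs(xs, x_max=X_MAX_16BIT):
    """Equation (5): differences between consecutive cumulative readings.

    A negative difference is treated as a counter wrap and compensated
    with x_max, exactly as the equation is written in the text.
    """
    out = []
    for cur, nxt in zip(xs, xs[1:]):
        d = nxt - cur
        out.append(nxt + x_max - cur if d < 0 else d)
    return out


def rainfall_mm(pulse_counts, mm_per_tip):
    """Equation (4): rainfall over the window from cumulative tip counts.

    mm_per_tip is the per-pulse precipitation (nu); the caller supplies it.
    """
    return sum(lagged_diffs(pulse_counts)) * mm_per_tip


def mean_wind(speeds, directions_deg):
    """Equations (8)-(10) and (13): vector-averaged wind speed and direction."""
    n = len(speeds)
    x_bar = sum(-w * math.sin(math.radians(th))
                for w, th in zip(speeds, directions_deg)) / n
    y_bar = sum(-w * math.cos(math.radians(th))
                for w, th in zip(speeds, directions_deg)) / n
    ws = math.hypot(x_bar, y_bar)
    # atan2 (an assumption of this sketch) keeps the correct quadrant.
    wd = (math.degrees(math.atan2(x_bar, y_bar)) + 180.0) % 360.0
    return ws, wd
```

Averaging the components rather than the raw angles is what lets directions on either side of north (e.g., 350° and 10°) average to north instead of 180°.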
5. Weather Station Validation
To validate LCAWS, we applied an experimental methodology of three main steps: weather station deployment, data acquisition, and data analysis. In the next subsections, we discuss each step.
We deployed LCAWS in February 2019 at the outdoor test environment of the Technology Park (PqTec) [28,29] of the city of São José dos Campos, state of São Paulo, southeast Brazil. In order to assess the weather measurement results, LCAWS was co-located 3 m apart from a reference weather station. Both stations were able to catch and measure the same weather events at nearly the same location and time, allowing direct comparison of the measurement results between them. The Professional Weather Station (PWS) of reference we utilized was the Campbell Scientific CR200 Series (Campbell Scientific, Inc, Leicestershire, UK) [30]. Specifications of both stations are detailed in Table 3. Figure 4 shows the pictures of the deployment location.
Table 3.
Weather stations’ specifications.
Attribute | LCAWS | PWS
Manufacturer | Multi-suppliers | Campbell Scientific
Model | Beta v3.0 | CR200(X)
Maturity level | Academic/test prototype | Professional use
Weather sensors | AP, AT, RH, RG, WS, WD | AP, AT, RH, RG, WS, WD
Control sensors | GPS, BME280, Battery level | Battery level
API languages | C++, C, R, Python | CRBASIC
A/D converter | 10 bits | 12 bits
Max. scan rate | […] | […]
User program | – | PC200W software
Processing | Arduino Mega 2560 | CR200(X) CPU
Memory | […] | […]
Data storage | […] | […]
Data format | C++ primitive data types | 4 B per data point
Data retrieval | USB/RS232, GPRS, microSD | RS232, PCCOM
Data correction | ML-based sensor calibration | –
Communication | USB/RS232, GPRS shield | Serial RS232
Temp. range | – | […] °C to +50 °C
Datalogger enclosure | Plastic, IP55, 30 cm × 22 cm × 12 cm | Aluminium, NEMA 4X, 14 cm × […] × […]
Current drain | 80 mA avg, 2 A peak (GPRS TX), 12 V | 3 mA avg, 75 mA peak, 12 V
Battery | […] | […]
Power supply | Solar panel 55 Wp | Solar panel 10 Wp
Redundancy | – | Battery backup
Protections | Moisture monitoring | EMI, ESD

(a) Deployment location in the red point: 23° […]. (b) LCAWS in February 2019. (c) PWS in February 2019.
Figure 4.
Weather station deployment in situ pictures: (a) a 300 ft aerial image of the São José dos Campos Technological Park (PqTec) [28], adapted from OpenStreetMap contributors [29], with the red point marking the exact location; (b) the Low-Cost Automatic Weather Station (LCAWS); and (c) the Professional Weather Station (PWS).

We collected the weather measurements from both stations during 30 days of continuous observations in March 2019, during the rainy season in southeast Brazil. Each station provided the measurements once per minute, and we processed the corresponding weather parameters into statistics per hour, according to the data processing described in Section 4. Thus, from a large uninterrupted observation period, we obtained samplings of atmospheric pressure (AP), air temperature (AT), relative humidity (RH), rain gauge (RG), wind speed (WS), and wind direction (WD). Figure 5 shows the time-series for each parameter provided by LCAWS and PWS during the 30 days of observations. When superimposing the results of the stations, the time-series describe how each parameter behaved and how close the results of LCAWS are to those of PWS. Since we used PWS as a reference, we take its results as the ground-truth to discuss closeness and dissimilarity of the results produced by LCAWS.

For each sensor of each station, we generate a dataset by summarizing statistics (e.g., minimum, 1st quartile, median, mean, 3rd quartile, maximum, sum, and mode) per hour over the corresponding weather parameter (i.e., over the weather data processed with the equations described in Section 4). Thus, the summarized weather parameter is described in the domain of non-negative real numbers, where each tuple in the dataset is associated with the corresponding timestamp $t_{ts}$. The datasets are publicly available and can be found in [31].
[Figure 5: time-series panels for AP (mbar), AT (°C), RH (%), RG (mm/h), WS (m/s), and WD (degree) over the days of observation (1 to 30) in March 2019; series: Professional Weather Station (PWS) and Low-Cost Automatic Weather Station (LCAWS).]
Figure 5.
Time-series of 30 days of weather observation in March 2019 from both co-located stations, the Low-Cost Automatic Weather Station (LCAWS) and the Professional Weather Station (PWS), for different sensors: Air Pressure (AP), Air Temperature (AT), Relative Humidity (RH), Rain Gauge (RG), Wind Speed (WS), and Wind Direction (WD).

We analyzed the parameters by computing various performance metrics such as Pearson's Correlation Coefficient (PCC), the R-squared coefficient of determination (R2), and residual analysis from Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). In addition, we applied $t$-tests over the obtained results in order to verify the statistical significance.

To correlate the weather parameters in pairs of sensors, we apply PCC:

$$\mathrm{PCC} = \frac{\sum_{i=1}^{N} (x_i - \bar{x}) \times (y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^{2}} \times \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^{2}}}, \qquad (14)$$

where $x$ and $y$ are two given values correlated from two sensors, and $\bar{x}$ and $\bar{y}$ are their averages, respectively.

Considering the PWS results as being the ground-truth, we determine the goodness-of-fit from LCAWS to PWS for each pair of equivalent sensors by applying the R2 coefficient,

$$\mathrm{R2} = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{N} (\bar{y} - y_i)^{2}}, \qquad (15)$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted values, i.e., PWS and LCAWS results, respectively. The best case is R2 = 1, i.e., when the predicted values exactly match the actual ones, so that the residual sum of squares is zero.

To measure how well the results of LCAWS fit those of PWS, for each pair of equivalent sensors, we analyze the residuals between them from the average of squared errors and the square root of such an average, respectively,

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^{2}, \qquad (16)$$

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad (17)$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted values, as in Equation (15). While MSE is a risk metric corresponding to a quadratic score of the average residual magnitude, RMSE is its square root, which can be interpreted in the same unit as the measured data.

Figure 6 shows the PCC results between pairs of sensors. From the PCC similarity of internal sensor pairs in each station, as shown in Figure 6a,b, one can assume that the results between the stations are coherent. While there was no correlation between AP and the other sensors, RH was strongly and negatively correlated to AT. The other pairs of sensors had positive and negative PCC results, however, with no strong coefficients greater than 0.5 or lower than −0.5.

We applied $t$-tests by assuming the following parameterization: a confidence interval of 95%, with the null hypothesis $H_0 : s_L = s_P$, i.e., a datum collected from a given sensor $s$ in LCAWS is equal to the corresponding one in PWS. Otherwise, the alternative hypothesis is $H_1 : s_L \ne s_P$.

Table 4.
Performance metrics and statistical significance between LCAWS and PWS (30-day observation period in March 2019).

Sensor | R2 | MSE | RMSE | t-Value | p-Value
AP | 0.9557 | 0.2815 | 0.5305 | 3.7500 | **
AT | 0.9260 | 0.9789 | 0.9894 | 4.6400 | **
RH | 0.9186 | 17.3133 | 4.1609 | […] | **
RG | 0.9390 | […] | […] | […] | 0.6132
WS | 0.3445 | […] | […] | […] | **
WD | 0.6136 | 3567.64 | 59.73 | […] | 0.2601
** p-value ≤ 0.001.

As presented in Table 4, the combined digital sensors in Bosch's BME280 module could allow high accuracy, with R2 ≥ 0.91. However, the $t$-tests gave p-values of high significance, lower than 0.001 (**). In other words, the null hypothesis is rejected for the sensors AP, AT, and RH, which means that statistically such sensors provide different results. In case of accepting $H_0$ with a tight match between the outcomes of pairs of sensors, we would have t-value ≈ 0 and p-value ≈ 1.

For the rain gauge (RG), […] the p-value was 0.61, so that there is no significant difference between the rain gauge measurements of PWS and LCAWS. However, it is important to highlight that the rain gauge time-series consist of 91.4% and 88.8% of non-rainy hours for LCAWS and PWS, respectively. This unbalanced sampling of rainy and non-rainy measurements also explains the low residuals, even when the sensors have different bucket volumes.

Among the magnetic sensors, the worst performance metrics came from the wind speed (WS) sensor, with the lowest R2 of 0.34, and residuals of MSE = […] and RMSE = […].

[Figure 6: correlation matrices over the sensors AP, AT, RH, RG, WS, and WD, with a color scale from −1 to 1: (a) LCAWS; (b) PWS; (c) LCAWS vs. PWS.]
Figure 6.
Pearson correlation coefficient (PCC) of the observed results (30 days in March 2019): (a) between sensors of LCAWS, (b) between sensors of PWS, and (c) between sensors of LCAWS and PWS.

[Figure 7: scatter plots with LCAWS on the horizontal axis and PWS on the vertical axis, one panel per sensor: AP (mbar), p-value = 2e−04, R2 = 0.9557; AT (°C), p-value = 0, R2 = 0.926; RH (%), p-value = 0, R2 = 0.9186; RG (mm/h), p-value = 0.6132, R2 = 0.939; WS (m/s), p-value = 0, R2 = 0.3445; WD (degree), p-value = 0.2601, R2 = 0.6136.]
Figure 7.
Scatter plots of the 30-day observation period (March 2019) with the Low-Cost Automatic Weather Station (LCAWS) and the Professional Weather Station (PWS) for different sensors: Air Pressure (AP), Air Temperature (AT), Relative Humidity (RH), Rain Gauge (RG), Wind Speed (WS), and Wind Direction (WD).

When comparing the results of the wind direction (WD) sensors between the stations, a p-value greater than 0.05 accepts the null hypothesis. However, this comes with a higher magnitude of residuals, with MSE = 3567.64 and RMSE = 59.73 (degree), and hence a lower predictability of R2 = 0.61. Such an RMSE led to errors around ±60 degrees in a space of 0 ≤ WD < 360. […] $t$-tests. In order to reduce residuals while enabling operations with no statistical difference between the stations, we propose to extend the experimental methodology by applying linear and machine learning regression models in order to calibrate the LCAWS weather parameters. In the next section, we discuss experimental results obtained from such a proposal.
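The metrics of Equations (14) to (17) can be sketched in plain Python as follows. The paired $t$ statistic is included as one plausible reading of the $t$-tests used in this section (an assumption of this sketch); the p-value additionally requires the Student's t distribution, which is left out to keep the example dependency-free. All function names are illustrative.

```python
import math
from statistics import mean, stdev


def pcc(x, y):
    """Equation (14): Pearson's correlation coefficient."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)) * \
        math.sqrt(sum((b - my) ** 2 for b in y))
    return num / den


def r2(y, y_hat):
    """Equation (15): y holds the reference (PWS), y_hat the LCAWS values."""
    my = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((my - a) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def mse(y, y_hat):
    """Equation (16): mean of the squared residuals."""
    return mean((a - b) ** 2 for a, b in zip(y, y_hat))


def rmse(y, y_hat):
    """Equation (17): square root of MSE, in the unit of the data."""
    return math.sqrt(mse(y, y_hat))


def paired_t(y, y_hat):
    """Paired t statistic on the per-sample differences (assumed test form)."""
    d = [a - b for a, b in zip(y, y_hat)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))
```

Note that R2 as defined here can be negative when the LCAWS values fit the reference worse than the reference mean does, which is why it is reported alongside the residual metrics.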
6. Calibrating Weather Sensors with Linear and Machine Learning-Based Regression Models
Assuming the PWS results as being the expected values, i.e., ground-truth, we applied regression methods in order to reduce the LCAWS residuals. As described in Table 5, other works applied regression methods to accomplish sensor calibration or data inference. The authors observed promising results from well-known linear methods such as Linear Regression (LM) and Multiple Linear Regression (MLR), as well as from well-known machine learning (ML) based techniques such as Support Vector Machines (SVM), Random Forest (RF), and Neural Networks (NN). Aside from analyzing the performance of well-known methods, we also verified the efficiency of sophisticated methods of Ensemble Learning (EL) by combining a set of different base models into super learner meta-models. Although ML-based methods are more computationally costly than the linear models LM and MLR, computing resource constraints in the data correction process should not be of concern. Such a process is a non-real-time task which is carried out after data processing as a Cloud Service (CS) on the server side.
Table 5.
Related work on applying regression methods to improve weather sensing process.
Reference | Applications: Calibration, Inference | Regression Methods: LM, MLR, NN, SVM, RF, EL
Fang and Bate [32] ✓ ✓
Kelley and Pardyjak [33] ✓ ✓
Sharma et al. [34] ✓ ✓
Zimmerman et al. [35] ✓ ✓
Yamamoto et al. [36] ✓ ✓ ✓
Cordero et al. [37] ✓ ✓ ✓ ✓ ✓
* This work ✓ ✓ ✓ ✓ ✓ ✓ ✓
* The method we proposed in this paper for intelligent sensor calibration and data correction.
Label: Linear Model (LM), Multiple Linear Regression (MLR), Support Vector Machines (SVM), Random Forest (RF), Neural Networks (NN), Ensemble Learning (EL).

In order to reduce the LCAWS residuals, we implemented a machine learning pipeline in the R language, as illustrated in Figure 8. In such a pipeline, there are four important steps. In step (1), the processed dataset of each sensor is split into two sets. The first set consists of 60% of the dataset and is used for training and validating the regression models. The second set, with the remaining 40% of the dataset, is used for testing the models fitted in step (2). We observed that the train–test dataset proportion of 60–40% could bring better results, while larger train sets led to overfitting. In step (2), we submitted the train set to SuperLearner [38], which is a machine learning R package [39] for constructing super-learners in predictions based on $k$-fold cross-validation from a set of candidate models. We set $k = 10$ folds in our experiments. Then, the learner $l^{\star}$ is the model or the set of weighted models that most minimizes the risk metric MSE. In step (3), the fitted model $l^{\star}$ is then used to make out-of-sample predictions from the test set with the 40% of the input dataset. Finally, in step (4), we apply the performance metrics R2, MSE, and RMSE in order to compare the outcomes $\hat{y}$ predicted by the $l^{\star}$ model with the expected values $y$ in the test set.

[Figure 8: flow diagram. Processed dataset; (1) split dataset; Super Learner's pipeline: (2) split the train set into k folds for cross-validation and, for each k block, train and predict using base models, producing the learner model l*; (3) outcome predictions on the test set; (4) compute metrics (R2, MSE, RMSE).]
Figure 8.
Machine learning regression pipeline we implemented in order to reduce the LCAWS residuals. Four main steps are accomplished: (1) dataset splitting; (2) model training and cross-validation; (3) predictions with the learner model; and (4) computing performance metrics.
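The four steps of the pipeline can be illustrated with the following dependency-free Python sketch. It is not the SuperLearner R package: it selects the single candidate with the lowest cross-validated MSE instead of building a weighted ensemble, and the two candidate models (a mean predictor and a one-variable least-squares line) are stand-ins chosen for brevity. All names are hypothetical.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """One-variable least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


CANDIDATES = {"mean": fit_mean, "lm": fit_linear}  # illustrative stand-ins


def mse(ys, preds):
    return mean((y - p) ** 2 for y, p in zip(ys, preds))


def cv_mse(fit, xs, ys, k=10):
    """Step (2): k-fold cross-validated MSE of one candidate on the train set."""
    n = len(xs)
    errs = []
    for fold in (list(range(i, n, k)) for i in range(k)):
        held = set(fold)
        tr = [i for i in range(n) if i not in held]
        model = fit([xs[i] for i in tr], [ys[i] for i in tr])
        errs.extend((ys[i] - model(xs[i])) ** 2 for i in fold)
    return mean(errs)


def learning_pipeline(xs, ys, seed, train_frac=0.6, k=10):
    """Steps (1)-(4): seeded 60-40 split, CV selection, test prediction, metric."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    tr, te = idx[:cut], idx[cut:]
    x_tr, y_tr = [xs[i] for i in tr], [ys[i] for i in tr]
    x_te, y_te = [xs[i] for i in te], [ys[i] for i in te]
    best = min(CANDIDATES, key=lambda name: cv_mse(CANDIDATES[name], x_tr, y_tr, k))
    model = CANDIDATES[best](x_tr, y_tr)  # refit the winner on the full train set
    preds = [model(x) for x in x_te]
    return best, mse(y_te, preds)
```

Here `xs` would hold the LCAWS parameter and `ys` the PWS reference for one sensor; seeding the generator with the experiment number makes each split reproducible.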
Considering the processed dataset for the sensors $S = \{\mathrm{AP}, \mathrm{AT}, \mathrm{RH}, \mathrm{RG}, \mathrm{WS}, \mathrm{WD}\}$, we evaluated the efficiency of a set of regression models $L = \{\mathrm{LM}, \mathrm{MLR}, \mathrm{NN}, \mathrm{SVM}, \mathrm{RF}, \mathrm{EL}\}$ over a number of different experiments $|E| = 100$. […] biglasso, extraTrees, gam, glm, glm.interaction, ipredbagg, kernelKnn, ksvm, lm, loess, mean, nnet, nnls, polymars, randomForest, ranger, rpart, rpartPrune, speedglm, speedlm, step, step.forward, step.interaction, stepAIC, svm.

From the database DB with the processed parameters of both LCAWS and PWS stations, for each sensor $s \in S$, we selected the respective processed dataset $\mathrm{DB}_s$ and conducted 100 experiments from it. In each experiment $e \in E$, the four steps shown in Figure 8 are accomplished by the learning_pipeline procedure. Since a random seed is set for each experiment, different random samples of each weather parameter are generated and split by keeping the 60–40% train–test proportion. Once the fitted model $l^{\star}$ is obtained from the SuperLearner pipeline for training and cross-validation, the predictions are made from the test set and the performance metrics are computed. The output $o$ of each experiment consists of the model $l^{\star}$, the dataset $d$, the predicted values $\hat{y}$, and the obtained performance metrics $m$. Regarding the outputs $o \in O$, we applied $t$-tests to verify the significance of the top-1 fittest model's R2 coefficient against the other models. Thus, we identified whether the best regression model was really able to perform differently than the others.

Algorithm 3:
ML experiments.
Input:
    S = {AP, AT, RH, RG, WS, WD}, the set of sensors.
    L = {LM, MLR, NN, SVM, RF, EL}, the set of learning techniques.
    E = {1, 2, …, 100}, set of experiments.
    DB, processed dataset containing the weather parameters.
Output:
    O, resulting set of multiple ML outcomes, o = {l★, d, ŷ, m}.

program run_experiments(S, L, E, DB):
    O = null
    foreach s ∈ S do
        D_s = subset(DB, s)
        foreach l ∈ L do
            foreach e ∈ E do
                o = learning_pipeline(D_s, l, e)
                append(O, o)
            end
        end
    end
    return O

procedure learning_pipeline(D_s, l, e):
    set_seed(e)
    d = split_dataset(D_s, 60%, 'rand')
    l★ = super_learner(d_train[y], d_train[x], l)
    ŷ = predict(l★, d_test[x])
    m = compute_metrics(ŷ, d_test[y])
    return o = {l★, d, ŷ, m}

Table 6 presents the performance ranking based on the average (Avg.) and standard deviations (SD) of the regression models' R2 coefficients, as well as the paired $t$-tests. The baseline model RAW (highlighted in blue) presents the LCAWS performance results regarding its originally processed weather parameter. Except for the RG parameter, the experimental results show significant improvement in data correctness. When minimizing the residuals, the top-1 regression models allowed for maximizing the performance in R2 with statistical significance in relation to the RAW baseline. Considering R2 and the respective p-values for the AP, AT, and WS sensors, one can verify that there were no significant differences in the performance of simple regression models (LM, MLR) and more sophisticated ML-based models (EL, SVM, RF). The EL model was the best one for the RH sensor, reaching a p-value with high significance, differentiating it from the others. For the WD parameter, the ML-based models EL and RF perform with no statistical difference. For the RG sensor, it makes no difference whether or not regression models are applied to correct the RG parameter. This can be verified from the R2 performance of the fittest model LM, which had no significant difference to the MLR, RAW (i.e., LCAWS raw data), and EL models.

RG and WD sensors have inherent limitations of analog devices based on magnetic reed switches, either by the device design or by the data sampling procedure. Unlike the other parameters, rainy events and wind speed had no pattern. We expect to address these issues to improve the data correctness in further investigations, e.g., hardware device design, software data collecting procedures, and/or improving regression methods.

Table 6.
Ranking of different regression models and R2 significance (Top-1 model vs. others) from results of 100 experiments.

[Table 6: columns Sensor, Model, Rank, R2 Avg. (± SD), t-Value, p-Value. Recoverable entries for AP: MLR, rank 1, R2 = 0.9991; EL, rank 2, R2 = 0.9987; SVM, rank 4, R2 = 0.9871; a t-value of 74.85 with p-value ** also appears among the AP rows. […]]

Once the best regression models for each sensor were obtained, we carried out a last experiment to verify how data correction would be accomplished in practice. Instead of splitting the processed parameters into randomized samples, we determined the 60–40% train–test proportion with chronological intervals, from 1 to 18 March and from 18 to 30 March 2019, respectively. Figure 9 shows the scatter plots (a) and (b) of the weather parameters with no corrections (RAW data) and the parameters corrected with the fittest regression models, respectively. Table 7 presents the results from the performance metrics.

As we previously discussed, we observed that regression models could not reduce the residuals of the RG parameter. In this last experiment, the best model LM for the RG parameter led to a lower R2, with a performance degradation of […]. For the other sensors, the values $\hat{y}$ predicted by the best regression models fitted well to the values $y$ expected in PWS. The best models allowed performance improvements in R2 of 1.9%, 5.6%, 6.8%, 68.0%, and 19.9% for the AP, AT, RH, WS, and WD parameters, respectively. In addition, the paired $t$-tests applied to such weather parameters showed p-values higher than the level of 0.05, i.e., there was no significant difference between $\hat{y}$ and $y$. These promising fitting results are seen in the time-series in Figure 10, where the black lines of the $\hat{y}$ values predicted by the best regression models nearly overlap the red line of $y$ values expected in PWS.

Table 7.
Performance metrics and significance between LCAWS and PWS weather parameters regarding the test set from 18 to 30 March 2019.
Sensor  Model  R2      MSE      RMSE    t-Value  p-Value
AP      RAW    0.9789  0.2316   0.4813  1.6200   0.1048
AT      RAW    0.9352  0.9686   0.9842  2.7500   *
RH      RAW    0.9359  14.6893  3.8327  …        …
…
From the experimental results, one can conclude that, by applying the regression model-based data correction approach, it was possible to calibrate the AP, AT, RH, WS, and WD sensors. This approach allowed: (1) minimizing the residuals, as measured by the MSE and RMSE metrics, and hence maximizing the R2 performance of these LCAWS sensors; (2) statistically validating these LCAWS sensors, with no significant differences between their corrected weather parameters and the ones expected in PWS. With reduced residuals, the p-values above 0.05 mean that the null hypothesis, i.e., that the two weather stations produce equivalent weather parameters, cannot be rejected. In the case of the RG sensor, applying regression models makes no difference. Nevertheless, the original results with processed data from LCAWS are statistically similar to the PWS results.
[Plot (a): LCAWS weather parameters processed from the raw data. x-axis: Low-Cost Automatic Weather Station (LCAWS); y-axis: Professional Weather Station (PWS). Panel annotations (Model: RAW): AP (mbar), p-value = 0.1048, R2 = 0.9789; AT (C), p-value = 0.0062, R2 = 0.9352; RH (%), p-value = 0.0107, R2 = 0.9359; RG (mm/h), p-value = 0.8449, R2 = 0.9196; WS (m/s), p-value = 0, R2 = 0.3109; WD (degree), p-value = 0.4627, R2 = 0.5467.]
[Plot (b): Processed LCAWS weather parameters corrected through the fittest regression models. x-axis: ML-based Regression Models (LCAWS); y-axis: Professional Weather Station (PWS). Panel annotations: AP (mbar), Model: MLR, p-value = 0.6985, R2 = 0.9981; AT (C), Model: MLR, p-value = 0.674, R2 = 0.9912; RH (%), Model: EL, p-value = 0.5571, R2 = 0.9822; RG (mm/h), Model: LM, p-value = 0.3945, R2 = 0.8801; WS (m/s), Model: EL, p-value = 0.4773, R2 = 0.972; WD (degree), Model: EL, p-value = 0.7097, R2 = 0.6827.]
Figure 9.
Scatter plots between LCAWS and PWS weather parameters, regarding the test set from 18 to 30 March 2019, for different sensors: Air Pressure (AP), Air Temperature (AT), Relative Humidity (RH), Rain Gauge (RG), Wind Speed (WS), and Wind Direction (WD).
[Plot: six time-series panels, AP (mbar), AT (C), RH (%), RG (mm/h), WS (m/s), and WD (degree). x-axis: days of observation in March, 2019 (test dataset), 18 to 29. Series: PWS, LCAWS, and ML-based Regression Model.]
Figure 10.
Time-series of weather parameters from PWS, LCAWS, and regression models, regarding the test set from 18 to 30 March 2019.
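The evaluation procedure described in this section (chronological 60–40% split, model fitting on the first interval, then R2, MSE, RMSE, and a paired t-test on the second) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the data are synthetic, a plain least-squares line stands in for the regression models, R2 is taken as the coefficient of determination, and the p-value uses a normal approximation that is only reasonable for large samples rather than the exact Student's t distribution.

```python
import math
from statistics import NormalDist, mean, stdev


def chronological_split(pairs, train_fraction=0.6):
    """Split time-ordered (x, y) pairs without shuffling: first part trains, rest tests."""
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]


def fit_linear(pairs):
    """Ordinary least squares for y = a * x + b (illustrative stand-in model)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    a = sxy / sxx
    return a, y_bar - a * x_bar


def metrics(y_true, y_pred):
    """Return (R2, MSE, RMSE), with R2 = 1 - SS_res / SS_tot (assumed definition)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mse = ss_res / len(y_true)
    return 1.0 - ss_res / ss_tot, mse, math.sqrt(mse)


def paired_t_test(y_true, y_pred):
    """Paired t statistic; two-sided p-value from a normal approximation (assumption)."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    t_value = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_value)))
    return t_value, p_value


# Synthetic, hypothetical data: the low-cost reading is 0.9 * reference + 2
# plus a small deterministic wobble, so the correction has a bias to remove.
reference = [20.0 + 0.05 * i for i in range(200)]
low_cost = [0.9 * y + 2.0 + 0.1 * math.sin(i) for i, y in enumerate(reference)]

train, test = chronological_split(list(zip(low_cost, reference)))
a, b = fit_linear(train)

y_test = [y for _, y in test]
raw = [x for x, _ in test]
corrected = [a * x + b for x, _ in test]

raw_r2, raw_mse, raw_rmse = metrics(y_test, raw)
cor_r2, cor_mse, cor_rmse = metrics(y_test, corrected)
t_value, p_value = paired_t_test(y_test, corrected)

print("RAW       R2, MSE, RMSE:", raw_r2, raw_mse, raw_rmse)
print("corrected R2, MSE, RMSE:", cor_r2, cor_mse, cor_rmse)
print("paired t-test (t, p):   ", t_value, p_value)
```

The split keeps the samples in time order so that the model is always evaluated on observations later than those it was fitted on, which mirrors how a correction model would be used on a deployed station.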
7. Conclusions
Weather stations play a key role in natural disaster monitoring, being the main source of accurate, reliable, in situ weather data. To support risk management and reduce the impacts of natural disasters in Brazil, Cemaden needs to provide large-scale weather monitoring through an observational network of more than five thousand data collection platforms (DCPs) spread across risk areas in thousands of cities of the Brazilian territory. The main vulnerabilities of DCPs are not linked to hardware robustness, but mostly to the applied technology, which is available only from high-cost professional weather instrumentation. While the expensive DCP components lead to high costs for both preventive maintenance and the replacement of failed components, the vulnerabilities in the observational network are ultimately mitigated through short-term periodic maintenance. In this context, the maintenance of DCPs is extremely costly for a vast country that needs thousands of DCPs, as in Cemaden's observational network, mostly under limited human and financial resources.
As an initiative to address cost reduction in the maintenance process of natural disaster monitoring, we propose to apply COTS-based IoT technology in order to reduce the cost of weather instrumentation from tens of thousands of dollars to a few hundred. To this end, in this paper, we present comprehensive material on the design, implementation, validation, and intelligent sensor calibration of a low-cost automatic weather station (LCAWS) entirely developed from commercial-grade parts and open-source IoT technologies. Following a robust methodology, we showed that the proposed LCAWS was able to provide weather data as reliably as a professional weather station (PWS) of reference for natural disaster monitoring.
As a proof-of-concept, LCAWS is an initial work to demonstrate the feasibility of low-cost alternatives to reduce maintenance costs of the Brazilian weather observational network. However, during the LCAWS validation process, we identified the following hardware and software limitations:
• Datalogger enclosure protection. The datalogger housing, rated IP55, is not hermetically sealed. Although silica gel was used to keep the interior of the housing dry, small amounts of water condensation were observed on the interior walls of the housing when the silica gel became saturated. It is not known if this water came from the outside or from battery acid evaporation. Moreover, high temperatures were frequently observed inside the datalogger housing during sunny days, up to a maximum of 47 degrees Celsius. Proposed improvements to be investigated are using an enclosure with superior thermal insulation and shielding the enclosure from direct sunlight.
• Short battery life-cycle. The power consumption of the RF modules (GPRS, GPS) made it necessary to replace the 10 Wp solar panel with a 55 Wp one; even so, the battery lasted approximately six months before it could no longer hold enough charge to last a whole night. This short battery life may also be due to the high temperatures observed inside the housing.
• Rain gauge faults. The original rain gauge design was prone to false pulses when high winds caused the tipping bucket to vibrate. This was successfully corrected by removing the single reed switch, which was triggered when the bucket passed the middle position, and replacing it with two reed switches. Each reed switch is then triggered when the bucket sits at the end (tipped) position.
• AWS inconsistencies. A suspected software error causes the Arduino to freeze, requiring a manual hard reset. This problem is recurrent, usually happening once a month, and is still under investigation. While the AWS implementation (Figure 3) has validated LCAWS, its coding requires improvements, including refactoring.
In this context, further investigations are required to address the LCAWS hardware and software limitations mentioned above and then evolve LCAWS for deployment on Cemaden's observational network and integration into its natural disaster monitoring pipeline.
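The rain gauge correction listed among the limitations above, with one reed switch at each tipped position, suggests a simple counting rule that can be sketched as follows. This is only one plausible reading of how two end-position switches reject vibration pulses, not the firmware actually running on the station: the function name, the sampled (left, right) representation, and the trace are all hypothetical.

```python
def count_tips(samples):
    """Count bucket tips from a sequence of (left_closed, right_closed) samples.

    A tip is counted only when the bucket is seen resting on the side opposite
    to the last confirmed resting side, so repeated closures of the same
    switch (e.g., vibration while the bucket rests) are ignored.
    """
    tips = 0
    last_side = None  # unknown until the first switch closure is observed
    for left_closed, right_closed in samples:
        if left_closed == right_closed:
            continue  # both open (bucket in transit) or both closed (invalid)
        side = "left" if left_closed else "right"
        if last_side is None:
            last_side = side  # initial resting position, not a tip
        elif side != last_side:
            tips += 1
            last_side = side
    return tips


# Hypothetical trace: rest on the left with vibration chatter, one real tip to
# the right, chatter on the right, then a tip back to the left.
trace = [
    (True, False), (False, False), (True, False),   # chatter on the left side
    (False, False), (False, True),                  # tip 1: left -> right
    (False, False), (False, True), (False, True),   # chatter on the right side
    (False, False), (True, False),                  # tip 2: right -> left
]
print(count_tips(trace))  # 2
```

Under this rule, a switch placed at the middle position would see every oscillation of the bucket as a pulse, whereas alternating end-position closures are produced only by a complete tip.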
Author Contributions:
Conceptualization, all authors; methodology, all authors; software, G.F.L.R.B. and B.Y.L.K.; validation, G.F.L.R.B. and B.Y.L.K.; formal analysis, B.Y.L.K.; investigation, all authors; resources, all authors; data curation, B.Y.L.K.; writing—original draft preparation, all authors; writing—review and editing, all authors; visualization, B.Y.L.K.; supervision, B.Y.L.K.; funding acquisition, all authors. All authors have read and agreed to the published version of the manuscript.
Funding:
This research was funded by São Paulo Research Foundation (FAPESP), Grant Nos.
Institutional Review Board Statement:
Not applicable.
Informed Consent Statement:
Not applicable.
Data Availability Statement:
The data presented in this study are available in [31].
Acknowledgments:
We thank Antonio Carlos Varela Saraiva (UNESP, Brazil), who allowed the validation of LCAWS to be accomplished together with his research group's professional weather station (PWS) deployed at PqTec [28]. We also thank Fabio Augusto Faria (UNIFESP, Brazil) for the suggestions about machine-learning best practices that helped us to implement a robust method for intelligent sensor calibration and data correction.
Conflicts of Interest:
The authors declare no conflict of interest.
References
1. Sawant, S.; Durbha, S.S.; Jagarlapudi, A. Interoperable agro-meteorological observation and analysis platform for precision agriculture: A case study in citrus crop water requirement estimation. Comput. Electron. Agric., 175–187.
2. Winkler, M.; Street, M.; Tuchs, K.D.; Wrona, K. Wireless sensor networks for military purposes. In Autonomous Sensor Networks; Springer: Berlin, Germany, 2012; pp. 365–394.
3. Finger, R.; Lehmann, N. Modeling the sensitivity of outdoor recreation activities to climate change. Clim. Res., 229–236.
4. Banholzer, S.; Kossin, J.; Donner, S. The impact of climate change on natural disasters. In Reducing Disaster: Early Warning Systems for Climate Change; Springer: Berlin, Germany, 2014; pp. 21–49.
5. Van Aalst, M.K. The impacts of climate change on the risk of natural disasters. Disasters, 5–18.
6. Boustan, L.P.; Kahn, M.E.; Rhode, P.W.; Yanguas, M.L. The effect of natural disasters on economic activity in US counties: A century of data. J. Urban Econ., 103257.
7. Debortoli, N.S.; Camarinha, P.I.M.; Marengo, J.A.; Rodrigues, R.R. An index of Brazil's vulnerability to expected increases in natural flash flooding and landslide disasters in the context of climate change. Nat. Hazards, 557–582.
8. Degrossi, L.C.; de Albuquerque, J.P.; Fava, M.C.; Mendiondo, E.M. Flood Citizen Observatory: a crowdsourcing-based approach for flood risk management in Brazil. In Proceedings of the 26th International Conference on Software Engineering and Knowledge Engineering (SEKE 2014), Vancouver, BC, Canada, 1–3 July 2014; pp. 570–575.
9. Fava, M.C.; Abe, N.; Restrepo-Estrada, C.; Kimura, B.Y.L.; Mendiondo, E.M. Flood modelling using synthesised citizen science urban streamflow observations. J. Flood Risk Manag., e12498.
10. de Vos, L.W.; Leijnse, H.; Overeem, A.; Uijlenhoet, R. Quality control for crowdsourced personal weather stations to enable operational rainfall monitoring. Geophys. Res. Lett., 8820–8829.
11. Sabharwal, N.; Kumar, R.; Thakur, A.; Sharma, J. A Low Cost Zigbee Based Automatic Wireless Weather Station With Gui and Web Hosting Facility. Int. J. Electr. Electron. Eng., 258–263.
12. Saini, H.; Thakur, A.; Ahuja, S.; Sabharwal, N.; Kumar, N. Arduino based automatic wireless weather station with remote graphical application and alerts. In Proceedings of the 2016 3rd International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 11–12 February 2016; pp. 605–609.
13. Lockridge, G.; Dzwonkowski, B.; Nelson, R.; Powers, S. Development of a low-cost arduino-based sonde for coastal applications. Sensors, 528.
14. Strigaro, D.; Cannata, M.; Antonovic, M. Boosting a weather monitoring system in low income economies using open and non-conventional systems: Data quality analysis. Sensors, 1185.
15. Benghanem, M. Measurement of meteorological data based on wireless data acquisition system monitoring. Appl. Energy […]
[…] Nat. Hazards Earth Syst. Sci. […] IEEE Wirel. Commun., 60–67.
24. Lauridsen, M.; Nguyen, H.; Vejlgaard, B.; Kovacs, I.Z.; Mogensen, P.; Sorensen, M. Coverage Comparison of GPRS, NB-IoT, LoRa, and SigFox in a 7800 km² Area. In Proceedings of the 2017 IEEE 85th Vehicular Technology Conference (VTC Spring), Sydney, Australia, 4–7 June 2017; pp. 1–5.
25. Augustin, A.; Yi, J.; Clausen, T.; Townsley, W.M. A study of LoRa: Long range & low power networks for the internet of things. Sensors, 1466.
26. Zuniga, J.C.; Ponsard, B. Sigfox System Description; LPWAN@ IETF97 Nov. 14th; Seoul, South Korea, 2016; Volume 25.
27. Schlienz, J.; Raddino, D. Narrowband internet of things whitepaper. In White Paper, Rohde&Schwarz: Munich, Germany […]
[…] Proceedings of the 2017 International Conference on Embedded Wireless Systems and Networks, Uppsala, Sweden, 20–22 February 2017; pp. 1–11.
33. Kelley, J.; Pardyjak, E.R. Using Neural Networks to Estimate Site-Specific Crop Evapotranspiration with Low-Cost Sensors. Agronomy, 108.
34. Sharma, N.; Sharma, P.; Irwin, D.; Shenoy, P. Predicting solar generation from weather forecasts using machine learning. In Proceedings of the 2011 IEEE International Conference on Smart Grid Communications (SmartGridComm), Brussels, Belgium, 17–20 October 2011; pp. 528–533.
35. Zimmerman, N.; Presto, A.A.; Kumar, S.P.; Gu, J.; Hauryliuk, A.; Robinson, E.S.; Robinson, A.L.; Subramanian, R. A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring. Atmos. Meas. Tech., 291–31.
36. Yamamoto, K.; Togami, T.; Yamaguchi, N.; Ninomiya, S. Machine learning-based calibration of low-cost air temperature sensors using environmental data. Sensors, 1290.
37. Cordero, J.M.; Borge, R.; Narros, A. Using statistical methods to carry out in field calibrations of low cost air quality sensors. Sens. Actuators B Chem., 245–254.
38. Van der Laan, M.J.; Polley, E.C.; Hubbard, A.E. Super learner. Stat. Appl. Genet. Mol. Biol., 6, 1–21, DOI: https://doi.org/10.2202/1544-6115.1309