Personalized Prediction of Vehicle Energy Consumption based on Participatory Sensing

Abstract

The advent of abundant on-board sensors and electronic devices in vehicles has popularized the paradigm of participatory sensing, which harnesses crowd-sourced data gathering for intelligent transportation applications such as distance-to-empty prediction and eco-routing. While participatory sensing can provide diverse driving data, a systematic study of how to use the data effectively for personalized prediction is lacking. In particular, there are considerable challenges in interpolating the missing data from a sparse dataset, which often arises from participatory sensing. This paper presents and compares various approaches for personalized vehicle energy consumption prediction, including a blackbox framework that identifies driver/vehicle/environment-dependent factors and a collaborative filtering approach based on matrix factorization. Furthermore, a case study of distance-to-empty prediction for electric vehicles using participatory sensing data is conducted and evaluated empirically, which shows that our approaches can significantly improve the prediction accuracy.

Chien-Ming Tseng and Chi-Kin Chau

Index Terms: Participatory sensing, vehicle energy consumption, distance-to-empty prediction, data mining

I. INTRODUCTION

Participatory sensing is an emerging paradigm of crowd-sourced data collection and knowledge discovery, which has been applied in diverse applications of pervasive and mobile computing systems [1]. The basic concept is that a group of users contribute their personal data (possibly voluntarily) to a third-party data repository, in exchange for the useful knowledge extracted from the collective data, which is then incorporated in personalized applications of individual users.

Vehicles are becoming a vital platform for participatory sensing. First, there are extensive deployments of on-board sensors and in-vehicle information systems, equipped with network connectivity and computing power, acting as effective information collection systems. Second, the wide availability of electronic devices and smartphones carried by passengers can extend the computing and sensing abilities of vehicles. Third, there are abundant off-the-shelf and after-market automotive accessories for gathering driving data and vehicle information. Notably, participatory sensing has been applied in several existing intelligent transportation applications (e.g., traffic status updates in Google Maps and Waze).

Furthermore, new intelligent transportation applications can be enhanced by participatory sensing. One of the critical applications is the prediction of distance-to-empty (DTE), that is, the distance an electric vehicle (EV) or internal-combustion-engine (ICE) vehicle can reach before its energy/fuel is exhausted. DTE is determined by a variety of factors, such as driving behavior, terrain, types of road, traffic, and vehicle specifications. The conventional approach of DTE prediction employed by car manufacturers is based on the projection of past average vehicle energy efficiency of individual drivers. Such an approach is often perceived to be inaccurate. However, if there is further knowledge about the vehicle, driving behavior and the route to travel, future energy efficiency can be estimated with higher accuracy.

The availability of participatory sensing data can improve the accuracy of DTE prediction by exploiting the historical data from other drivers. Conceptually, one can identify the characteristics pertaining to a specific driver, vehicle or environment. Then, one can harness the measurements from similar drivers, vehicles or environments to assist the prediction. In particular, there are several areas of applications:

1) Vehicle Centric Applications: Range anxiety is critical for EVs. Since there are far more ICE vehicles on the road than EVs, one can harness the data collected from ICE vehicles to improve the DTE prediction for EVs.

2) Driver Centric Applications: With diverse data collected from various drivers, one can compare the driving behavior among drivers. Hence, one can classify driving behavior and provide driving recommendations.

3) Environment Centric Applications: Eco-routing or green telematics can be provided by comparing different routes according to energy/fuel consumption prediction.

This paper studies a framework of participatory sensing with an integrated platform of appropriate knowledge discovery and incorporation mechanisms for personalized vehicle/driver/environment centric applications (depicted in Fig. 1).

Fig. 1: An integrated platform of participatory sensing for personalized applications.

While participatory sensing can provide diverse driving data, there are considerable challenges in harnessing it. First, a participatory sensing dataset is often sparse and skewed, and does not cover sufficient combinations of vehicle/driver/environment. This calls for an effective approach to interpolate the missing data from a sparse dataset. Second, the dimensionality of the dataset may be large due to different combinations of various drivers, vehicles and environments. To enable data analytics, an efficient method is desirable to extract the correlations within the data.

This paper explores several approaches of utilizing participatory sensing data for personalized applications:

1) Comparison with the Average: One can obtain the average data values (e.g., average speed, stopping duration) from a large dataset for a specific environment. Then the deviation of individual drivers is compensated from the average data values in personalized applications.

2) Collaborative Filtering: A domain-free data mining technique is to analyze the relationships and interdependencies within a dataset and to identify a smaller set of latent factors that can characterize the observed data. Based on latent factors, one can interpolate the missing data in the dataset. In particular, matrix factorization is a popular solution to realize collaborative filtering.

3) Similarity Matching: Using a known model of vehicle energy consumption, one can compare the participatory sensing data and find the most similar instances from the available data to estimate the required values.

The notion of average data values has been utilized in previous papers [2], [3], but it has the disadvantage of requiring a large dataset. Collaborative filtering [4], [5] is a general technique in data mining that does not leverage detailed knowledge of the underlying model. However, the presence of specific knowledge in vehicle energy consumption can potentially improve the accuracy and effectiveness.

This paper presents several viable approaches of utilizing participatory sensing data for personalized prediction of vehicle energy consumption. In particular, a blackbox framework is presented to effectively identify driver/vehicle/environment-dependent factors from participatory sensing data for personalized prediction. To demonstrate the effectiveness of our approaches, a case study of distance-to-empty prediction for EVs based on the participatory sensing data is conducted and validated empirically, which shows that our approaches can significantly improve the prediction accuracy.

Chien-Ming Tseng and Chi-Kin Chau are with the Department of EECS, Masdar Institute of Science and Technology, UAE (e-mail: {ctseng, ckchau}@masdar.ac.ae). This paper appears in IEEE Transactions on Intelligent Transportation Systems (DOI:10.1109/TITS.2017.2672880).

Outline of Paper: The related work is first presented in Sec. II. The methodologies of personalized prediction approaches are presented in Secs. III-IV. The empirical evaluations of our approaches are given in Sec. V. A case study that utilizes our results is discussed in Sec. VI.

II. RELATED WORK

A. Vehicle Energy Consumption Models

Modeling vehicle energy consumption has been the subject of a number of research papers. One popular method is the model-based approach, which is based on vehicle dynamics to model the consumption behavior of ICE vehicles [6] and EVs [7]. Energy consumption estimation can also use a blackbox approach. For example, a statistical approach using a regression model to estimate the energy consumption of ICE vehicles is presented in [8]. The energy consumption rate of different vehicles and roads can also be clustered to characterize the energy consumption of general vehicle and road types [3].

B. Data Collection of Vehicle Energy Consumption

The accuracy of energy consumption prediction can be enhanced by collecting more information. Two crucial factors of energy consumption prediction are the future speed profiles and future environmental factors (e.g., temperature, wind speed or route grade), which may be highly dynamic and difficult to predict. One method to estimate the future speed profiles is to utilize a Markov chain [9]. Also, one can deploy sensor networks, by which stationary measurements at specific locations, such as traffic, average speed, speed limit and route grade, can be measured. There are a number of papers focusing on utilizing such information [3]. However, the traffic data in these papers is usually static, which may have a large deviation in dynamic traffic. A study that integrates real-time traffic sensor data to predict the energy consumption and emission of ICE vehicles is presented in [10]. One can also obtain the estimated information from social networks and participatory sensing. Participatory sensing can provide mobile measurements and good geographic penetration [11]. For example, [12] shows that the estimation of stochastic effects which impact the travel velocity and acceleration profiles can be crowd-sourced to identify traffic congestion. Our previous work employs participatory sensing for DTE prediction [13], [14], which is extended in this paper.

C. Applications of Vehicle Energy Consumption

The integration of energy consumption prediction and data collection enables many applications. One application is the estimation of DTE, based on the prediction of vehicle energy efficiency (i.e., energy intensity), which is employed in production vehicles [15]. DTE can be estimated by measuring the mean energy consumption over short and long distances [16]. To account for the deviation between the historical and future energy intensity, a regression model can be used to predict the future energy intensity given future route information [17]. Route features from sensor data can be clustered to identify the driving pattern for EV range estimation [18].

Another application is eco-routing. A system based on road characteristics and current prevailing traffic conditions is presented in [19]. It can provide users with a more economical and safer route with reasonable speed, instead of only driving at a lower speed. Another study, which utilizes average participatory sensing data and static traffic information for an eco-routing system of ICE vehicles, is presented in [2]. Route-type based energy consumption prediction can be implemented using OpenStreetMap (OSM) data. There are some studies using the OSM data to predict the EV range [20]. A cloud-based prediction system that considers the deviation between the mean energy consumption and that of different conditions (e.g., traffic congestion or driving behavior) is presented in [21]. The route-type based energy consumption model requires a complete map database including speed limit, route type and traffic information, which may not be available everywhere.

This work differs from the previous work in several aspects: (1) We compare various personalized prediction approaches of vehicle energy consumption. (2) We present a novel blackbox framework to extract driver/vehicle/environment-dependent factors. (3) We conduct a case study of DTE prediction for EVs using different approaches.

III. METHODOLOGY AND BACKGROUND

This section presents the relevant methodology and background related to vehicle energy consumption models.

A. Areas of Factors

While there are many factors that determine vehicle energy consumption, they can be classified into three broad areas:

• Driver: The driver who controls the vehicle has a direct impact on the vehicle movement. Different drivers exhibit different preferences for stop/start and acceleration, aggression in various scenarios, propensity for hypermiling, etc. Psychological and behavioral traits of drivers also affect vehicle energy efficiency.

• Vehicle: Different types of vehicles consume energy differently. ICE vehicles are characterized by the engine types and gear shifts, whereas hybrid vehicles and EVs are affected by battery performance and regenerative braking. The sizes and weights of vehicles often determine the efficiency of kinetic energy conversion, so SUVs and trucks are usually less energy-efficient than sedans and compact vehicles.

• Environment: The environmental factors include traffic and roads. Traffic for a road segment depends on a plethora of factors, including time-of-day, day-of-year and special events, which may follow a certain pattern. The types of roads also affect drivers' behavior differently, and can be divided into three main categories: small public or private roads with urban traffic, lower capacity "urban" highways, and higher capacity freeways. Other environmental factors, such as road grades and weather types, can also be considered.

The historical data of vehicle speed profiles can be identified by a combination of (driver, vehicle, environment), referred to as a data point. This paper aims to predict the energy consumption for each data point. Through participatory sensing, a dataset of measured energy consumption for a relatively small number of data points is collected. This paper addresses the challenge of data interpolation with good accuracy.

B. Types of Models

There are two main types of energy consumption models:

• White-box Model: A straightforward approach is to employ a white-box microscopic behavior model of each vehicle that comprehensively characterizes the engine performance, vehicle mechanics, battery systems, etc. To incorporate traffic information, one can rely on a macroscopic traffic database collected from a network of loop sensors along specific road segments. However, such a white-box vehicle model requires a large amount of data for calibration and detailed knowledge specific to a particular vehicle. Also, the availability of and access to accurate traffic information is often limited to certain authorized parties only.

• Blackbox Model: A blackbox approach, which requires minimal knowledge of the vehicle model and only a small set of measurable variables and parameters, is more desirable. The variables and parameters depend on the combinations of (driver, vehicle, environment). In the subsequent sections, the variables and parameters obtained in the blackbox model will be utilized for collaborative filtering and similarity matching.

C. Energy Consumption Model

This section describes a linear blackbox model of vehicle energy consumption that has been used extensively in the literature [2], [3], [8], [13], [22]. Denote a driver by $D$, a vehicle model by $V$, and a particular environment (e.g., a segment of route and time-of-day) by $R$. Each energy consumption is represented by a numerical value $E_{D,V,R}$, indexed by the tuple $(D, V, R)$. All the entries of energy consumption values form a 3-dimensional tensor, denoted by $[E_{D,V,R}]$.

While there are sophisticated approaches of estimating the moving vehicle energy consumption by white-box microscopic behavior models [7], [6], [9], [18], [23], [24], these models are rather difficult to implement. Many parameters are required, for example, engine efficiency, transmission efficiency, regenerative braking efficiency, etc. However, in practice, these parameters are hard to obtain. Therefore, this paper utilizes a blackbox approach without the detailed knowledge of vehicle mechanics. This approach maximizes the applicability for a wide range of scenarios arising from participatory sensing.

The total energy consumption $E$ of driver $D$ with vehicle model $V$ in a particular environment $R$ is given by:

$$E_{D,V,R} = E^{\mathrm{mv}}_{D,V,R} + E^{\mathrm{id}}_{D,V,R} \tag{1}$$

where $E^{\mathrm{mv}}_{D,V,R}$ is the moving vehicle energy consumption and $E^{\mathrm{id}}_{D,V,R}$ is the idle vehicle energy consumption.
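As a minimal sketch of the data layout described above, the sparse tensor $[E_{D,V,R}]$ can be held as a mapping from $(D, V, R)$ tuples to measured totals, with each total formed as in Eqn. (1). The labels such as `"D1"`, the numeric values, and the helper names `total_energy` and `lookup` are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Optional, Tuple

# A data point is indexed by (driver, vehicle, environment).
# The string labels used below are hypothetical placeholders.
Key = Tuple[str, str, str]


def total_energy(e_moving: float, e_idle: float) -> float:
    """Eqn. (1): total consumption is the moving part plus the idle part."""
    return e_moving + e_idle


# Sparse tensor [E_{D,V,R}]: only measured data points are stored.
# The numbers are made up for illustration (units: kWh).
measured: Dict[Key, float] = {
    ("D1", "V1", "R1"): total_energy(2.5, 0.25),
    ("D2", "V2", "R2"): total_energy(3.0, 0.5),
}


def lookup(key: Key) -> Optional[float]:
    """Return the measured value, or None when the data point is missing."""
    return measured.get(key)
```

A missing entry such as `lookup(("D1", "V1", "R2"))` returns `None`; these are the data points that the interpolation approaches aim to fill.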

1) Moving Vehicle Energy Consumption:

With respect to a particular combination of $(D, V, R)$, the moving vehicle energy consumption $E^{\mathrm{mv}}$ is measured in liters or kWh. Next, the subscript $D,V,R$ is dropped for brevity.

In this paper, $E^{\mathrm{mv}}$ (with its estimate denoted by $\hat{E}^{\mathrm{mv}}$) is estimated by a linear equation of several measurable variables from vehicles:

$$\hat{E}^{\mathrm{mv}} = \begin{bmatrix} \alpha_{v,1} \\ \alpha_{v,2} \\ \vdots \\ \alpha_{v,r} \end{bmatrix}^{T} \begin{bmatrix} v \\ v^{2} \\ \vdots \\ v^{r} \end{bmatrix} + \begin{bmatrix} \vec{\alpha}_{d,1} \\ \vec{\alpha}_{d,2} \\ \vdots \\ \vec{\alpha}_{d,k} \end{bmatrix}^{T} \begin{bmatrix} \vec{d} \\ \vec{d}^{\,2} \\ \vdots \\ \vec{d}^{\,k} \end{bmatrix} + \begin{bmatrix} \vec{\alpha}_{a,1} \\ \vec{\alpha}_{a,2} \\ \vdots \\ \vec{\alpha}_{a,m} \end{bmatrix}^{T} \begin{bmatrix} \vec{a} \\ \vec{a}^{\,2} \\ \vdots \\ \vec{a}^{\,m} \end{bmatrix} + \begin{bmatrix} \alpha_{g} \\ \alpha_{\ell} \\ c \end{bmatrix}^{T} \begin{bmatrix} g \\ \ell \\ 1 \end{bmatrix} \tag{2}$$

Some of the variables are selected based on [25], which analyzed more than 20 thousand data points from 45 drivers to identify the most significant factors of fuel consumption and emission. The variables and coefficients in Eqn. (2) are:

• $v$ is the continuous average speed (i.e., the average speed without idling). The higher powers of $v$, namely $v^{2}, \ldots, v^{r}$, are also considered.

• $\vec{d} = (\tau_d, \mu_d, \sigma_d)$ is the deceleration tuple:
  – $\tau_d$ is the total duration of deceleration.
  – $\mu_d$ is the mean deceleration (i.e., the sum of deceleration values divided by the deceleration duration).
  – $\sigma_d$ is the standard deviation of deceleration.
  Denote the higher powers of components in the deceleration tuple by $\vec{d}^{\,k} = (\tau_d^{k}, \mu_d^{k}, \sigma_d^{k})$.

• $\vec{a}$ is the acceleration tuple (similar to $\vec{d}$).

• $g$ is the mean absolute value of gyroscope along the moving direction.

• $\ell$ is the auxiliary load of idling, which is the baseline measurement when the vehicle is not moving.

• $c$ is a normalization constant.

• $\alpha_v \triangleq (\alpha_{v,1}, \ldots, \alpha_{v,r})$, $\vec{\alpha}_d \triangleq (\vec{\alpha}_{d,1}, \ldots, \vec{\alpha}_{d,k})$, $\vec{\alpha}_a \triangleq (\vec{\alpha}_{a,1}, \ldots, \vec{\alpha}_{a,m})$, $\alpha_g$, $\alpha_\ell$ are the corresponding coefficients.

Note that the coefficients $\vec{\alpha}_d \triangleq (\vec{\alpha}_{d,1}, \ldots, \vec{\alpha}_{d,k})$ can effectively capture the regenerated energy of EVs.

Fig. 2: Illustrations for interpolation of missing data points. (a) Measured and interpolated data points in a sparse dataset. (b) Factors and dependence. (c) Interpolation of data points by substitution of factors from the most similar measured data points.
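The linear form of Eqn. (2) can be sketched in a few lines. This is only an illustration under stated assumptions: the function and argument names (`moving_energy`, `poly_terms`, `tuple_terms`, `load`) are hypothetical, each vector-valued coefficient is treated as a 3-tuple that is multiplied component-wise with the matching power of the deceleration or acceleration tuple, and the coefficient values in the usage note are made up.

```python
from typing import Sequence, Tuple

Triple = Tuple[float, float, float]  # (tau, mu, sigma) of deceleration or acceleration


def poly_terms(x: float, alphas: Sequence[float]) -> float:
    """Sum of alphas[i-1] * x**i for i = 1..len(alphas) (the speed term of Eqn. (2))."""
    return sum(a * x ** i for i, a in enumerate(alphas, start=1))


def tuple_terms(t: Triple, alphas: Sequence[Triple]) -> float:
    """Sum over powers i of the inner product <alphas[i-1], (tau**i, mu**i, sigma**i)>."""
    total = 0.0
    for i, coeff in enumerate(alphas, start=1):
        total += sum(c * comp ** i for c, comp in zip(coeff, t))
    return total


def moving_energy(v: float, d: Triple, a: Triple, g: float, load: float,
                  alpha_v: Sequence[float], alpha_d: Sequence[Triple],
                  alpha_a: Sequence[Triple], alpha_g: float, alpha_l: float,
                  c: float) -> float:
    """Evaluate the blackbox estimate of moving vehicle energy consumption."""
    return (poly_terms(v, alpha_v)
            + tuple_terms(d, alpha_d)
            + tuple_terms(a, alpha_a)
            + alpha_g * g + alpha_l * load + c)
```

For instance, `moving_energy(3.0, (1.0, 2.0, 3.0), (0.0, 0.0, 0.0), 2.0, 4.0, [2.0, 1.0], [(1.0, 1.0, 1.0)], [(1.0, 1.0, 1.0)], 0.5, 0.25, 1.0)` sums a speed term of 15, a deceleration term of 6, and 3 from the remaining terms. Negative entries in `alpha_d` would correspond to energy recovered during deceleration.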

2) Idle Vehicle Energy Consumption:

Similarly, a blackbox approach is used to estimate the idle vehicle energy consumption. The subscript $D,V,R$ is dropped for brevity. With respect to a particular combination of $(D, V, R)$, the idle vehicle energy consumption $E^{\mathrm{id}}$ (with its estimate denoted by $\hat{E}^{\mathrm{id}}$) is estimated by a linear equation:

$$\hat{E}^{\mathrm{id}} = \beta_1 \mu \ell + \beta_2 \omega \tag{3}$$

where

• $\mu$ is the total idle duration.
• $\ell$ is the auxiliary load of idling.
• $\omega$ is the outdoor temperature.
• $\beta_1, \beta_2$ are the coefficients.

The parameters $v, \ell$ can be obtained from standard OBD data inquiry from vehicles, whereas $\vec{d}, \vec{a}, \mu$ can be computed from speed profiles, $g$ can be obtained from smartphones, and $\omega$ can be obtained from online weather data.

D. Estimation of Coefficients

The coefficients $(\alpha_v, \vec{\alpha}_d, \vec{\alpha}_a, \alpha_g, \alpha_\ell, c, \beta_1, \beta_2)$ in Eqns. (2)-(3) can be estimated by the standard regression method, if sufficient measured data $(v, \vec{d}, \vec{a}, g, \ell, \mu, \omega)$ and the respective energy consumption data $(\hat{E}^{\mathrm{mv}}, \hat{E}^{\mathrm{id}})$ are provided. Sec. V will empirically determine the proper powers of parameters. Assume that each driver-vehicle pair $(D, V)$ has collected sufficient historical personal driving data, and hence, the coefficients can be estimated for the respective environment $R$. One notable advantage of the regression method is that it is less susceptible to random noise, which can arise from various sources (e.g., time synchronization in data sampling, mechanical damping, inaccurate measurements).

IV. INTERPOLATING PARTICIPATORY SENSING DATA

Given a dataset of driving data collected by participatory sensing, a data point can be visualized as a point in a 3-dimensional Euclidean space, indexed by $(D, V, R)$. The participatory sensing dataset is usually sparse, consisting of a skewed and clustered distribution of data points. In order to predict the vehicle energy consumption for the data points that are not collected from participatory sensing, we seek to interpolate the missing data points to cover the space of the dataset. An illustration is depicted in Fig. 2a.

Next, three major data interpolation approaches, namely similarity matching, matrix factorization, and comparison with the average, are presented.

A. Similarity Matching

Similarity matching is related to neighborhood-based collaborative filtering. The three areas of factors (i.e., driver, vehicle, and environment) that determine the vehicle energy consumption are not necessarily exclusive. There are factors that can belong to multiple areas. For example, the speed of a vehicle depends on both driver and environment. Abstractly, the factors can be visualized by a Venn diagram (see Fig. 2b).

After characterizing the factors and their dependence, the interpolation of missing data points can be attained by suitable substitution of factors from the most similar measured data points. For example, see Fig. 2c for an illustration. After obtaining the measured data for $(D_1, V_1, R_1)$ and $(D_2, V_2, R_2)$, we aim to estimate the energy consumption for $(D_1, V_1, R_2)$. If $D_1$ is similar to $D_2$ and $V_1$ is similar to $V_2$, then one can replace the factors that depend on $R_1$ in $(D_1, V_1, R_1)$ by those that depend on $R_2$ in $(D_2, V_2, R_2)$.

The energy consumption model in Eqns. (2)-(3) provides a convenient way to extract the factors of driver, vehicle, and environment dependence. In Table I, the dependence of each parameter is heuristically assigned based on the major observable impacts from the driver, vehicle or environment.

TABLE I: Dependence of parameters and coefficients. Columns: driver-dependent, vehicle-dependent, environment-dependent. Rows, with the number of check marks in each:
  $v, \vec{d}, \vec{a}, g, \ell$: ✓ ✓
  $\mu, \omega$: ✓
  $\alpha_v, \vec{\alpha}_d, \vec{\alpha}_a, \alpha_g$: ✓
  $\alpha_\ell, c$: ✓
  $\beta_1, \beta_2$: ✓ ✓

For the coefficients, it is assumed that their dependence is complementary to that of the respective parameters. For example, the average speed $v$ is more likely affected by the driver and environment, and to a lesser extent by the type of vehicle. Hence, coefficient $\alpha_v$ is considered to be vehicle-dependent, such that the product $\alpha_v v$ will be specific to a particular tuple $(D, V, R)$. The dependence of coefficients will be empirically validated in Sec. V.

The interpolation of missing data points can be attained by the substitution of parameters and coefficients in the vehicle energy consumption model. Consider an example in Fig. 2c. Let the coefficients and parameters for $(D_1, V_1, R_1)$ be $(\alpha_v, \vec{\alpha}_d, \vec{\alpha}_a, \alpha_g, \alpha_\ell, c, \beta_1, \beta_2)$ and $(v, \vec{d}, \vec{a}, g, \ell, \mu, \omega)$, and those for $(D_2, V_2, R_2)$ be $(\alpha'_v, \vec{\alpha}'_d, \vec{\alpha}'_a, \alpha'_g, \alpha'_\ell, c', \beta'_1, \beta'_2)$ and $(v', \vec{d}', \vec{a}', g', \ell', \mu', \omega')$. To estimate the energy consumption of $(D_1, V_1, R_2)$, the coefficients $(\alpha_v, \vec{\alpha}_d, \vec{\alpha}_a, \alpha_g, \alpha_\ell, c, \beta_1, \beta_2)$ and the parameters $(v', \vec{d}', \vec{a}', g', \ell, \mu', \omega)$ can be used in the vehicle energy consumption model.

To determine the similarity among drivers and vehicles, two approaches of similarity matching, speed profile matching and driving habit matching, are presented next.
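The substitution in the example above can be sketched as follows. The `Params` record and the `substitute` function are hypothetical names introduced for illustration. The sketch follows the example literally: $v$, $\vec{d}$, $\vec{a}$, $g$ and $\mu$ are taken from the similar data point $(D_2, V_2, R_2)$, while $\ell$ and $\omega$ are kept from $(D_1, V_1, R_1)$. The coefficients of $(D_1, V_1, R_1)$ are reused unchanged and are therefore not modeled here.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Triple = Tuple[float, float, float]  # (tau, mu, sigma)


@dataclass(frozen=True)
class Params:
    """Measured parameters of one data point (field names are illustrative)."""
    v: float        # continuous average speed
    d: Triple       # deceleration tuple
    a: Triple       # acceleration tuple
    g: float        # mean absolute gyroscope value
    load: float     # auxiliary load of idling (ell in the text)
    mu: float       # total idle duration
    omega: float    # outdoor temperature


def substitute(own: Params, similar: Params) -> Params:
    """Build the parameters for (D1, V1, R2).

    `own` holds the parameters of (D1, V1, R1); `similar` holds those of
    (D2, V2, R2). The primed quantities v', d', a', g', mu' replace the
    unprimed ones, while load and omega stay as in `own`.
    """
    return replace(own, v=similar.v, d=similar.d, a=similar.a,
                   g=similar.g, mu=similar.mu)
```

The returned record, together with the coefficients estimated for $(D_1, V_1, R_1)$, would then be plugged into Eqns. (2)-(3).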

1) Speed Profile Matching:

One can characterize the similarity between a pair $(D, V)$ and $(D', V')$ under the same environment $R$ by comparing the respective speed profiles (i.e., the plots of speed against traveled distance). Since speed profiles are time series, dynamic time warping (DTW) [26] can be used as a metric for determining the similarity, and for identifying the corresponding similar regions between two time series. DTW has been applied in many applications (e.g., speech recognition).

The basic idea of DTW is to determine an optimal alignment between two time series. Consider two time series $X = (x[t])_{t=1}^{n_X}$ and $Y = (y[t])_{t=1}^{n_Y}$ of lengths $n_X$ and $n_Y$ respectively. A warp path is defined as $W = (w[k])_{k=1}^{n_W}$, where the $k$-th element is $w[k] = (i, j)$, such that $i$ is an index of time series $X$ and $j$ is an index of time series $Y$. $n_W$ is the length of the warp path $W$, such that $\max(n_X, n_Y) \le n_W < n_X + n_Y$. The warp path $W$ is subject to the following constraints:

1) $w[1] = (1, 1)$ and $w[n_W] = (n_X, n_Y)$;
2) if $w[k] = (i, j)$ and $w[k+1] = (i', j')$, then $i \le i' \le i + 1$ and $j \le j' \le j + 1$.

The warp path of minimum distance $\mathrm{dist}(W^*)$ is defined by:

$$\mathrm{dist}(W^*) = \min_{W} \sum_{k=1}^{n_W} d(w[k]) \tag{4}$$

where each $d(w[k]) = |x[i] - y[j]|$ is the distance of the coordinates $(i, j)$ of the $k$-th element in $W$. A simple approach to determine an optimal warp path between two time series is to use dynamic programming, but there are other more efficient algorithms with linear running time [26].

Suppose each trip is divided into a sequence of segments $(R_i)$. Let $v_{D,V,R_i}[t]$ be the time series of the speed profile for tuple $(D, V, R_i)$.
For each pair of $(D, V, R_i)$ and $(D', V', R_i)$, define

$$\chi^{R_i}_{(D,V),(D',V')} \triangleq \mathrm{dist}(W^*) \tag{5}$$

where $W^*$ is the minimum-distance warp path between the time series $v_{D,V,R_i}[t]$ and $v_{D',V',R_i}[t]$.

Let $\mathcal{R}(D, V)$ be the set of segments that have speed profiles measured with $(D, V)$. Namely, if $R_i \in \mathcal{R}(D, V)$, then the speed profile $v_{D,V,R_i}[t]$ exists in the dataset. Define a similarity metric $\bar{\chi}_{(D,V),(D',V')}$ between each pair of $(D, V)$ and $(D', V')$ by the average minimum warp path distance over all common segments:

$$\bar{\chi}_{(D,V),(D',V')} \triangleq \frac{\sum_{R_i \in \mathcal{R}(D,V) \cap \mathcal{R}(D',V')} \chi^{R_i}_{(D,V),(D',V')}}{|\mathcal{R}(D,V) \cap \mathcal{R}(D',V')|} \tag{6}$$

Note that $\bar{\chi}_{(D,V),(D',V')} = \infty$ if $\mathcal{R}(D,V) \cap \mathcal{R}(D',V') = \emptyset$.

For example, the speed profiles of three driver-vehicle pairs $(D_1, V_1)$, $(D_2, V_2)$, $(D_3, V_3)$ for the same trip on a certain road $R$ are plotted in Fig. 3. A smaller minimum warp path distance is observed to correspond to closer similarity in the speed profile; namely, of two candidate pairs, the one with the smaller distance $\chi^{R}$ to a given pair is the more similar one.

Fig. 3: Speed profiles of three drivers on the same trip. (Vertical axis: speed in km/h. The figure also reports two pairwise distances of the form $\chi^{R}_{(D,V),(D',V')} = 1.\ldots$)

This paper uses $\bar{\chi}_{(D,V),(D',V')}$ to characterize the similarity between each pair of $(D, V)$ and $(D', V')$. The tuple $(D', V')$ with the smallest value $\bar{\chi}_{(D,V),(D',V')}$ is identified for estimating the energy consumption of $(D, V)$. For finding multiple similar data points, a $k$-nearest neighbors ($k$-NN) search is employed to find the $k$ most similar speed profiles to that of $(D, V)$.
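The speed profile matching described above can be sketched with the simple dynamic programming formulation of DTW, which takes time proportional to $n_X n_Y$ (this is not the linear-time algorithm of [26]). The function names `dtw_distance` and `average_similarity`, and the representation of the profiles of one $(D, V)$ pair as a dictionary from segment label to speed samples, are assumptions made for illustration.

```python
import math
from typing import Dict, Hashable, Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Minimum warp path distance of Eqn. (4), with d(w[k]) = |x[i] - y[j]|."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both time series must be non-empty")
    # cost[i][j] = minimum distance of a warp path ending at (i, j), 1-indexed.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            # Each step advances i, j, or both by one (constraint 2).
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def average_similarity(profiles_a: Dict[Hashable, Sequence[float]],
                       profiles_b: Dict[Hashable, Sequence[float]]) -> float:
    """Eqn. (6): mean DTW distance over the segments measured by both pairs."""
    common = profiles_a.keys() & profiles_b.keys()
    if not common:
        return math.inf  # no common segment, as noted after Eqn. (6)
    return sum(dtw_distance(profiles_a[r], profiles_b[r]) for r in common) / len(common)
```

The most similar pair is then the candidate with the smallest `average_similarity` value; sorting candidates by this value and keeping the first $k$ gives the $k$ nearest neighbors.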

2) Driving Habit Matching:

Speed profiles are not always available for the same environment. An alternative is to rely on the available data collected from other environments. An important factor for vehicle energy consumption is the acceleration/deceleration [25]. The accelerating behavior of drivers is related to vehicle energy efficiency. On the other hand, aggressive decelerations, which usually induce rear-end collisions, are related to driving awareness.

For example, the mean acceleration and deceleration $(\mu_d, \mu_a)$ against the continuous average speed of each segment from the data of a driver are plotted in Fig. 4. It is observed that the decelerations/accelerations tend to be higher at a low speed (possibly due to stop-and-go behavior), whereas lower decelerations/accelerations can be found at a high speed. Low accelerations are usually due to cruise control or mindful drivers, while high accelerations are due to aggressive driving.

Fig. 4: Mean acceleration and deceleration amplitudes vs. continuous average speed of each segment of a driver. (Two panels: $\mu_d$ (m/s$^2$) and $\mu_a$ (m/s$^2$) against $v$ (km/h), each marked with low-speed and high-speed regions.)

Therefore, we are motivated to use the average acceleration and deceleration as a metric to characterize driving habits. However, we may not have collected sufficient measurements for every vehicle speed. Hence, we normalize the distribution of data to obtain a better estimation of the average. First, divide the range of vehicle speed into a sequence of intervals with width $\Delta v$ (i.e., $[v, v + \Delta v]$). This paper considers $\Delta v = 10$ km/h. For each interval $[v, v + \Delta v]$, let $\gamma^{v}_{a}(D, V)$ be the mean value of the acceleration measurements within $[v, v + \Delta v]$. Define the estimated average acceleration $\bar{\gamma}_{a}(D, V)$ as the average of the mean values over all intervals. To avoid bias, we ignore the intervals in which the number of data points is less than 10.

Because a difference is observed between high-speed and low-speed driving habits, we define different estimated average accelerations for the intervals above or below a threshold $v_{\mathrm{th}}$:

1) Low-speed estimated average acceleration $\bar{\gamma}^{\mathrm{low}}_{a}(D, V)$.
2) High-speed estimated average acceleration $\bar{\gamma}^{\mathrm{high}}_{a}(D, V)$.

Similarly, define $\bar{\gamma}^{\mathrm{low}}_{d}(D, V)$ and $\bar{\gamma}^{\mathrm{high}}_{d}(D, V)$ for deceleration. Fig. 5 depicts an illustration of $\gamma^{v}_{d}(D, V)$, $\bar{\gamma}^{\mathrm{low}}_{d}(D, V)$ and $\bar{\gamma}^{\mathrm{high}}_{d}(D, V)$ from a dataset of driving data.
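The binned averaging described above can be sketched as follows. The function names, the representation of the measurements as (speed, magnitude) pairs, and the use of half-open intervals $[v, v + \Delta v)$ (so that a sample on a boundary is counted once) are assumptions made for illustration. The threshold is passed in as the parameter `v_th` rather than fixed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def estimated_average(samples: Iterable[Tuple[float, float]],
                      dv: float = 10.0, min_count: int = 10,
                      lo: Optional[float] = None,
                      hi: Optional[float] = None) -> Optional[float]:
    """Average of per-interval means of acceleration (or deceleration) magnitudes.

    samples: (speed in km/h, magnitude in m/s^2) pairs of one (D, V).
    Intervals with fewer than min_count samples are ignored. If lo or hi is
    given, only intervals whose lower edge lies in [lo, hi) are used.
    Returns None when no interval qualifies.
    """
    bins: Dict[int, List[float]] = defaultdict(list)
    for speed, value in samples:
        bins[int(speed // dv)].append(value)
    means = []
    for index, values in bins.items():
        edge = index * dv  # lower edge of the speed interval
        if len(values) < min_count:
            continue
        if lo is not None and edge < lo:
            continue
        if hi is not None and edge >= hi:
            continue
        means.append(sum(values) / len(values))
    return sum(means) / len(means) if means else None


def low_high_averages(samples: Iterable[Tuple[float, float]], v_th: float,
                      dv: float = 10.0) -> Tuple[Optional[float], Optional[float]]:
    """Low-speed and high-speed estimated averages, split at the threshold v_th."""
    data = list(samples)
    return (estimated_average(data, dv=dv, hi=v_th),
            estimated_average(data, dv=dv, lo=v_th))
```

Averaging the interval means, rather than all samples directly, keeps heavily sampled speed ranges from dominating the estimate, which is the normalization the text asks for.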

Fig. 5: An illustration of $\bar{\gamma}_{d}(D, V)$ and $\gamma^{v}_{d}(D, V)$. (Axes: $v$ from 20 to 120 km/h against $\mu_d$ from 0 to 1 m/s$^2$; the plot shows $\gamma^{v}_{d}(D, V)$, $\bar{\gamma}^{\mathrm{low}}_{d}(D, V)$ and $\bar{\gamma}^{\mathrm{high}}_{d}(D, V)$ over the low-speed and high-speed regions.)

Heuristically, this paper sets $v_{\mathrm{th}} = 80$ km/h, because this speed limit usually marks the difference between highways and suburban roads. The average deceleration/acceleration tuple $\left(\bar{\gamma}^{\mathrm{low}}_{a}(D, V), \bar{\gamma}^{\mathrm{high}}_{a}(D, V), \bar{\gamma}^{\mathrm{low}}_{d}(D, V), \bar{\gamma}^{\mathrm{high}}_{d}(D, V)\right)$ can capture the driving habit of each $(D, V)$. The average deceleration/acceleration tuple is used to compare the similarity between each pair $(D, V)$ and $(D', V')$.

B. Matrix Factorization

The similarity matching approaches are based on domain-specific knowledge. Collaborative filtering is a domain-free approach, relying on the identification of abstract latent factors. Matrix factorization is a popular approach for constructing latent factors, which has been implemented in recommendation systems [5] and other large-scale problems [4].

Consider an example of a sparse matrix $R$ of $n$ pairs of $(\mathcal{D}, \mathcal{V})$ and $m$ road segments $\mathcal{R}$, as shown in Table II, in which each entry represents a measurement (e.g., $v$, $\vec{d}$ or $\vec{a}$). Note that some data points may be missing in $R$, denoted by "?".

[Table II: rows are indexed by the $n$ pairs $(\mathcal{D}, \mathcal{V})$ and columns by the $m$ road segments $\mathcal{R}$; example entries are 66, 58 and 88, with missing entries marked "?".]

TABLE II: An example of sparse matrix $R$ of vehicle speed $v$.

The basic idea of matrix factorization is to find two low-rank ($n \times k$ and $m \times k$) matrices, $P$ and $Q$, such that $P Q^T$ can approximate $R$. Namely,

$$R \approx P Q^T = \hat{R} \tag{7}$$

$P$ and $Q$ can be regarded as mappings that reduce the $m$- and $n$-dimensional spaces of the original dataset to a $k$-dimensional space of latent factors, where $k \ll \min(m, n)$. Denote the entry at the $i$-th row and the $j$-th column of $R$ by $r_{ij}$. The objective of matrix factorization is to find $P, Q$ such that

$$\min_{P,Q} \sum_{i,j} (r_{ij} - p_i q_j^T)^2 + \lambda_P \|p_i\|^2 + \lambda_Q \|q_j\|^2 \tag{8}$$

where $p_i$ is the $i$-th row vector of $P$, and $q_j$ is the $j$-th row vector of $Q$ (i.e., the $j$-th column of $Q^T$). Since factorization may cause over-fitting, $\lambda_P$ and $\lambda_Q$ are used to regularize the fitting.

There are two popular approaches to compute $P, Q$ in Eqn. (8): stochastic gradient descent [4] and alternating least squares [5]. This paper utilizes stochastic gradient descent. The basic idea is to go through all observed $r_{ij}$ in $R$. For each $r_{ij}$, determine the corresponding factor vectors $p_i$ and $q_j$. Then, compute the approximate value by $p_i q_j^T$ and update the parameters according to:

$$p_i \leftarrow p_i + \epsilon\,(e_{ij} q_j - \lambda_P p_i), \qquad q_j \leftarrow q_j + \epsilon\,(e_{ij} p_i - \lambda_Q q_j) \tag{9}$$

where $e_{ij} = r_{ij} - p_i q_j^T$ represents the difference between the actual value and the approximate value, and $\epsilon$ is the learning rate. Once $P, Q$ are determined, a missing data point can be estimated by $\hat{r}_{ij} = p_i q_j^T$. All measurements (e.g., $v$, $\vec{d}$ or $\vec{a}$) can be substituted and estimated using matrix factorization. The estimated values can be utilized in the vehicle energy consumption prediction.

C. Comparison with the Average
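As a brief aside before the average-based baseline is described, the stochastic gradient descent update of Eqn. (9) for the matrix factorization of Sec. IV-B can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `factorize` and `predict`, the dictionary layout of observed entries, the small random initialization, and all default hyperparameters are hypothetical choices made for the example.

```python
import random


def factorize(entries, n, m, k=2, lr=0.01, lam_p=0.02, lam_q=0.02,
              epochs=3000, seed=0):
    """Fit P (n x k) and Q (m x k) to the observed entries of R by SGD.

    entries: dict mapping (i, j) -> r_ij for observed entries only;
    missing entries ("?") are simply absent. This layout is an assumption.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.01, 0.1) for _ in range(k)] for _ in range(n)]
    Q = [[rng.uniform(0.01, 0.1) for _ in range(k)] for _ in range(m)]
    observed = list(entries.items())
    for _ in range(epochs):
        rng.shuffle(observed)  # visit observed entries in random order
        for (i, j), r in observed:
            # e_ij = r_ij - p_i q_j^T
            e = r - sum(P[i][f] * Q[j][f] for f in range(k))
            for f in range(k):
                p_if, q_jf = P[i][f], Q[j][f]
                # Update rule of Eqn. (9), using the pre-update values.
                P[i][f] += lr * (e * q_jf - lam_p * p_if)
                Q[j][f] += lr * (e * p_if - lam_q * q_jf)
    return P, Q


def predict(P, Q, i, j):
    """Estimate r_ij (observed or missing) as p_i q_j^T."""
    return sum(a * b for a, b in zip(P[i], Q[j]))


# Usage: a 3 x 2 matrix with entry (2, 1) missing.
entries = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0, (2, 0): 3.0}
P, Q = factorize(entries, n=3, m=2, k=1)
estimate = predict(P, Q, 2, 1)
```

With a fixed learning rate, measurements on a large scale (e.g., speeds in km/h) would likely need to be rescaled or paired with a smaller learning rate to keep the updates stable; that preprocessing is left out of the sketch.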

A simple approach for estimating vehicle energy consumption is based on the global average data values (e.g., average speed) from participatory sensing data. However, each driver may deviate considerably from the average data values. To compensate for the deviations, a personalized adjustment is incorporated to improve the prediction accuracy.

Let $f_{\mathcal{D}, \mathcal{V}, \mathcal{R}}$ be a personal data value for tuple $(\mathcal{D}, \mathcal{V})$ in environment $\mathcal{R}$, and let the average data value be $\bar{f}_{\mathcal{R}}$. An adjustment function $\mathbb{D}^{f}_{\mathcal{D}, \mathcal{V}}(\cdot)$ is used to convert the average data value to the personal data value, such that:

$$f_{\mathcal{D}, \mathcal{V}, \mathcal{R}} = \mathbb{D}^{f}_{\mathcal{D}, \mathcal{V}}(\bar{f}_{\mathcal{R}}) \tag{10}$$

In this paper, a simple adjustment function is considered by the following regression model:

$$\mathbb{D}^{f}_{\mathcal{D}, \mathcal{V}}(\bar{f}_{\mathcal{R}}) = \eta_2 \bar{f}_{\mathcal{R}}^{2} + \eta_1 \bar{f}_{\mathcal{R}} + \eta_0 \tag{11}$$

V. EMPIRICAL EVALUATIONS

This section discusses the empirical evaluations of the energy consumption model and its properties.

A. Setup

The driving data from 5 drivers and 7 vehicles are collected. The information of the vehicles is given in Table III. Some drivers drove multiple vehicles, which gives a total of 10 tuples of $(\mathcal{D}, \mathcal{V})$ (the pairs listed in Fig. 6). Since the context of participatory sensing is considered, it suffices to consider a relatively small dataset.

Vehicle   Maker     Model     Year   Type   Displacement (L)
V_1       Nissan    LEAF      2014   EV     NA
V_2       Ford      Fiesta    2013   ICE    1.4
V_3       Toyota    Yaris     2013   ICE    1.5
V_4       Hyundai   Veloster  2014   ICE    1.6
V_5       Ford      Fusion    2012   ICE    2.5
V_6       BMW       650i      2014   ICE    5.0
V_7       Ford      F150      2014   ICE    5.0

TABLE III: The vehicles in the experiments.

In total, 3000 km of data is collected. Fig. 6 depicts the distance of collected data for all driver-vehicle pairs. The data is then segmented into 1-km segments.

[Fig. 6 plot: distance (km) of collected data for each driver-vehicle pair; ICE vehicle pairs (2,6), (2,7), (3,5), (4,4), (4,2), (5,3) and EV pairs (1,1), (3,1), (4,1), (5,1).]

Fig. 6: Collected data of all driver-vehicle pairs.

For ICE vehicles, we collected data through ELM327 devices connected to the vehicles' onboard diagnostic (OBD) ports and paired with a smartphone. The collected OBD data include mass air flow, manifold absolute pressure, intake air temperature and engine RPM. Geo-location data, accelerometer and gyroscope measurements from the smartphone are also collected. For EVs (i.e., Nissan LEAF), high-resolution state-of-charge (SOC) and vehicle speed data are collected.

B. Estimation Errors of Energy Consumption Model

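The ground truth energy consumption data (i.e., $E_{\mathcal{D}, \mathcal{V}, \mathcal{R}}$, $E^{\mathrm{mv}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}}$, $E^{\mathrm{id}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}}$) can be obtained from OBD data. From the OBD data of ICE vehicles, the fuel rate can be estimated based on mass air flow and fuel/air ratio. From the OBD data of EVs, the energy consumption is estimated by SOC and the battery capacity. Readers can refer to [27] for the details of extracting OBD data from EVs.

Two metrics of error are utilized to evaluate the energy consumption predictions in this study. The first metric is the per-segment error for each road segment $\mathcal{R}_i$:

$$\varepsilon_i = \frac{\left(E^{\mathrm{mv}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}_i} + E^{\mathrm{id}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}_i}\right) - \left(\hat{E}^{\mathrm{mv}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}_i} + \hat{E}^{\mathrm{id}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}_i}\right)}{E^{\mathrm{mv}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}_i} + E^{\mathrm{id}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}_i}} \tag{12}$$

which is used to evaluate the accuracy of the energy consumption model (Eqns. (2)-(3)). The second metric is the accumulative error:

$$\varepsilon_{\mathrm{acc}} = \frac{\left| E_{\mathcal{D}, \mathcal{V}, \mathcal{R}} - \hat{E}_{\mathcal{D}, \mathcal{V}, \mathcal{R}} \right|}{E_{\mathcal{D}, \mathcal{V}, \mathcal{R}}} \tag{13}$$

which is used to evaluate the energy prediction accuracy over a trip composed of many segments. Fig. 7 shows the speed profiles and energy consumption of an ICE vehicle and an EV. The idling energy consumption ($E^{\mathrm{id}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}}$) is identified from the speed profile, and the moving energy consumption ($E^{\mathrm{mv}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}}$) is obtained by Eqn. (1).

[Fig. 7 panels: data traces of two $(\mathcal{D}, \mathcal{V})$ pairs, plotting energy consumption (liter for the ICE vehicle, kWh for the EV) and speed (km/h) against distance (km); annotations mark $E_{\mathcal{D}, \mathcal{V}, \mathcal{R}}$, $E^{\mathrm{id}}_{\mathcal{D}, \mathcal{V}, \mathcal{R}}$ and regenerative braking.]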

Fig. 7: Energy consumption of different driver-vehicle pairs.
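The two error metrics in Eqns. (12)-(13) can be sketched in Python as follows. The function names and the list-based inputs are hypothetical, and the sketch assumes that the trip energy in Eqn. (13) is the sum of the per-segment energies.

```python
def per_segment_error(e_mv, e_id, e_mv_hat, e_id_hat):
    """Signed relative error of one road segment, following Eqn. (12)."""
    actual = e_mv + e_id
    predicted = e_mv_hat + e_id_hat
    return (actual - predicted) / actual


def accumulative_error(actual_segments, predicted_segments):
    """Relative error of the whole trip, following Eqn. (13).

    Assumes the trip energy is the sum of the per-segment energies.
    """
    total_actual = sum(actual_segments)
    total_predicted = sum(predicted_segments)
    return abs(total_actual - total_predicted) / total_actual


# Usage: one over-predicted and one under-predicted segment.
actual = [2.0, 2.0]
predicted = [2.5, 1.5]
segment_errors = [per_segment_error(a, 0.0, p, 0.0)
                  for a, p in zip(actual, predicted)]
trip_error = accumulative_error(actual, predicted)
```

Because the per-segment error is signed while the accumulative error is taken over trip totals, deviations of opposite sign on different segments can cancel at the trip level.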

C. Fitness of Energy Consumption Model

In this section, the proper powers of $v$, $\vec{d}$, $\vec{a}$ in Eqn. (2) for model fitting are evaluated. The Akaike Information Criterion (AIC) [28] is utilized to determine the proper values of $(r, k, m)$. AIC estimates the quality of each model and balances the trade-off between the goodness of fit and the complexity of the model. The AIC value of a model can be computed using the estimated residuals in the least squares method. For the energy consumption model using powers of order $(r, k, m)$ in Eqn. (2), the AIC value is expressed by:

$$\mathrm{AIC}_{(r,k,m)} = n \log\left(\frac{\sum_i \varepsilon_i^{2}}{n}\right) + 2K \tag{14}$$

where $\varepsilon_i$ is the per-segment error (see Eqn. (12)), $n$ is the number of segments and $K$ is the total number of estimated regression coefficients (e.g., $r + k + m + 3$). According to the AIC test criterion, a smaller value indicates a better model. The AIC value is normalized as $\mathrm{AIC}^{N}_{(r,k,m)}$, with respect to the smallest AIC value among all driver-vehicle pairs ($\min(\mathrm{AIC})$):

$$\mathrm{AIC}^{N}_{(r,k,m)} = \frac{\mathrm{AIC}_{(r,k,m)}}{\min(\mathrm{AIC})} - 1 \tag{15}$$

[Fig. 8 panels: (a) average normalized AIC values $\mathrm{AIC}^{N}$ for models $(r, k, m)$ = (1,2,2), (2,2,2), (3,2,2), (4,2,2), (1,1,1), (2,1,1), (3,1,1), (4,1,1); (b) distribution of per-segment errors $\varepsilon$ (number of segments vs. error; Avg. = 0.1%, S.D. = 7.8%); (c) per-segment error distribution of model (2,2,2) for all $(\mathcal{D}, \mathcal{V})$ pairs, ICE and EV.]

Fig. 8: Experimental data and evaluation.

The mean normalized AIC values averaged over all driver-vehicle pairs for a particular $(r, k, m)$ are plotted in Fig. 8a, which shows that $(2, 2, 2)$ attains the minimum and hence is perceived as the best setting of powers of $v$, $\vec{d}$, $\vec{a}$ in Eqn. (2).

D. Evaluation of Energy Consumption Model

The per-segment errors of the $(2, 2, 2)$ model for all $(\mathcal{D}, \mathcal{V})$ pairs are validated in this section. 80% of the collected data (called in-sample data) are randomly selected to train the regression model. The rest of the data (called out-sample data) are utilized to validate the accuracy of the model. The per-segment error distribution of out-sample data for driver-vehicle pair (4, ) is shown in Fig. 8b. The total number of segments is 58 (58 km). The mean error is 0.1%. The standard deviation is about 7.8% and the distribution approaches a normal distribution. This shows that our energy consumption model is relatively accurate. The model validation results of all pairs are displayed in Fig. 8c. A slightly higher standard deviation of per-segment error is observed for EVs, because of a lower sample rate.

Since we are interested in the energy consumption of the overall trip, the accumulative error is more relevant. Fig. 9 shows the accumulative error against traveled distance over multiple rounds on the same route. It is observed that although the standard deviation is up to 5%, the accumulative error is much smaller. This is because the positive and negative deviations can offset each other over a longer distance. Therefore, the accumulative error has a lower value after a longer distance. The root mean square accumulative error RMSE($\varepsilon_{\mathrm{acc}}$), which measures the performance over the traveled distance, is also examined. In the later case study in Sec. VI, RMSE($\varepsilon_{\mathrm{acc}}$) will be used to evaluate the accuracy of energy consumption prediction for a designated trip.

E. Dependence of Coefficients

To properly assign the dependence of coefficients in the energy consumption model in Eqns. (2)-(3), the distribution of coefficients between all driver-vehicle pairs is examined to identify the dependence empirically. To compare between different energy resources, the conversion suggested by the US Environmental Protection Agency (US EPA) is utilized to convert kWh to gasoline fuel, in which 33.7 kilowatt-hours of electricity is equivalent to one gallon of gasoline [29].

[Fig. 9 panels: accumulative error $\varepsilon_{\mathrm{acc}}$ against traveled distance for two $(\mathcal{D}, \mathcal{V})$ pairs; one with $\varepsilon_{\mathrm{acc}}$ = 1.4% and RMSE($\varepsilon_{\mathrm{acc}}$) = 3.9%, the other with $\varepsilon_{\mathrm{acc}}$ = 2.7% and RMSE($\varepsilon_{\mathrm{acc}}$) = 5.3%.]

Fig. 9: Accumulative error against traveled distance.

To validate the dependence, a portion (80%) of the training data is randomly drawn to train the model for each driver-vehicle pair, and the procedure is repeated 100 times to create 100 sets of coefficients for each pair. As an example, the distributions of coefficients α v, and c for the same driver or the same vehicle are plotted in Fig. 10. It is observed that the distributions of coefficients α v, and c of the same driver in different vehicles tend to be independent of one another. In addition, the rightmost figures show that the distributions of different drivers in the same vehicle are highly overlapping, which means the coefficients are less affected by drivers. Therefore, the coefficients α v, and c are assigned to be vehicle dependent. The dependence of other parameters and coefficients in Table I is also validated.

F. Driving Habits

This section compares the driving habits characterized by the low-speed and high-speed average acceleration/deceleration tuple. The average accelerations/decelerations of several drivers, aggregated over multiple trips, are plotted in Fig. 11. A positive correlation between average acceleration and average deceleration is observed: drivers who accelerate more tend to decelerate more. As a result, one can classify the driving habits by awareness and efficiency according to different regions in the low-speed and high-speed average acceleration/deceleration plots, relative to the mean values among drivers.

[Fig. 10 panels: distributions of coefficients α v, and c for the same driver in different vehicles and for different drivers in the same vehicle.]

Fig. 10: Distributions of coefficients α v, and c.

[Fig. 11 panels: $\bar{\gamma}_d^{\mathrm{low}}(\mathcal{D}, \mathcal{V})$ vs. $\bar{\gamma}_a^{\mathrm{low}}(\mathcal{D}, \mathcal{V})$ and $\bar{\gamma}_d^{\mathrm{high}}(\mathcal{D}, \mathcal{V})$ vs. $\bar{\gamma}_a^{\mathrm{high}}(\mathcal{D}, \mathcal{V})$ for the $(\mathcal{D}, \mathcal{V})$ pairs, divided into four regions. Z(1): low efficiency and low awareness; Z(2): high efficiency and low awareness; Z(3): high efficiency and high awareness; Z(4): high awareness and low efficiency.]

Fig. 11: Regions of driving habits characterized by average accelerations/decelerations.

VI. CASE STUDY

This section presents the case study of various personalized prediction approaches. All driver-vehicle pairs are required to drive on a designated route for evaluation. The ground truth energy consumption data is also collected. Fig. 12 shows the energy consumption and speed profiles for two driver-vehicle pairs. The designated route comprises suburban (0 to 20 km) and stop-and-go (20 to 31 km) parts. Three rounds of driving are repeated to obtain training data and reference data.

A. Personalized Vehicle Energy Consumption Prediction

This section compares the performance of various personalized prediction approaches for vehicle energy consumption using the collected data for the designated route. The energy consumption model of each driver-vehicle pair is trained using the historical data collected from daily driving. Then the energy consumption of the route can be predicted using the collected data from different driver-vehicle pairs for the route. For example, the energy consumption model for $\hat{E}^{\mathrm{mv}}$ of $(\mathcal{D}, \mathcal{V})$ of a particular road segment is obtained from the historical data and is given as follows.

[Equation: $\hat{E}^{\mathrm{mv}}$ is expressed as the sum of three inner products of numeric coefficient vectors, fitted from the historical data, with the terms in speed $v$ and grade $g\ell$, the deceleration terms ($\tau_d$, $\mu_d$, $\sigma_d$), and the acceleration terms ($\tau_a$, $\mu_a$, $\sigma_a$).]

[Fig. 12 panels: speed profiles of two $(\mathcal{D}, \mathcal{V})$ pairs, plotting energy consumption (liter or kWh) and speed (km/h) over the designated route.]

Fig. 12: Energy consumption and speed profiles of the designated route in the case study.

Various personalized prediction approaches are considered as follows:
1) Speed Profile Matching (SPM): The paths among driver-vehicle pairs from historical data are matched using GPS data. The distance metric $\bar{\chi}_{(\mathcal{D}, \mathcal{V}), (\mathcal{D}', \mathcal{V}')}$ is computed using Eqn. (6) for all pairs of $(\mathcal{D}, \mathcal{V})$ and $(\mathcal{D}', \mathcal{V}')$. In addition, $k$-nearest neighbors ($k$-NN) clustering is utilized to determine the $k$ nearest pairs in the distance metric $\bar{\chi}_{(\mathcal{D}, \mathcal{V}), (\mathcal{D}', \mathcal{V}')}$. The similarity matching approach based on the $k$ nearest pairs is denoted by SPM($k$).
2) Driving Habit Matching (DHM): The low-speed and high-speed average deceleration/acceleration tuple $\left(\bar{\gamma}_a^{\mathrm{low}}(\mathcal{D}, \mathcal{V}), \bar{\gamma}_a^{\mathrm{high}}(\mathcal{D}, \mathcal{V}), \bar{\gamma}_d^{\mathrm{low}}(\mathcal{D}, \mathcal{V}), \bar{\gamma}_d^{\mathrm{high}}(\mathcal{D}, \mathcal{V})\right)$ is computed for every driver-vehicle pair $(\mathcal{D}, \mathcal{V})$. This tuple defines a 4-dimensional data space. The similarity matching approach based on the $k$ nearest pairs in the 4-dimensional data space is denoted by DHM($k$).
3) Matrix Factorization (MF): The paths among driver-vehicle pairs from collected data are matched using GPS data before employing matrix factorization. The matrix factorization approach is denoted by MF.
4) Average Data Values (Avg.): Only the average data values (e.g., average speed) are used for vehicle energy consumption prediction. Sometimes, the average speed is observed to be very close to the speed limit. The average data based approach is denoted by Avg.
5) Adjusted Personal Data Values (Adj.): The adjustment function in Eqn. (11) is used to convert the average data values to the personal data values. The adjusted personal data based approach is denoted by Adj.
6) Self-Estimation (Self Est.): Using only one's own data in the energy consumption model of the same road segment in Eqns. (2)-(3) is also considered. The self-estimation approach is denoted by Self Est. Self-estimation is a benchmark, which essentially validates the accuracy of the energy consumption model without using the data of other driver-vehicle pairs. In practice, one's own data of the same road segment may not always be present, as the driver may not have traveled such a route before.

[Fig. 13 plot: RMSE($\varepsilon_{\mathrm{acc}}$) for each $(\mathcal{D}, \mathcal{V})$ pair under Self Est., SPM(1), SPM(2), DHM(1), DHM(2), MF, Adj. and Avg.]

Fig. 13: Prediction error RMSE($\varepsilon_{\mathrm{acc}}$) over all $(\mathcal{D}, \mathcal{V})$ pairs.

Approach   Accuracy   Path Matching   System Complexity
SPM        High       Required        High
DHM        High       No              Low
MF         High       Required        Medium
Adj.       High       Required        Low
Avg.       Low        No              Low

TABLE IV: Summary of strengths and weaknesses of various approaches.

Fig. 13 compares the prediction errors in terms of RMSE, against the ground truth energy consumption over all driver-vehicle pairs. The strengths and weaknesses of each prediction approach are summarized in Table IV.

Self Est. is observed to have the lowest error among all approaches, because one's own driving data on the same route is the most accurate source for prediction, in spite of the presence of different traffic conditions. SPM is observed to have a prediction error close to that of Self Est. Avg. is observed to have the largest error, because of the considerable deviation of individual drivers from the average. Adj. can improve the accuracy of the comparison with the average. Notably, DHM is observed to perform relatively well, even though it does not require path matching using GPS data. Therefore, driving habits are a good indicator of vehicle energy consumption. In summary, DHM provides good accuracy without GPS data, which keeps the complexity of system implementation low.

B. Distance-to-Empty Prediction for EV

In this section, our approaches are applied to DTE prediction for an EV (i.e., Nissan LEAF) using other ICE vehicle data. For the convenience of comparison, certain routes are selected and all drivers are required to travel the same route multiple times for evaluations.

The data collected from Nissan LEAF includes:
1) State-of-charge (SOC), denoted by $S$, which indicates the remaining battery level.
2) Initial capacity of the battery, denoted by $B_A$.
3) Battery pack voltage when driving, denoted by $B_V$.

The remaining energy ($\Delta E_t$) in the battery at time $t$ is given by:

$$\Delta E_t = S_t \times B_A \times B_V \tag{16}$$

If the future average power intensity ($\bar{P}$) is known, then the estimated DTE is given by:

$$\widehat{\mathrm{DTE}} = \frac{\Delta E_t}{\bar{P}} \tag{17}$$

The DTE prediction based on approaches using participatory sensing data is compared with the on-board DTE meter on Nissan LEAF (also known as Guess-O-Meter), whose readings are captured by a camera mounted over the dashboard. To compare the effectiveness of DTE prediction, the deviation between the true DTE (which is computed in an offline manner) and the estimated $\widehat{\mathrm{DTE}}$ is measured by:

$$\Delta \mathrm{DTE} = \mathrm{DTE} - \widehat{\mathrm{DTE}} \tag{18}$$

The deviation between the true DTE and the on-board DTE meter on Nissan LEAF is also compared.

[Fig. 14 panels: $\Delta \mathrm{DTE}$ (km) for On-board, Self Est., SPM, DHM, Adj. and Avg.]
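The DTE computation in Eqns. (16)-(18) can be sketched in Python as follows. The function names, the units (SOC as a fraction between 0 and 1, capacity in Ah, voltage in V, and average power intensity in Wh per km), and the example values are assumptions made for illustration; they are not specifications of the Nissan LEAF or measurements from the study.

```python
def remaining_energy_wh(soc, capacity_ah, pack_voltage_v):
    """Remaining battery energy following Eqn. (16), in Wh (assumed units)."""
    return soc * capacity_ah * pack_voltage_v


def estimate_dte_km(soc, capacity_ah, pack_voltage_v, wh_per_km):
    """Estimated DTE following Eqn. (17).

    wh_per_km stands in for the future average power intensity, taken here
    as predicted energy use per km of travel (an assumption).
    """
    if wh_per_km <= 0:
        raise ValueError("average power intensity must be positive")
    return remaining_energy_wh(soc, capacity_ah, pack_voltage_v) / wh_per_km


def dte_deviation_km(true_dte_km, estimated_dte_km):
    """Deviation between true and estimated DTE, following Eqn. (18)."""
    return true_dte_km - estimated_dte_km


# Usage with made-up values: half-charged pack, 150 Wh/km predicted intensity.
estimated = estimate_dte_km(soc=0.5, capacity_ah=60.0, pack_voltage_v=400.0,
                            wh_per_km=150.0)
deviation = dte_deviation_km(true_dte_km=85.0, estimated_dte_km=estimated)
```

In this sketch, a personalized approach such as DHM or SPM would supply `wh_per_km` from the predicted energy consumption of the upcoming route.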

Fig. 14: Deviations of DTE prediction for various approaches.

The results are plotted in Fig. 14 for four different trips. It is observed that all approaches using participatory sensing data significantly outperform the on-board DTE meter on Nissan LEAF. Notably, SPM, DHM and Adj. perform very close to the benchmark Self Est., whereas Avg. gives relatively inferior performance. In summary, DHM consistently provides good accuracy with low system complexity.

VII. CONCLUSION

In this paper, various methodologies of utilizing participatory sensing data for personalized prediction of vehicle energy consumption were investigated. Several approaches were studied and compared, including: (1) comparison with the average using personalized adjustment, (2) two similarity matching approaches based on driver/vehicle/environment-dependent factors using speed profile matching and driving habit matching, and (3) a collaborative filtering approach that uses matrix factorization. Our empirical evaluations show that participatory sensing data can significantly improve prediction accuracy. Among all approaches, the similarity matching approach based on driving habits provides good accuracy (as compared to a benchmark of self-estimation using one's own driving data) with low system complexity. To evaluate the effectiveness, a case study of DTE prediction for EVs is conducted based on the participatory sensing data. In summary, similarity matching based on driving habits can provide a practical solution for DTE prediction for EVs, which significantly outperforms the on-board DTE meter on Nissan LEAF.

Despite the promising results given by our study, there are several practical issues and limitations to be recognized:
• Road Grade: This is the elevation of roads. There are public mapping APIs that provide road elevation data. This can be added to the future energy consumption model.
• Weather and Traffic: Our study assumes mild weather and traffic conditions. However, our energy consumption model can be extended to incorporate additional parameters to capture the impacts of weather in the vehicle model (e.g., weather types and route conditions). The vehicle speed from participatory sensing data naturally reflects the traffic condition to a certain extent; however, the data would need to be updated more frequently. Wind speed and road surface conditions also affect vehicle energy consumption, but are more difficult to measure.
• More Vehicle Information: The vehicle weight and tire pressures also introduce errors into the system. These errors can be reduced by obtaining more data from vehicle APIs (e.g., tire pressure indicator).
• Change of Vehicle State: The engine/gearbox efficiencies or battery efficiencies change over time, due to aging of the vehicle or system upgrades. Hence, the energy consumption model should adapt to these changes, using new data to update the coefficients.

A study is planned in future work to provide more comprehensive insights in diverse practical settings, addressing the preceding issues.

REFERENCES

[1] A. T. Campbell, S. B. Eisenman, N. D. Lane, E. Miluzzo, R. A. Peterson, H. Lu, X. Zheng, M. Musolesi, K. Fodor, and G.-S. Ahn, "The rise of people-centric sensing," IEEE Internet Comput., vol. 12, no. 4, pp. 12–21, 2008.
[2] R. K. Ganti, N. Pham, H. Ahmadi, S. Nangia, and T. F. Abdelzaher, "GreenGPS: a participatory sensing fuel-efficient maps application," in ACM Mobile Systems, Applications, and Services (Mobisys), 2010.
[3] S. Grubwinkler and M. Lienkamp, "A modular and dynamic approach to predict the energy consumption of electric vehicles," in Conf. Future Automotive Technology, 2013.
[4] R. Gemulla, P. J. Haas, E. Nijkamp, and Y. Sismanis, "Large-scale matrix factorization with distributed stochastic gradient descent," in ACM Conf. Knowledge Discovery and Data Mining (SIGKDD), 2011.
[5] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," J. of Computer, vol. 42, pp. 30–37, 2009.
[6] K. Kraschl-Hirschmann and M. Fellendorf, "Estimating energy consumption for routing algorithms," in IEEE Intelligent Vehicles Symp., 2012.
[7] E. Kim, J. Lee, and K. G. Shin, "Real-time prediction of battery power requirements for electric vehicles," in IEEE/ACM Int. Conf. on Cyber-Physical Systems, 2013.
[8] A. Cappiello, I. Chabini, E. K. Nam, A. Lue, and M. A. Zed, "A statistical model of vehicle emissions and fuel consumption," in IEEE Intelligent Transportation Systems Conf., 2002.
[9] J. A. Oliva, C. Weihrauch, and T. Bertram, "A model-based approach for predicting the remaining driving range in electric vehicles," in IEEE Prognostics and Health Management, 2013.
[10] Q. Yang, K. Boriboonsomsin, and M. Barth, "Arterial roadway energy/emissions estimation using modal-based trajectory reconstruction," in IEEE Int. Transportation Systems Conf., 2011.
[11] E. Wilhelm, J. Siegel, S. Mayer, L. Sadamori, S. Dsouza, C.-K. Chau, and S. Sarma, "Cloudthink: A scalable secure platform for mirroring transportation systems in the cloud," Transport, vol. 30, no. 3, 2015.
[12] S. Dornbush and A. Joshi, "Streetsmart traffic: Discovering and disseminating automobile congestion using VANET," in IEEE Vehicular Technology Conf., 2007.
[13] C.-M. Tseng, C.-K. Chau, S. Dsouza, and E. Wilhelm, "A participatory sensing approach for personalized distance-to-empty prediction and green telematics," in ACM Int. Conf. Future Energy Systems (e-Energy), 2015.
[14] C.-M. Tseng, S. Dsouza, and C.-K. Chau, "A social approach for predicting distance-to-empty in vehicles," in ACM Int. Conf. Future Energy Systems (e-Energy), 2014.
[15] M. J. Burke, N. Sarafopoulos, and V. Q. To, "Electronic system and method for calculating distance to empty for motorized vehicles," 1994, US Patent 5,301,113.
[16] L. Rodgers, E. Wilhelm, and D. Frey, "Conventional and novel methods for estimating an electric vehicle's distance to empty," in ASME Intl. Conf. Advanced Vehicle Technologies, 2013.
[17] A. Bolovinou, I. Bakas, A. Amditis, F. Mastrandrea, and W. Vinciotti, "Online prediction of an electric vehicle remaining range based on regression analysis," in IEEE Int. Electric Vehicle Conf., 2014.
[18] H. Yu, F. Tseng, and R. McGee, "Driving pattern identification for EV range estimation," in IEEE Int. Electric Vehicle Conf., 2012.
[19] K. Boriboonsomsin, M. J. Barth, W. Zhu, and A. Vu, "Eco-routing navigation system based on multisource historical and real-time traffic information," IEEE Trans. Intell. Transp. Syst., vol. 13, no. 4, pp. 1694–1704, 2012.
[20] M. Sachenbacher, M. Leucker, A. Artmeier, and J. Haselmayr, "Efficient energy-optimal routing for electric vehicles," in AAAI Conf. Artificial Intelligence, 2011.
[21] S. Grubwinkler, M. Kugler, and M. Lienkamp, "A system for cloud-based deviation prediction of propulsion energy consumption for EVs," in IEEE Int. Conf. Vehicular Electronics and Safety, 2013.
[22] C.-M. Tseng and C.-K. Chau, "On the privacy of crowd-sourced data collection for distance-to-empty prediction and eco-routing," in ACM Workshop on Electric Vehicle Systems, Data and Applications (EV-Sys), 2016.
[23] C.-K. Chau, K. M. Elbassioni, and C.-M. Tseng, "Fuel minimization of plug-in hybrid electric vehicles by optimizing drive mode selection," in ACM Int. Conf. Future Energy Systems (e-Energy), 2016.
[24] C.-K. Chau, K. Elbassioni, and C.-M. Tseng, "Drive mode optimization and path planning for plug-in hybrid electric vehicles," to appear in IEEE Trans. Intell. Transp. Syst., 2017.
[25] E. Ericsson, "Independent driving pattern factors and their influence on fuel-use and exhaust emission factor," J. of Transportation Research, vol. 6, no. 5, pp. 325–345, 2001.
[26] S. Salvador and P. Chan, "Toward accurate dynamic time warping in linear time and space," J. of Intelligent Data Analysis, vol. 11, no. 5, pp. 561–580, 2007.
[27] C.-M. Tseng, W. Zhou, M. A. Hashmi, C.-K. Chau, S. G. Song, and E. Wilhelm, "Data extraction from electric vehicles through OBD and application of carbon footprint evaluation," in ACM Workshop on Electric Vehicle Systems, Data and Applications (EV-Sys), 2016.
[28] H. Bozdogan, "Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions," J. of Psychometrika, vol. 52, pp. 345–370, 1987.
[29] U. E. P. Agency,