Investigating Underlying Drivers of Variability in Residential Energy Usage Patterns with Daily Load Shape Clustering of Smart Meter Data
Ling Jin, C. Anna Spurlock, Sam Borgeson, Alina Lazar, Daniel Fredman, Annika Todd, Alexander Sim, Kesheng Wu
Abstract: Residential customers have traditionally not been treated as individual entities due to the high volatility in residential consumption patterns as well as a historic focus on aggregated loads from the utility and system feeder perspective. Large-scale deployment of smart meters has motivated increasing studies to explore disaggregated daily load patterns, which can reveal important heterogeneity across different time scales, weather conditions, as well as within and across individual households. Such heterogeneity provides insights into household energy behavior and reveals sources and drivers of variability that is critical for utilities to understand in order to design efficient and effective demand side management strategies. This paper aims to shed light on the mechanisms by which electricity consumption patterns exhibit variability and the different constraints that may affect demand-response (DR) flexibility. We systematically evaluate the relationship between daily time-of-use patterns and their variability to external and internal influencing factors, including time scales of interest, meteorological conditions, and household characteristics by application of an improved version of the adaptive K-means clustering method to profile "household-days" of a summer peaking utility. We find that for this summer-peaking utility, outdoor temperature is the most important external driver of the load shape variability relative to seasonality and day-of-week. The top three consumption patterns represent approximately 50% of usage on the highest temperature days. Having an electric dryer and children-in-home are the leading predictors of a more variable consumption schedule, while conversely homes with elderly residents exhibit the most stable day-to-day routines. Among the customer vulnerability characteristics considered here (chronic-illness, elderly, and low-income), we find low-income households tend to have more variable consumption patterns. The variability in summer load shapes across customers can be explained by the responsiveness of the households to outside temperature. Our results suggest that depending on the influencing factors, not all the consumption variability can be readily translated to consumption flexibility. Such information needs to be further explored in segmenting customers for better program targeting and tailoring to meet the needs of the rapidly evolving electricity grid.
Index Terms: Residential load shapes; smart meter; discretionary consumption; flexibility and time-based management; whole time series clustering; adaptive k-means.
L. Jin, C. A. Spurlock and A. Todd are with the Energy Analysis and Environmental Impacts Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA.
S. Borgeson is with Convergence Data Analytics, LLC, Oakland, CA.
A. Lazar is with Youngstown State University, Youngstown, OH.
D. Fredman is with University of Vermont, Burlington, VT.
A. Sim and KS. Wu are with the Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA.
I. INTRODUCTION
With the rise of residential Advanced Metering Infrastructure (AMI) in the past decade, increasing prevalence of high-resolution meter data has motivated more research to apply load profiling to residential customers (e.g. [1], [2], [3], [4], [5], [6], [7], [8], [9]) that have traditionally not been treated as individual entities due to the high volatility in residential consumption patterns as well as a historic focus on aggregated loads from the utility and system feeder perspective [10]. Data and evidence based insights into behavioral usage patterns hold the potential for efficient and effective load forecasting and planning, demand response management, time-of-use tariff design, and electricity settlement [11], [12], [13], [14], [9].

Despite this progress, most work has focused on load shape differences between customers and therefore applied clustering to individual households each associated with pre-averaged load shapes or attributes (e.g. averaged profiles by season, by day of week, or by month). Such an approach ignores the day-to-day variability within households, thereby overlooking fine-grained information on household attributes and sources of variation that could be valuable for subsequent classification models and segmentation. This omission can have meaningful repercussions in terms of our understanding of patterns of electricity load. A recent study by Yilmaz et al. [15] has demonstrated significant differences in clustering results between daily load profiles averaged over individual households and raw daily load profiles (with no averaging); Kwac in [6] found that although two homes might have the same average profiles, the diversity of load patterns from one day to the next could vary significantly; McLoughlin et al. in [4] applied cluster analysis to day-to-day usage patterns of individual households and found the sequence of resulting daily patterns useful for further segmenting customers; and Haben et al. in [7] demonstrated that day-to-day variability in energy consumption within the household, together with average behavior, was sufficient to meaningfully distinguish households.

The diversity or variability of day-to-day time-of-use patterns within households is especially important in the context of demand response and energy efficiency programs, as such information may directly relate to each household's suitability to various demand side management strategies. For example, it is speculated that households with variable consumption schedules may be more flexible and therefore likely to respond to time of use pricing incentives [15], whereas those with regular demand during the daytime are ideal to target for integrating solar energy [16]. However, these speculations have not been sufficiently empirically verified and specifically the underlying drivers of variability in day-to-day usage patterns are yet to be examined. Consequently there is no consensus in the literature on a comprehensive relationship between variability and flexibility ([17], [6], [15] and detailed in Section II).

In this study, we apply an efficient whole 24 hour time series clustering algorithm to each daily load profile (household-day) of a large sample of residential customers as opposed to aggregated load profiles for individual households. This approach allows us to assign each household to multiple representative load patterns that may vary from day to day, so that patterns in electricity consumption and their variability within and across households can be derived. We systematically examine this load shape variability based on distributional differences in the resulting dictionary of load shapes across underlying external (such as seasonality, day of week, outside temperature) and internal (such as socio-demographic and household properties) drivers.
In contrast to previous studies which typically use load data sets ranging from a few hundred to a few thousand households (see review [9]), our load data consists of more than 30 million daily load profiles from approximately 100,000 households.

The context of our analysis is a summer-peaking utility that has a time-of-use (TOU) electricity pricing option. This type of time-based pricing is an example of a mechanism, like demand response (DR) programs, designed to shift electricity consumption from the highest demand times of day through a monetary incentive. This shift relies on customers actively changing their behavior in response to these incentives. TOU pricing programs are gaining traction as attractive alternatives to more traditional flat or inclining-block electricity rates at the residential level. California, for example, has authorized its investor-owned utilities (IOUs) to institute default TOU pricing across their residential customer base [18]. Understanding underlying patterns of behavior at an individual household level and how these patterns relate to the timing relevant for a TOU program is therefore highly valuable.

Within this TOU pricing context, we focus our analysis specifically on customer-controlled electricity loads (e.g., lighting, air conditioning, computer equipment, entertainment, dishwashers, laundry equipment), rather than installed equipment demand (e.g., electric tank-type water heater or refrigerator load). Households are more able to quickly adjust these customer-controlled loads with active behaviors in a short-run response to a DR program like TOU pricing, relative to the more long-run response of replacing installed equipment with more energy efficient versions. To isolate this customer-controlled usage we introduce an additional innovation unique among the research in this area: we focus on clustering "discretionary" electricity usage profiles (further defined in Section III B) rather than profiles of total hourly electricity use.

The goal of this research is to demonstrate how daily consumption patterns and their diversity in discretionary electricity consumption within and across households can be explained by factors relevant to DR programs, such as day-of-week, season, meteorological conditions, and household characteristics. By doing so, we aim to shed light on the mechanisms by which electricity consumption patterns exhibit variability and the different constraints that may affect DR flexibility. The paper is organised as follows: Section II provides a review of clustering methods used in the literature and current understanding of load shape variability, thereby explaining the methods and approaches and defining the challenges. Section III presents the data and methods used in this study. The resulting dictionary load shapes and their distributional differences in relation to both external and internal factors are examined and discussed in Section IV. Section V concludes.

II. RELATED WORK AND OUR CONTRIBUTION
A. Direct load shape clustering
Cluster analysis is a commonly used unsupervised learning technique for load profiling that can help discover and understand patterns in electricity consumption. This study applies whole 24-hour time series clustering to daily load profiles, which falls under the direct-clustering based approach according to [9]. As reviewed in Chicco [10], a number of direct clustering techniques, such as k-means, follow the leader, and self-organizing maps, were applied to whole-building load data to construct load profiles for non-residential (i.e., industrial and commercial) customers. Residential customers are characterized by highly volatile behavior, which challenges the application of clustering methods to individual load curves [10]. Using a large sample of residential daily load profiles ( > the daily maximum; Piao et al. [23], Han et al. [24], and Cao et al. [3] employed min-max normalization, which subtracts the minimum from the data and divides by the maximum; and Kwac et al. [6] normalized hourly demand by the daily total. We also apply a normalize-to-one procedure; however, prior to this step we isolate discretionary electricity consumption from total hourly consumption through a "de-minning" process described in Section III.

B. Load shape variability and demand-side flexibility
Demand-side flexibility is defined as the ability for consumers to change how, when, and where energy is used [25]. As the energy system becomes increasingly supplied by variable renewable generation, predictable and/or controllable flexibility becomes increasingly important for balancing supply and demand [26]. The variability of the daily load "shapes" or the day-to-day changes in consumption schedules of households provides a reasonable indication of whether and how the households may change their energy usage in response to a utility program [15]. Note that this paper is focused on variability or diversity in the consumption schedules, i.e. the "shape" of daily loads, rather than variability in absolute electricity consumption at given times of the day (e.g. [5], [7]) or the intra-day variations in consumption levels (e.g. [27]), which are also important features to explore.

Past studies have employed load shape clustering to explicitly quantify the "shape" variability [28], [29], [6] as a measure of potential demand response flexibility. However, there has not been consensus in the literature on a comprehensive relationship between variability and demand response flexibility. Using a data set collected from a residential demand response program, [29] found that customers with more variable consumption patterns are more likely to reduce their consumption compared to those with a more regular consumption behavior. In contrast, [6] proposed that a more stable household that shows the same load shape every day should be targeted by DR programs rather than one that is highly variable. As reviewed by [26], there are significant literature gaps in both the mechanisms by which electricity consumption patterns exhibit variability and the different constraints or motivating factors that can reshape them.

Emerging studies have taken a deeper dive into the underlying drivers of load shape variability including season, day of week, temperature, and household characteristics.
[30] used a Hidden Markov Model to learn the consumption dynamic behavior under the corresponding environments and concluded that customers can be grouped in three categories: normal, sensitive, and insensitive households in relation to outdoor temperature changes. [31] applied agglomerative hierarchical clustering based on proportions of different load curve categories in different seasons and found the behavioral patterns of customer groups are highly consistent across several seasons. [32] used a constrained Gaussian mixture model whose parameters vary according to the day type (weekday, Saturday or Sunday), and crossed the clustering results with contextual variables available for the households to show the close links between electricity consumption and household socio-economic characteristics.

[33], [4] correlated load profile classes with socio-economic determinants yet did not explore the variability linkage. [34] indicated that understanding DR potential of vulnerable and low-income customers was especially lacking. [35] found no evidence that vulnerable populations (low income, elderly, or chronically ill) were unduly harmed or burdened by a time-of-use pricing program, but did nothing to characterize the usage of such customers, which could provide a better understanding of the mechanisms through which such customers respond, or not, to that or similar programs. Using the dictionary of household-day load shapes derived from our study, we are able to systematically examine the correlation between load shape variability across various temporal scales and customer characteristics (including the above mentioned vulnerability characteristics) to better understand the sources of variation and their implications for DR flexibility.

Lastly, entropy has been a common metric to quantify load shape variability and its application has been limited to characterizing individual customers [28], [29], [6].
Essentially, entropy quantifies the distribution or diversity of a given set of load shapes. In this paper, in addition to computing load shape entropy of individual customers, we also use this metric more flexibly to characterize load shape variability under different time scales, days, and outside temperature ranges to understand the role of these external drivers.

III. MATERIALS AND METHODS
A. Dataset
We cluster household-day load profiles based on hourly consumption data collected from a summer-peaking utility in California. The data consist of over 30 million daily load profiles ("household-days") from approximately 100,000 households, measured between June 1st, 2011 and May 31st, 2012, prior to the implementation of time-of-use (TOU) pricing. Therefore, the data set represents consumption behavior absent any time-based rate or other related program.

In addition, household information was collected in a survey of 6413 participants in the utility service area. The household characteristic variables used in this study are processed into binary indicators on: (1) socio-demographic and lifestyle information (low-income, chronically-ill, elderly, children-in-home, college-degree, work-full-time, work-from-home); (2) dwelling information (single family home); (3) appliance ownership information (electric dryer, central air conditioner, room air conditioner, programmable thermostat). Note that we explicitly included three vulnerability indicators (low-income, chronically-ill, and elderly) in order to inform the literature gap identified in Section II B.

The clustering method employed here improves upon the method developed in [6] and is illustrated in Figure 1 and described in detail below.
B. Deriving discretionary load shapes
We first conduct data cleaning and establish the format of the object to be clustered. We apply the same preprocessing criteria as Kwac et al. [6]: dropping daily usage data with missing hour observations, or with low average demand (below

Fig. 1. Modified adaptive k-means procedure to derive representative dictionary load shapes and their membership based on discretionary usage patterns.

C. Load shape clustering with adaptive k-means
The application of adaptive k-means to our dataset is more thoroughly documented in an earlier report [36], which can be found in the supporting materials, and is briefly described here.

Specifically, clustering the load profiles after normalization without "de-minning" resulted in more than 65% of the daily load profiles in the data being assigned to a single flat-shaped cluster. When the daily load profiles were "de-minned" prior to normalization and clustered into the same number of clusters, the highest concentration of load profiles assigned to a single cluster was approximately 10%.
Fig. 2. Illustration of the flattening effect of normalization without de-minning
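The flattening effect can be made concrete with a minimal Python sketch. This is not the implementation used in this study: it assumes that "de-minning" subtracts each day's minimum hourly reading and that normalization divides by the daily total, and the function names and the toy profile are hypothetical.

```python
def normalize(profile):
    """Scale a daily profile so that its hourly values sum to one."""
    total = sum(profile)
    if total == 0:
        raise ValueError("profile has zero total usage")
    return [value / total for value in profile]


def de_min(profile):
    """Subtract the daily minimum (assumed here to stand in for baseline load)."""
    base = min(profile)
    return [value - base for value in profile]


# Hypothetical household-day: 2 kWh of baseline use in every hour,
# plus 1 kWh of extra use in each of hours 17, 18 and 19.
profile = [2.0] * 24
for hour in (17, 18, 19):
    profile[hour] += 1.0

plain = normalize(profile)                  # normalized total usage
discretionary = normalize(de_min(profile))  # de-minned, then normalized

# Spread between the highest and lowest hour of each normalized shape.
plain_range = max(plain) - min(plain)
discretionary_range = max(discretionary) - min(discretionary)
```

In this toy case the normalized total-usage shape is nearly flat, while the de-minned shape concentrates all of its mass in the three evening hours. A perfectly flat day de-mins to all zeros, so the sketch raises an error for it rather than dividing by zero.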
After preprocessing, the subsample of 100,000 "de-minned" and normalized load shapes is moved into the load shape clustering step. They are first passed through an adaptive k-means algorithm ([6]), which splits the data set of load shapes into K clusters, such that the relative squared error (RSE) of any load shape assigned to a cluster is not greater than an error threshold θ. The RSE is defined in the equation below, where $s$ is the load shape of interest, $t$ is the hour of day index, and $C_i$ is the cluster center to which $s$ is assigned. The error threshold θ is varied from 0.05 to 0.5 to determine a suitable value that results in the most reasonable K.

$$\mathrm{RSE}_{s,i} = \frac{\sum_{t=1}^{24} \left( s(t) - C_i(t) \right)^2}{\sum_{t=1}^{24} \left( C_i(t) \right)^2} \qquad (1)$$

As the resulting clusters from adaptive k-means are typically highly correlated, in the second step we follow Kwac et al. [6], and undertake a subsequent hierarchical merging of the clusters by sequentially combining the most similar clusters until their total count reaches a target number K. Under this transformation the requirement that all RSEs fall under θ is relaxed. In particular, the target size K is selected such that it is the smallest number of clusters for which less than 5% of the load shapes violate the θ threshold condition. Guided by the acceptable error threshold, we allow our original number of clusters to grow into the thousands before applying quantitative criteria in the post clustering processing step described in the next section.

D. Iterative dictionary truncation
In the post clustering processing phase, we implemented an iterative truncation algorithm (illustrated in Algorithm 1) to remove cluster centers with low member counts with a user-defined clustering quality metric: the overall violation rate (V: the fraction of load shapes with RSE > θ). Dictionary truncation allows us to focus on the electricity consumption patterns that represent the majority of the household-day observations, rather than outliers. In contrast to the original adaptive k-means algorithm, the parameter V we introduced here ensures the maximum reduction of final dictionary size while controlling for clustering quality. The advantage of this post-processing procedure is to avoid tuning of the number of clusters K (a usual hyper-parameter in k-means-type of clustering) because an optimal K is automatically determined after the iterative truncation. Without this iterative process (i.e. the original adaptive k-means implemented by [6]), the clustering quality as indicated by the violation rate generally increases by 30

akmeans: Adaptive Kmeans algorithm based on threshold. R package version 1.1. https://CRAN.R-project.org/package=akmeans

Algorithm 1 Iterative dictionary truncation.
V: violation rate
θ: error threshold
LOOP Process
while violation < V do
    Identify the ids of smallest clusters whose shape members comprise the fraction V of the total number of shapes
    Remove those clusters
    Reassign the shapes that were members of the removed clusters into the remaining closest clusters
    Compute violation rate as fraction of load shapes with RSE > θ
end while

Following the truncation procedure the resulting set of remaining cluster centers is defined as the "dictionary" of discretionary usage patterns (referred to as "dictionary load shapes"). Finally, each household-day in the full data set is assigned to the single closest dictionary load shape based on Euclidean distance.
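The truncation loop of Algorithm 1 can be sketched in Python as follows. This is an illustrative simplification under stated assumptions, not the code used in this study: it removes a single smallest cluster per pass rather than a batch, keeps the cluster centers fixed, and stops before the violation rate would reach V. All function names and the three-hour toy shapes are hypothetical.

```python
def rse(shape, center):
    """Relative squared error of a shape against a cluster center (Equation 1)."""
    num = sum((s - c) ** 2 for s, c in zip(shape, center))
    den = sum(c ** 2 for c in center)
    return num / den


def nearest(shape, centers):
    """Index of the closest center by squared Euclidean distance."""
    def dist(k):
        return sum((s - c) ** 2 for s, c in zip(shape, centers[k]))
    return min(range(len(centers)), key=dist)


def violation_rate(shapes, centers, theta):
    """Fraction of shapes whose RSE to their nearest center exceeds theta."""
    bad = sum(1 for s in shapes if rse(s, centers[nearest(s, centers)]) > theta)
    return bad / len(shapes)


def truncate_dictionary(shapes, centers, theta, max_violation):
    """Drop the smallest cluster repeatedly while the violation rate stays below max_violation."""
    kept = list(centers)
    while len(kept) > 1:
        labels = [nearest(s, kept) for s in shapes]
        counts = [labels.count(k) for k in range(len(kept))]
        smallest = counts.index(min(counts))
        trial = kept[:smallest] + kept[smallest + 1:]
        # Reassignment to the remaining closest centers happens inside violation_rate.
        if violation_rate(shapes, trial, theta) >= max_violation:
            break  # removing this cluster would degrade quality too far
        kept = trial
    return kept


# Toy data: two well-populated patterns and one rare pattern.
A = [0.8, 0.1, 0.1]
B = [0.1, 0.1, 0.8]
C = [0.1, 0.8, 0.1]
shapes = [A] * 5 + [B] * 4 + [C]
dictionary = truncate_dictionary(shapes, [A, B, C], theta=0.3, max_violation=0.2)
```

In this toy example the rare pattern C is dropped because reassigning its single member leaves only 10% of shapes in violation, whereas dropping either of the two remaining centers would push the violation rate past the 20% limit.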
E. Entropy to quantify load shape variability
We use load shape entropy (defined below) to quantify the distribution or diversity of a given set of the load shapes. Greater entropy indicates the load shapes are distributed more uniformly and therefore more diverse and variable, while smaller entropy indicates the distribution is concentrated on fewer dictionary load shapes and is therefore less variable.

$$S_i = -\sum_{c \in C_i} p_{i,c} \log(p_{i,c}) \qquad (2)$$

where $S_i$ is the entropy of set $i$; $C_i$ is the set of dictionary load shapes observed in set $i$; $c$ is any dictionary load shape that occurs within set $i$; and $p_{i,c}$ is the frequency of dictionary load shape $c$ that occurs within set $i$.

The set $i$ is usually defined by customer, that is, $C_i$ is the set of dictionary load shapes observed for customer $i$ over a certain time period. In addition, to understand the overall relationship between variability and external factors, we also compute population-level entropy of a given temporal period. In such a case, the set $i$ is defined by season, day-of-week, days with a certain temperature level, or by each day. For example, $C_i$ could be the set of dictionary load shapes observed for the summer season across all customers.

IV. RESULTS AND DISCUSSIONS
A. Clustering results and descriptive analysis
The clustering results were documented in a previous report [36] and are briefly described here. The resulting number of clusters from the adaptive k-means procedure depends on the chosen error threshold (θ). The relationship derived by running adaptive k-means for θ varying from 0.05 to 0.5 is plotted in Figure 3. We selected θ = 0.3 based on our criteria that the number of clusters compared to the original number of household-days should not be excessively large (approximately 5000 initial clusters in our case), and the marginal gain in error improvement to the explanatory power by increasing θ should be small.

Fig. 3. Error threshold (θ) and number of clusters

By limiting the total share of load shapes that violate the θ threshold to 5% as suggested by Kwac et al. [6], the hierarchical clustering step consolidated the number of clusters from approximately 5000 (K) to 2000 (K) in our case. The top 600 of these 2000 clusters (sorted by member count) account for approximately 90% of the data. It is because of this long tail of low-membership clusters that we applied the truncation procedure described in Algorithm 1. We use the iterative truncation algorithm with violation rates (V) of 10% and 30%, which results in sets of cluster centers numbering 608 and 99, respectively. We evaluate these two sets of representative clusters by using the Davies-Bouldin index (DBI), which is a metric of cluster separation ([37], [20]).
The DBI values for the 608 and 99 cluster center sets are 2.23 and 2.22, respectively, indicating both sets of clusters have similar performance with respect to cluster separation. Because of the increased ease in further analyzing a smaller set of clusters, we chose the 99 cluster centers as the final dictionary of representative load shape clusters, to which the full data set of discretionary load shapes is assigned based on smallest Euclidean distance.

The distribution of kilowatt-hour (kWh) usage represented across all the 99 dictionary load shapes is not uniform; as is demonstrated in Figure 4, the top cluster with respect to kWh of usage captured by that cluster accounts for approximately 13% of the total kWh usage; the top 38, 53, and 73 dictionary load shapes respectively cover 70%, 80%, and 90% of the electricity use underlying the approximately 30 million load shapes across households over the entire year period.

Fig. 4. Distribution of dictionary load shapes (left) and their cumulative electricity coverage (right).
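Coverage figures of this kind (how many top-ranked dictionary load shapes are needed to reach a given share of kWh) follow from a ranked cumulative sum. A minimal Python sketch, using made-up per-cluster kWh totals rather than data from this study; the function name is hypothetical:

```python
def clusters_needed(kwh_by_cluster, target_share):
    """Smallest number of top-ranked clusters whose kWh reach target_share of the total."""
    total = sum(kwh_by_cluster)
    running = 0.0
    ranked = sorted(kwh_by_cluster, reverse=True)  # largest cluster first
    for rank, kwh in enumerate(ranked, start=1):
        running += kwh
        if running >= target_share * total:
            return rank
    return len(ranked)


# Hypothetical kWh captured by each of five clusters (not study data).
usage = [12.0, 40.0, 8.0, 25.0, 15.0]
needed_70 = clusters_needed(usage, 0.70)
needed_90 = clusters_needed(usage, 0.90)
```

Applied to the kWh totals of the 99 dictionary load shapes, the same calculation would yield the counts quoted for the 70%, 80%, and 90% coverage levels.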
The top 16 dictionary load shapes of this final dictionary ranked by their total daily electricity load coverage are shown in Figure 5 and account for more than 40% of all the household daily load shapes in the data set, and more than 50% of total daily electricity load. The utility defines their peak period for the TOU rate from 4 to 7 PM (i.e., hours 16 through 18 as indicated by the gray shaded areas in Figure 5) on non-holiday weekdays. While in aggregate, most of the high electricity usage happens during this period, Figure 5 demonstrates that the clusters exhibit considerable variability in peak timing of discretionary usage as well as number of peaks. For example, the clusters with ranks 3 through 9 and 13 through 16 in Figure 5 reflect patterns of significant discretionary usage peaks that are outside of the TOU peak period. Conventional expectation of the most common load shape may be a morning peak and an evening peak, before and after work, but only clusters 5 and 16, representing only 3.6% and 1.2% of total daily consumption, respectively, really follow this classic pattern. Other double peaking load shapes follow different peak timing from the conventional expectation: for example, the second peak in cluster 3 occurs in late evening, while clusters 9 and 14 have distinct double peaks at noon and late evening.

Figure 6 aligns the 99 dictionary load shapes on a two dimensional space governed by peak timing (x-axis) and number of peaks (y-axis). Interpreting patterns within this figure broadly, we see that of all the household-day load shapes in the full year data set, approximately 75% are assigned to dictionary load shapes that are single peaking, and approximately 22% are double peaking.
Daytime (10 AM to 4 PM), TOU peak (4 to 7 PM), and evening (7 to 11 PM) are the most frequent times when major peaks occur, accounting for 23%, 26% and 27% of the full year data set, respectively. Of particular interest in this set of results is the degree to which the system peak (defined as the TOU peak from 4-7pm) does not necessarily represent the most frequent peak in discretionary usage as represented by this dictionary of load shapes. Only 26% of all household-day load shapes are assigned to dictionary load shapes that exhibit significant discretionary usage peaking during that time period. For this reason it is important to understand the underlying behavior generating these patterns to be able to best target and affect the behaviors that may have the largest impact on the system peak.

Fig. 5. Sixteen highest electricity using dictionary load shape cluster centers. Notes: The title of each panel is: [Cluster
B. Temporal and meteorological heterogeneity
This utility's TOU rate is to be in effect for customers during the summer months only, and the higher peak price is only charged during the peak hours (4-7pm) on non-holiday weekdays. Because the season and days of the week are primary dimensions over which these rates are defined, we explore distributional differences in electricity usage patterns (Figure 7) and their variability (Figure 8) across these two timescales. In addition, we investigate the relationship of load shape patterns and their diversity with temperature in the summer season because increased consumption during peak hours on hot days represents the highest peak usage times for this utility.
1) Day-of-week variability: most TOU rates, including the one offered by this utility, include peak period rates that are applicable only on weekdays. The representative usage patterns derived within and across households allow us to assess the degree to which weekdays differ from weekends with respect to the behavior patterns of these residential customers. According to Figure 7a, the first observation of note is that five of the top six dictionary load shapes are common between weekdays and weekends, while the 6th cluster differs. The 6th-ranked cluster on weekdays peaks in the evening, whereas the one on weekends peaks during the day, which indicates a slight increase of activities in the daytime at the residence on weekends.

In the left column of Figure 8 we compare the cumulative distributional differences between weekdays and weekends with respect to total electricity usage. From this figure we observe that weekend electricity consumption is slightly more concentrated in the top clusters (i.e., the weekend cumulative distribution is pulled slightly to the left of the weekday distribution), and the entropy (right column of Figure 8) suggests usage patterns on weekdays are relatively more diverse and variable than on weekends. However, these differences are extremely small (difference in entropy is less than 0.07). These results suggest that at population level, the diversity of discretionary consumption schedules is not significantly informed by whether that usage is taking place on a weekend or a weekday.
2) Seasonal distributions: according to the top 6 dictionary load shapes based on kWh usage coverage for each season (Figure 7b), Cluster ,which may be less variable across households than other types of usage.
3) Distribution by outdoor temperatures within the summer season: seasonal differences presented above indicate that this summer peaking utility has the least variable patterns during the peak demand season with significant cooling needs. How household energy consumption patterns respond to outdoor temperature is itself important to understand from the perspective of maintaining sufficient electricity generation capacity, grid reliability, and transmission and distribution infrastructure to meet demand. The most important factor is the demand on the small number of highest consumption days. Highest electricity consumption tends to occur on the hottest weekdays.

We define daily average outdoor dry bulb temperature quartiles ( < 68 F, 68-71F, 71-76F, and >

http://esnews.wapa.gov/wordpress/tag/sacramento-municipal-utility-district/

Fig. 6. Dictionary load shapes categorized by peak timing (horizontal axis) and number of daily peaks (vertical axis)

C. Load shape variability within and across customers
While overall consumption patterns are least diverse in the summer season, the variability in load shapes differs considerably among customers during this same period (Figure 9). The entropy of individual customers ranges from 0 to 3, a much greater spread than the overall differences driven by external factors such as day of week, season, or temperature. This suggests that internal factors specific to individual households are more important for understanding the behaviors that generate various degrees of load shape diversity.

We assess the correlation between the load shape entropy of individual households and household characteristics to understand the key predictors of variability in consumption schedules. Entropy differences between households with and without the selected household characteristics are shown in Figure 10. Having an electric dryer and children-in-home are the leading factors associated with more variable consumption schedules. This may be associated with variable day-to-day laundry times and the more chaotic needs of children in the home. Low-income households, those with central AC, and full-time workers also tend to vary their pattern of consumption from day to day. On the other end of the spectrum, elderly households tend to have the most stable day-to-day routines. Single-family homes, households with members working from home, and those with a college degree also tend to correlate with more stable consumption schedules relative to households without those characteristics.

Fig. 7. Top 6 dictionary load shapes and their occurrence frequency: (a) weekday vs. weekend; (b) by season; and (c) by outdoor temperature in summer.

Depending on the household characteristics, not all of the resulting variability is likely to translate readily into consumption flexibility. Usage of appliances such as electric dryers can be flexibly moved between times of day to accommodate a time-based DR program. However, the variability of load shapes due to having children-in-home reflects a lifestyle constraint that may be more difficult to change.

[34] identified literature gaps with regard to the consumption behavior and DR response potential of vulnerable and low-income customers. Our results suggest that, in terms of variability in consumption schedules, households with chronic illness do not show significant differences relative to those without chronic illness. Low-income households, on the other hand, tend to be more variable, while elderly households are the opposite. Further research is needed to test whether the variability shown in low-income households makes them desirable candidates for DR programs.
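To make the entropy comparison above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that load shape entropy is the Shannon entropy of the relative frequencies with which a household's days fall into each dictionary load shape, and that group differences are summarized by a difference in means with a normal-approximation 95% confidence interval. The logarithm base, the function names `load_shape_entropy` and `mean_difference_ci`, and the interval method are all assumptions made for illustration.

```python
import math
import statistics
from collections import Counter


def load_shape_entropy(cluster_ids, base=2.0):
    """Shannon entropy of the empirical distribution of cluster labels.

    cluster_ids: one dictionary-load-shape label per household-day.
    base: logarithm base (an assumption; the source section does not state it).
    """
    counts = Counter(cluster_ids)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one household-day")
    entropy = 0.0
    for count in counts.values():
        p = count / n
        entropy -= p * math.log(p, base)
    return entropy


def mean_difference_ci(with_trait, without_trait, z=1.96):
    """Difference in mean entropy between two groups of households.

    Returns (difference, lower, upper) using an unequal-variance standard
    error and a normal approximation (illustrative choice, not the paper's
    stated method).
    """
    if len(with_trait) < 2 or len(without_trait) < 2:
        raise ValueError("each group needs at least two households")
    diff = statistics.mean(with_trait) - statistics.mean(without_trait)
    se = math.sqrt(
        statistics.variance(with_trait) / len(with_trait)
        + statistics.variance(without_trait) / len(without_trait)
    )
    return diff, diff - z * se, diff + z * se
```

Under these assumptions, a household whose days always map to the same dictionary load shape has entropy 0, and entropy grows as its days spread more evenly over more load shapes. A positive difference whose interval excludes zero would correspond to a characteristic associated with more variable schedules.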
Fig. 8. Cumulative distribution of electricity usage across clusters (left column) and load shape entropy (right column) determined for temporal periods around external factors (day-of-week, season, and different levels of summer temperatures).

Fig. 9. Distribution of summer load shape entropy of individual customers.
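A cumulative usage distribution of the kind summarized in Fig. 8 can be read as "how many dictionary load shapes are needed to cover a given share of usage". The sketch below illustrates that calculation under stated assumptions: the input is a mapping from cluster id to total electricity usage in some period, and the function name `clusters_to_cover` and the example numbers are hypothetical, not taken from the paper's data.

```python
def clusters_to_cover(usage_by_cluster, target=0.5):
    """Return the highest-usage clusters needed to reach a target usage share.

    usage_by_cluster: dict mapping cluster id -> total usage (e.g. kWh).
    target: share of total usage to cover, between 0 and 1.
    Returns (list of cluster ids in descending usage order, covered share).
    """
    total = sum(usage_by_cluster.values())
    if total <= 0:
        raise ValueError("total usage must be positive")
    # Rank clusters from largest to smallest usage.
    ranked = sorted(usage_by_cluster.items(), key=lambda kv: kv[1], reverse=True)
    chosen = []
    covered = 0.0
    for cluster_id, usage in ranked:
        chosen.append(cluster_id)
        covered += usage
        if covered / total >= target:
            break
    return chosen, covered / total


# Hypothetical usage totals for a set of hot days (illustrative only).
example_usage = {1: 22.0, 2: 16.0, 4: 12.0, 3: 10.0, 5: 10.0, 6: 10.0, 7: 10.0, 8: 10.0}
top_clusters, share = clusters_to_cover(example_usage, target=0.5)
```

The fewer clusters are needed to reach the target share, the more concentrated the usage patterns are in that period.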
As discussed in Section IV-B, three dictionary load shapes cover 50% of all electricity consumption during the hottest summer days (clusters 1, 2, and 4 in Figure 7c). To further understand the within- and across-customer variability, we map the occurrence of these three dictionary load shapes of primary interest across the population of households in our data (a random sample of 1000 households is shown in Figure 11). The households are sorted by the overall number of times one of their days was assigned to one of the three dictionary load shapes of interest. The households farthest to the bottom of the figure had the fewest days assigned to a cluster of interest, while those at the top had the most. The average daily outdoor temperature and load shape entropy computed for each day are overlaid onto the bottom of this figure for reference.

Fig. 10. Difference in load shape entropy between households with and without the respective characteristic (error bars indicate 95% confidence intervals).

Fig. 11. Occurrence of dictionary load shapes 1, 2 and 4 across households and relative to daily outdoor average temperature and load shape entropy.

In examining this figure, first, the correlation of the clusters of interest with temperature is clear, with counts of cluster-of-interest membership (i.e., column averages) increasing as temperatures spike. The diversity of usage patterns measured by load shape entropy also decreases sharply when temperatures increase. Second, the variability in load shapes (whether in the top 3 dictionary patterns or not) can be explained by the responsiveness of the households to outside temperature, which is consistent with [30].
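The household-by-day map described for Figure 11 can be sketched as follows. This is an illustrative reconstruction under assumptions, not the authors' code: each household is assumed to have one cluster label per day over a common set of days, and the function name `occurrence_map` and the toy labels are hypothetical.

```python
def occurrence_map(assignments, clusters_of_interest):
    """Build a sorted 0/1 occurrence map of clusters of interest.

    assignments: dict mapping household id -> list of cluster ids, one per
        day, with every household covering the same days in the same order.
    clusters_of_interest: set of cluster ids to flag (e.g. {1, 2, 4}).
    Returns (household order, rows of 0/1 flags, per-day share of households
    assigned to a cluster of interest).
    """
    if not assignments:
        raise ValueError("need at least one household")
    lengths = {len(days) for days in assignments.values()}
    if len(lengths) != 1:
        raise ValueError("all households must cover the same number of days")
    n_days = lengths.pop()

    # Flag each household-day that falls into a cluster of interest.
    flags = {
        hh: [1 if c in clusters_of_interest else 0 for c in days]
        for hh, days in assignments.items()
    }
    # Households with the most flagged days come first (top of the figure).
    order = sorted(flags, key=lambda hh: sum(flags[hh]), reverse=True)
    rows = [flags[hh] for hh in order]
    # Column averages: share of households flagged on each day.
    daily_share = [sum(row[d] for row in rows) / len(rows) for d in range(n_days)]
    return order, rows, daily_share


# Toy labels for three households over four days (illustrative only).
toy = {"A": [1, 7, 2, 9], "B": [3, 3, 1, 3], "C": [1, 2, 4, 1]}
order, rows, daily_share = occurrence_map(toy, {1, 2, 4})
```

Plotting `rows` as an image with `daily_share` and daily temperature underneath would give a view comparable in spirit to the figure described here.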
We see that the households in the top quarter of the figure are classified into one of the three target dictionary load shapes on a large percentage of their summer days, while the households in the bottom quarter of the figure only demonstrate these target usage patterns when temperatures spike.

These two types of behavior, and the spectrum across households in between, represent very different underlying behavioral patterns. There is a need for further research to better understand these underlying patterns. Observable patterns such as these, based on AMI data alone, can be used to segment customers into groups that better capture heterogeneity in how they are likely to respond to DR or time-differentiated pricing programs. These segments could be used, for example, to target relevant programs based on the needs of the utility.

V. CONCLUSION
In this paper we employ an innovative clustering technique to categorize daily electricity consumption at hourly resolution across a large sample of residential customers over a full year. We focus the clustering on the schedules and magnitudes of discretionary consumption with an innovative “de-minning” process. Our clustering procedure results in a dictionary of 99 distinctive usage patterns that can represent more than 30 million discretionary load shapes within a reasonable error threshold.

With cluster assignments of daily discretionary load shapes over the whole year, we are able to demonstrate how consumption patterns can be differentiated by external influencing factors such as time scales of interest (season and day-of-week) and meteorological conditions (outside temperature levels). Analysis of the temporal distribution of the 99 dictionary load shapes reveals that high temperature is the single biggest external influence reducing discretionary load pattern diversity. In particular, approximately 50% of the energy use during the hottest days is covered by only three dictionary load shapes. The coincident load resulting from the increased concentration of usage patterns driven by high temperatures is problematic for the grid, causing high system peaks that are expensive and threaten service stability. Variation in the concentration of discretionary usage patterns exists across seasons as well, but this is largely driven by temperature. There is much less variation in the distribution of electricity usage across dictionary load shapes between weekends and weekdays than across seasons.

There is significant diversity of load shapes within households across days, and such variability can be explained by household characteristics including socio-demographic and lifestyle information, dwelling information, and appliance ownership information.
We find that having an electric dryer and children-in-home are the leading predictors of a more variable consumption schedule, while homes with elderly residents best predict stable day-to-day routines. Among the vulnerability characteristics considered here (chronic-illness, elderly, and low-income), we find low-income households tend to be more variable. Further research is needed to confirm whether such variability can lead to greater DR potential.

We have demonstrated that there is significant heterogeneity across households regarding the diversity in usage patterns, and that such diversity markedly decreases on hotter days. The variability in summer load shapes across customers can be explained by the responsiveness of the households to outside temperature, which is consistent with current literature. Our results motivate future work in which identifying and mapping these particular patterns across the population can potentially be used in developing targeting techniques to improve the uptake and effectiveness of demand response programs.

While utilities and system operators typically focus on aggregate residential load shapes, our findings shed light on the considerable heterogeneity and the relative importance of the influencing factors underlying such variability across days and households. We argue that finding tractable ways to map out and understand this variability, as we have begun above, can be a powerful tool for subsequently segmenting customers for better program targeting and tailoring to meet the needs of the rapidly evolving electricity grid.

ACKNOWLEDGMENT
The work described in this report was funded by Laboratory Directed Research and Development (LDRD) funds from the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

REFERENCES
[1] T. Räsänen, D. Voukantsis, H. Niska, K. Karatzas, and M. Kolehmainen, “Data-based method for creating electricity use load profiles using large amount of customer-specific hourly measured electricity use data,” Appl. Energy, vol. 87, no. 11, pp. 3538–3545, Nov. 2010.
[2] C. Flath, D. Nicolay, T. Conte, C. van Dinther, and L. Filipova-Neumann, “Cluster analysis of smart metering data,” Bus Inf Syst Eng, vol. 4, no. 1, pp. 31–39, Feb. 2012.
[3] H. Cao, C. Beckel, and T. Staake, “Are domestic load profiles stable over time? an attempt to identify target households for demand side management campaigns,” in IECON 2013 - 39th Annual Conference of the IEEE Industrial Electronics Society. ieeexplore.ieee.org, Nov. 2013, pp. 4733–4738.
[4] F. McLoughlin, A. Duffy, and M. Conlon, “A clustering approach to domestic electricity load profile characterisation using smart metering data,” Appl. Energy, vol. 141, pp. 190–199, Mar. 2015.
[5] I. Khan, M. W. Jack, and J. Stephenson, “Identifying residential daily electricity-use profiles through time-segmented regression analysis,” Energy Build., vol. 194, pp. 232–246, Jul. 2019.
[6] J. Kwac, J. Flora, and R. Rajagopal, “Household energy consumption segmentation using hourly data,” IEEE Trans. Smart Grid, vol. 5, no. 1, pp. 420–430, Jan. 2014.
[7] S. Haben, C. Singleton, and P. Grindrod, “Analysis and clustering of residential customers energy behavioral demand using smart meter data,” IEEE Trans. Smart Grid, vol. 7, no. 1, pp. 136–144, Jan. 2016.
[8] Y. Wang, Q. Chen, C. Kang, M. Zhang, K. Wang, and Y. Zhao, “Load profiling and its application to demand response: A review,” Tsinghua Sci. Technol., vol. 20, no. 2, pp. 117–129, Apr. 2015.
[9] Y. Wang, Q. Chen, T. Hong, and C. Kang, “Review of smart meter data analytics: Applications, methodologies, and challenges,” IEEE Trans. Smart Grid, Feb. 2018.
[10] G. Chicco, “Overview and performance assessment of the clustering methods for electrical load pattern grouping,” Energy, vol. 42, no. 1, pp. 68–80, Jun. 2012.
[11] K. Moslehi and R. Kumar, “A reliability perspective of the smart grid,” IEEE Trans. Smart Grid, 2010.
[12] H. Farhangi, “The path of the smart grid,” IEEE Power Energ. Mag., vol. 8, no. 1, pp. 18–28, Jan. 2010.
[13] W.-C. Hong, “Electric load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm,” Energy, vol. 36, no. 9, pp. 5568–5578, Sep. 2011.
[14] K.-L. Zhou, S.-L. Yang, and C. Shen, “A review of electric load classification in smart grid environment,” Renewable Sustainable Energy Rev., vol. 24, pp. 103–110, Aug. 2013.
[15] S. Yilmaz, J. Chambers, and M. K. Patel, “Comparison of clustering approaches for domestic electricity load profile characterisation - implications for demand side management,” Energy, vol. 180, pp. 665–677, Aug. 2019.
[16] M. E. Dyson, S. D. Borgeson, M. D. Tabone, and D. S. Callaway, “Using smart meter data to estimate demand response potential, with application to solar energy integration,” Energy Policy, vol. 73, pp. 607–619, 2014.
[17] I. Dent, T. Craig, U. Aickelin, and T. Rodden, “Variability of behaviour in electricity load profile clustering; who does things at the same time each day?” in Industrial Conference on Data Mining. Springer, 2014, pp. 70–84.
[18] California Public Utilities Commission, “Order instituting rulemaking on the commission’s own motion to consider alternative-fueled vehicle tariffs, infrastructure and policies to support . . . ,” California Public Utilities Commission, 2009.
[19] L. Jin, D. Lee, A. Sim, S. Borgeson, K. Wu, C. A. Spurlock, and A. Todd, “Comparison of clustering techniques for residential energy behavior using smart meter data,” in Workshops at the Thirty-First AAAI Conference on Artificial Intelligence, Mar. 2017.
[20] I. Dent, T. Craig, U. Aickelin, and T. Rodden, “An approach for assessing clustering of households by electricity usage,” Jan. 2012.
[21] G. W. Milligan and M. C. Cooper, “A study of standardization of variables in cluster analysis,” J. Classification, vol. 5, no. 2, pp. 181–204, Sep. 1988.
[22] G. Chicco, R. Napoli, and F. Piglione, “Comparisons among clustering techniques for electricity customer classification,” IEEE Trans. Power Syst., vol. 21, no. 2, pp. 933–940, May 2006.
[23] M. Piao, H. S. Shon, J. Y. Lee, and K. H. Ryu, “Subspace projection method based clustering analysis in load profiling,” IEEE Trans. Power Syst., vol. 29, no. 6, pp. 2628–2635, Nov. 2014.
[24] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques. Elsevier, Jun. 2011.
[25] J. Torriti, R. Hanna, B. Anderson, G. Yeboah, and A. Druckman, “Peak residential electricity demand and social practices: Deriving flexibility and greenhouse gas intensities from time use and locational data,” Indoor and Built Environment, vol. 24, no. 7, pp. 891–912, 2015.
[26] A. Satre-Meloy, M. Diakonova, and P. Grünewald, “Daily life and demand: an analysis of intra-day variations in residential electricity consumption with time-use data,” Energy Efficiency, pp. 1–26, 2019.
[27] M. B. Roberts, N. Haghdadi, A. Bruce, and I. MacGill, “Characterisation of Australian apartment electricity demand and its implications for low-carbon cities,” Energy, May 2019.
[28] S. Xu, E. Barbour, and M. C. González, “Household segmentation by load shape and daily consumption,” in Proc. ACM SigKDD 2017 Conf. Halifax, Nova Scotia, Canada, August 2017. urbcomp.ist.psu.edu, 2017.
[29] D. Zhou, M. Balandat, and C. Tomlin, “Residential demand response targeting using machine learning with observational data,” Jul. 2016.
[30] H. Fang, Y. Zhang, M. Liu, and W. Shen, “Clustering and analysis of household power load based on HMM and multi-factors,” in . ieeexplore.ieee.org, May 2018, pp. 491–495.
[31] K. Chen, J. Hu, and Z. He, “Data-driven residential customer aggregation based on seasonal behavioral patterns,” in . ieeexplore.ieee.org, Jul. 2017, pp. 1–5.
[32] F. N. Melzi, A. Same, M. H. Zayani, and L. Oukhellou, “A dedicated mixture model for clustering smart meter data: Identification and analysis of electricity consumption behaviors,” Energies, vol. 10, no. 10, p. 1446, Sep. 2017.
[33] J. D. Rhodes, W. J. Cole, C. R. Upshaw, T. F. Edgar, and M. E. Webber, “Clustering analysis of residential electricity demand profiles,” Appl. Energy, vol. 135, pp. 461–471, Dec. 2014.
[34] Frontier Economics and Sustainability First, “Demand side response in the domestic sector-a literature review of major trials,” Final Report, London, August, 2012.
[35] P. Cappers, C. A. Spurlock, A. Todd, and L. Jin, “Are vulnerable customers any different than their peers when exposed to critical peak pricing: Evidence from the US,” Energy Policy, vol. 123, pp. 421–432, 2018.
[36] L. Jin, A. Spurlock, S. Borgeson, D. Fredman, L. Hans, S. Patel, and A. Todd, “Load shape clustering using residential smart meter data: a technical memorandum,” 2016.
[37] D. L. Davies and D. W. Bouldin, “A cluster separation measure,”