Variability of Behaviour in Electricity Load Profile Clustering; Who Does Things at the Same Time Each Day?
Ian Dent, Tony Craig, Uwe Aickelin, and Tom Rodden
School of Computer Science, University of Nottingham, Nottingham NG8 1BB, UK, [email protected], WWW home page: http://ima.ac.uk/dent
The James Hutton Institute, Aberdeen, UK
Abstract.
UK electricity market changes provide opportunities to alter households’ electricity usage patterns for the benefit of the overall electricity network. Work on clustering similar households has concentrated on daily load profiles and the variability in regular household behaviours has not been considered. Those households with most variability in regular activities may be the most receptive to incentives to change timing. Whether using the variability of regular behaviour allows the creation of more consistent groupings of households is investigated and compared with daily load profile clustering. 204 UK households are analysed to find repeating patterns (motifs). Variability in the time of the motif is used as the basis for clustering households. Different clustering algorithms are assessed by the consistency of the results. Findings show that variability of behaviour, using motifs, provides more consistent groupings of households across different clustering algorithms and allows for more efficient targeting of behaviour change interventions.
The electricity market in the UK is undergoing dramatic changes. Legal, social and political drivers for a more carbon efficient electricity network, along with the dramatically increased flow of data from households through the deployment of smart meters, require a transformation of existing practices. In particular, the change of the frequency of sampling of electricity usage, by using smart meters, alters the level of understanding of households’ behaviour that is possible [1].

One approach to address the pressures on the electricity network is the application of Demand Side Management (DSM) techniques to achieve changes in consumer behaviour. DSM is defined as “systematic utility and government activities designed to change the amount and/or timing of the customer’s use of electricity” for the collective benefit of society, the utility company, and its customers [2]. The peak time for electricity usage in the UK is during the early evening and the successful application of techniques to reduce, or move, the peak usage would improve the overall efficiency of the electricity network.

To allow selection of appropriate DSM interventions, a good understanding of the existing behaviour of households is needed. Firstly, knowledge is needed on an individual household that can be deduced from house-wide electricity metering. Secondly, a method is required to group large numbers of households into a manageable number of archetypal groups where the members display similar characteristics. This approach allows for cost effective targeting of the most appropriate subset of customers whilst allowing the company management to deal with a manageable number of archetypes [3].

There is an extensive body of work on clustering households which includes comparing or combining timed meter readings to create additional attributes that contribute to the quality of the clustering [4].
However, little work has focused on how the daily activity patterns of the household vary from day to day and how this can be used for clustering. For instance, some households will be creatures of habit and will eat their evening meal at almost the same time each evening, whilst others have a much more variable activity pattern and will eat at different times. Ellegård and Palm [5] have investigated the variability of behaviour using diaries and interviews but have not used analysis of meter data.

Clustering households using their degree of variability in behaviour, as shown by electricity consumption, provides a way of identifying the subset of electricity users who may be most receptive to an intervention to influence their activity patterns. The intervention may be to reward households for NOT changing their current pattern of usage if it is already as desired by the utility company.

This paper addresses the question of whether making use of the variability of behaviour (as shown by the electricity meter data) provides “better” groupings of households for the purpose of DSM than those provided by using daily load profiles. The judgement of “better” is measured by implementing a number of different clustering techniques and measuring the degree of overlap between the clusters found. A consistent set of clusters across the different clustering algorithms implies a better, and more useful, approach to generating the clusters.

The investigation of household electricity load profiles is an important area of research given the centrality of such patterns in directly addressing the needs of the electricity industry, both now and in the future. This work extends existing load profile work by taking electricity meter data streams and developing new ways of representing the household that can be used as the basis for clustering using existing data mining techniques.
The identification of repeating motifs and the investigation of how the timing of the motifs varies from day to day, as a key behavioural trait of the household, is a novel area of research. An improvement in creating useful archetypes can have major financial and environmental benefits.

There has been extensive research on determining daily load profiles to represent a household’s electricity usage [6]. In many cases (e.g., [7]), the daily load profiles are used as the basis for clustering “similar” households together to develop a small set of archetypal profiles which can be used for targeting of behaviour change interventions. Previous work has used different clustering techniques with the majority of the published literature using hierarchical clustering.

The common approach is to define a subset of the data (e.g. by season and/or by day of the week) and then to create average daily profiles for a household from the electricity meter data. The shapes of these daily profiles are then clustered to group similar shapes together. A representative profile is defined (e.g. by averaging all the members of the cluster) to produce an archetypal daily load profile for that cluster of households.

Previous work has not investigated how households may exhibit different behaviour from day to day and how these differences may be used as a distinguishing feature of the household and a basis for clustering.
The electricity meter data reading stream from a household can be plotted as a graph of usage against time and regular activities appear as similar shaped patterns. Short patterns that repeat are defined as “motifs” and detection of these motifs, and their timing, can inform understanding of household behaviour.

This work uses the SAX (Symbolic Aggregate approXimation) technique which allows symbolic representation of time series data [8,9]. Other motif finding algorithms could also be incorporated into the proposed approach to identify the flexibility of behaviour (e.g. [10]). To assess variability within a household, it is necessary to detect the repeating motifs that are assumed to signify particular activities (e.g., cooking the evening meal). These are generally of a similar shape on different days but show some differences due to noise caused by other activities within the household (e.g., a fridge automatically running). The SAX approach of symbolising the real valued meter readings is useful as it allows for approximate matching (as various ranges of readings map to a single symbol).

Lines et al. [11] apply motif finding to UK data to detect the use of particular appliances, drawn from a set of known appliances. This contrasts with the focus in this paper which is to find interesting, repeating patterns of behaviour without the need to define the activity that the motif represents. Appliances that can be consistently and accurately detected can be used with the approach detailed here by extending the analysis of the timing of repeating motifs to the analysis of variability of timing of appliance usage.
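The approximate matching that symbolisation provides can be illustrated with a minimal sketch. It assumes each window is z-normalised and mapped to letters using the usual SAX breakpoints for an alphabet of 5 (equiprobable regions of a standard normal distribution), with one letter per reading and no further aggregation. The function name `symbolise` and the sample readings are hypothetical and are not taken from the paper's implementation.

```python
import bisect
import statistics

# Breakpoints splitting a standard normal distribution into 5 equiprobable
# regions (the usual SAX lookup values for an alphabet of size 5).
BREAKPOINTS_5 = [-0.84, -0.25, 0.25, 0.84]
ALPHABET = "abcde"


def symbolise(window):
    """Z-normalise a window of values and map each one to a letter a..e."""
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    if std == 0:
        # A flat window has no shape; use the middle symbol throughout.
        return ALPHABET[len(ALPHABET) // 2] * len(window)
    normalised = [(value - mean) / std for value in window]
    return "".join(
        ALPHABET[bisect.bisect_left(BREAKPOINTS_5, z)] for z in normalised
    )


# Two windows with similar, but not identical, readings (in watts).
clean = [100, 100, 100, 300, 100, 100]
noisy = [110, 95, 105, 320, 100, 98]
print(symbolise(clean), symbolise(noisy))  # both map to "bbbebb"
```

Because a whole range of normalised values maps to one letter, the small differences between the two windows disappear and both produce the same word, which is what makes a repeated activity detectable as a motif.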
This study makes use of data collected as part of the ongoing North East Scotland Energy Monitoring Project (NESEMP), which is examining the relationship between different types of energy feedback and psycho-social measures including individual environmental attitudes, household characteristics, and everyday behaviours. As part of this ongoing project, several hundred households are being monitored and the electricity usage is recorded every five minutes using CurrentCost monitors [12].

After removing data for households with insufficient readings, the data is loaded into a MySQL database and the readings are aligned with exact 5 minute boundaries (e.g. 1pm, 1.05pm, etc.) by interpolation between the actual readings. This is achieved by calculating the reading at an exact 5 minute point (e.g. 1.05pm) by considering the actual readings before and after that time and by calculating the reading such that the total usage over a longer period is the same whether the interpolated readings or the original actual readings are used [13]. This results in a set of 288 readings (one for every 5 minute period in the day) for each of the households in the database.

Each day of sampling is labelled in a number of ways such as “working day” or “summer” to aid selection of particular subsets of data.
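The alignment step can be sketched as follows. This is only an illustration using plain linear interpolation between the readings either side of each 5 minute boundary; it is an assumption made for brevity and does not by itself guarantee the usage-preserving property of the method described in [13]. The function name and sample timestamps are hypothetical.

```python
def interpolate_to_grid(times, watts, step=300):
    """Linearly interpolate readings onto exact `step`-second boundaries.

    times: strictly increasing timestamps in seconds since midnight.
    watts: the reading taken at each timestamp.
    Returns a list of (boundary_time, interpolated_reading) pairs.
    """
    if len(times) < 2:
        raise ValueError("need at least two readings to interpolate")
    t = -(-times[0] // step) * step  # first boundary at or after first reading
    aligned = []
    i = 0
    while t <= times[-1]:
        # Advance to the pair of readings that brackets the boundary.
        while times[i + 1] < t:
            i += 1
        t0, t1 = times[i], times[i + 1]
        w0, w1 = watts[i], watts[i + 1]
        fraction = (t - t0) / (t1 - t0)
        aligned.append((t, w0 + fraction * (w1 - w0)))
        t += step
    return aligned


# Readings taken at 12:59:50, 13:04:50 and 13:09:55.
raw_times = [46790, 47090, 47395]
raw_watts = [200, 260, 321]
for boundary, value in interpolate_to_grid(raw_times, raw_watts):
    print(boundary, round(value, 1))  # boundaries 46800 (13:00) and 47100 (13:05)

print(24 * 60 * 60 // 300)  # 288 boundaries in a full day
```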
To find motifs within the data, each period of interest within the day (e.g., the peak period) for each household is examined by taking a moving window over the period. The subset of the meter readings within the moving window is then converted into a string and stored. Next, the window moves on by one time period (5 minutes) and the conversion into a string is repeated. Using an alphabet size of 5 and a motif size of 6 (i.e., 30 minutes), analysing the 4pm to 8pm period provides a total of 49 x 5 minute readings for each day. As the interest is in changes in usage rather than absolute usage, these readings are compared with adjacent readings in time to produce 48 values (one for each 5 minute period) representing the change in usage since the last 5 minute reading. This results in 42 motifs stored for each day for each household (one for each possible 30 minute period within the peak time). Fig. 1 shows an example of how the symbolised motifs are built up. The top graph shows the 5 minute readings for the 4 hour peak period. A sliding window of 6 readings (30 minutes) is taken across the peak period with the first 2 and the last window shown. Each window is normalised within the values in the window and then translated into the symbolised representations as shown at the bottom of the diagram.

The analysis uses an alphabet of 5 symbols (i.e., the letters “a” to “e”) to represent the motifs. 5 is selected as a reasonable compromise between having too few symbols, and thus not detecting changes in electricity consumption, and having too many and thus generating too many patterns that do not repeat. The symbolisation translates readings within a particular range into a given letter and thus similar, although not identical, readings are translated into the same letter. The resulting motifs for 2 windows may be identical whereas the original readings may only be approximately similar.

The motif size selected is 6 corresponding to a 30 minute (i.e., 6 x 5 minutes) period. This figure was selected as the UK electricity settlement market uses a 30 minute period [14] and 30 minutes is also a reasonable period that will allow time for activities such as showering.

The motifs are built from the graph shape without regard to absolute value of the data. A possible effect of this is to find motifs within what is the general noise associated with the meter readings. This is avoided by ignoring any motifs within a window which have a range of less than 100W.

Fig. 1. Example of symbolisation (alphabet of 5, motif length of 6)

As the motifs are created by shifting a moving window over the stream of data, overlapping periods are considered and periods with no activity except for one change in meter reading will lead to a series of motifs that are similar. For example, a long period of no activity except for a jump of +200W will lead to motifs being found such as ccccca, ccccac, cccacc, etc. As only one of these is interesting for further analysis, the others are excluded.

The top motif (the one that occurs most often within a household) is further examined for the times when the motif occurs on each day. The number of times the motif occurs, and the standard deviation of the time of occurrence, are calculated for each household. Similarly, the second and third most common motifs within a household are identified and the variability in timing calculated.

Other useful measures relating to the motifs found within a household are also calculated including the number of different motifs (occurring at least twice) and the number of different motifs occurring on at least 30% of the days sampled for the household. The 30% figure is selected as a reasonable number to ensure only regularly repeating patterns are considered.

The attributes calculated for each household and used as input to the clustering algorithms are:

1. Number of occurrences of the motif occurring most frequently during the peak period.
2. Variability in timing of the occurrence of the most frequent motif within the household. This is represented by the standard deviation of the timing (measured in minutes) around the mean start time.
3. Number of occurrences of the second most frequent motif.
4. Variability in timing of the second most frequent motif.
5. Number of occurrences of the third most frequent motif.
6. Variability in timing of the third most frequent motif.
7. Total number of motifs for the household that occur at least twice.
8. Total number of different motifs that occur on at least 30% of days.
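The first six attributes above can be sketched in a few lines. This is a simplified illustration under stated assumptions: windows slide over the changes between adjacent readings, the 100W noise filter is applied to the range of the changes in the window, the population standard deviation is used, and the exclusion of shifted copies of the same single jump is not implemented. All function names are hypothetical.

```python
import bisect
import statistics
from collections import defaultdict

BREAKPOINTS_5 = [-0.84, -0.25, 0.25, 0.84]  # usual SAX values, alphabet of 5
ALPHABET = "abcde"


def symbolise(window):
    """Z-normalise a window and map each value to a letter a..e."""
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    if std == 0:
        return "c" * len(window)
    return "".join(
        ALPHABET[bisect.bisect_left(BREAKPOINTS_5, (v - mean) / std)]
        for v in window
    )


def motifs_for_day(readings, size=6, min_range=100.0):
    """Return (start_index, word) for each sliding window over the changes."""
    changes = [b - a for a, b in zip(readings, readings[1:])]
    found = []
    for start in range(len(changes) - size + 1):
        window = changes[start:start + size]
        if max(window) - min(window) < min_range:
            continue  # assumed noise filter: ignore windows with a small range
        found.append((start, symbolise(window)))
    return found


def top_motif_attributes(days, step_minutes=5, top=3):
    """For the `top` most frequent motifs: (word, occurrences, timing std dev)."""
    start_times = defaultdict(list)  # word -> start times in minutes
    for readings in days:
        for start, word in motifs_for_day(readings):
            start_times[word].append(start * step_minutes)
    ranked = sorted(start_times.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(w, len(t), statistics.pstdev(t)) for w, t in ranked[:top]]


# Two toy days: the same +200W jump happens one reading later on day 2.
day1 = [100] * 5 + [300] * 5
day2 = [100] * 6 + [300] * 4
for word, count, spread in top_motif_attributes([day1, day2]):
    print(word, count, spread)
```

In this toy data every repeated word occurs twice, five minutes apart, so each reports a timing standard deviation of 2.5 minutes; a household with a rigid routine would report values near zero.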
Various clustering techniques are selected for evaluation of the different approaches to analysing the data. Note that, whilst possibly a useful additional benefit, this work does not focus on selecting the “best” clustering algorithm but uses a selection of algorithms to assess the benefits or otherwise of making use of the motif variability information.

Based on the review by Chicco [6] the following clustering algorithms are selected as the most commonly used in previous work:

1. Kmeans is a well known algorithm that occurs in a number of examples of previous load profiling work. The algorithm requires a number of clusters (k) and works by randomly selecting an initial k locations for the centres of the clusters. Each data point is then assigned to one of the clusters by selecting the centre nearest to that data point. Once all the data points are assigned, each collection of points is considered, the new centre of the allocated points is calculated and the centre for that cluster is reassigned. The points are then reallocated to their new nearest centre and the algorithm continues until no changes are made to the allocations of points for an iteration [15].
2. Fuzzy c means. This provides an extension of the kmeans algorithm allowing partial membership to more than one cluster. The algorithm provides additional output showing the degree of membership that each household has of each of the derived clusters [16]. For this analysis, each household is assigned to the cluster for which they have the highest degree of membership.
3. Self Organising Maps. The Self Organising Map (SOM) is a neural network algorithm that can be used to map a high dimension set of data into a lower dimension representation. In this paper, the mapping is to a 2 dimensional set of representations which are arranged in a hexagonal map. Each sample (e.g., the average load profile for a given household) is assigned to a position in the map depending on the closeness of the sample to the existing nodes assigned to each position in the map (using a Euclidean measure of distance). Initially the nodes are assigned at random but, over time, the map produces an arrangement where similar samples are placed closely together and dissimilar samples are placed far apart [17].
4. Hierarchical clustering. Most of the published load profiling work has used hierarchical clustering and this approach has the benefit of providing easily understood rules for cluster membership. The algorithm uses a dissimilarity matrix for the households and, starting initially with each household in its own cluster, proceeds by joining clusters which are most similar. The hierarchy is cut at a point to provide the desired number of clusters [18]. The Euclidean distance is used when creating the dissimilarity matrix and the Ward agglomeration method [19] is used for combining clusters. The Ward method minimises the sum of squares of possible clusters when selecting households to combine. Other agglomeration techniques tend to create a few small clusters containing extreme valued households plus one large cluster containing the remainder of the households.
5. Random Forests [20] is used to create a dissimilarity matrix which is used with Partitioning Around Medoids (pam) to form clusters. This is implemented using the R package randomForest [21].

A common issue is the appropriate setting for the number of clusters. To match common practice within the electricity industry, 8 clusters are selected. The UK electricity industry has worked with 8 load profiles since the 1990s [22]. Figueiredo et al. [23] report that the Portuguese electricity utility aims for a number of clusters between 6 and 9.
To assess the benefits of a particular cluster solution an appropriate cluster validity index needs to be used. Many have been considered in the literature with the Mean Index Adequacy (MIA) and the Cluster Dispersion Indicator (CDI) [24] used in most of the published load profiling work. Lower values for the CDI and MIA measure denote “better” solutions.

The data to be clustered consists of $M$ records numbered as $m = 1, \ldots, M$. Each record has $H$ attributes numbered as $h = 1, \ldots, H$. The $h$th attribute for the $i$th record is designated as $m_i(h)$.

The data is clustered into $K$ clusters (numbered as $k = 1, \ldots, K$). Each cluster has $R_k$ members where $r^{(k)}$ is the $r$th record assigned to cluster $k$ and $C^{(k)}$ is the calculated centre of the cluster $k$.

The distance ($d$) between 2 records is defined as:

$$ d(m_i, m_j) = \sqrt{\frac{1}{H} \sum_{h=1}^{H} \left( m_i(h) - m_j(h) \right)^2} \qquad (1) $$

where $m_i(h)$ and $m_j(h)$ are the $h$th attributes for two records, $m_i$ and $m_j$.

The “within set distance” $\hat{d}(S)$ of the members of a set, $S$ with $N$ members ($s_j$ where $j = 1, \ldots, N$) is defined as:

$$ \hat{d}(S) = \sqrt{\frac{1}{2N} \sum_{n=1}^{N} \sum_{p=1}^{N} d^2(s_n, s_p)} \qquad (2) $$

The MIA gives a value which relies on the amount by which each cluster is compact - i.e., if the members in the cluster are close together the MIA is low.

$$ MIA = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \sum_{r} d^2\left( r^{(k)}, C^{(k)} \right)} \qquad (3) $$

The CDI depends on the distance between the members of the same cluster (as for the MIA) but also incorporates information on the distances between the representative load diagrams (i.e., the centroids) for each cluster. This therefore measures both the compactness of the clusters and the amount by which each cluster differs from the others.

$$ CDI = \frac{1}{\hat{d}(C)} \sqrt{\frac{1}{K} \sum_{k=1}^{K} \hat{d}^2(R_k)} \qquad (4) $$

where $C$ is the set of cluster centres and $R_k$ is the $k$th cluster members set.
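The two validity indexes can be sketched in code as follows. The sketch assumes the usual forms from the load profiling literature [24]: the distance is the root of the mean squared attribute difference, the within set distance uses a 1/(2N) factor over all squared pairwise distances, and MIA and CDI take the root of the per-cluster average. These factors are an assumption where the formulas are open to interpretation, and the function names are hypothetical.

```python
import math


def distance(a, b):
    """Root of the mean squared difference over the attributes of two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def within_set_distance(members):
    """Assumed form: sqrt(1/(2N) * sum of squared distances over all pairs)."""
    n = len(members)
    total = sum(distance(p, q) ** 2 for p in members for q in members)
    return math.sqrt(total / (2 * n))


def centre(members):
    """Attribute-wise mean of the records in a cluster."""
    return [sum(col) / len(members) for col in zip(*members)]


def mia(clusters):
    """Mean Index Adequacy: low when members sit close to their centre."""
    total = 0.0
    for members in clusters:
        c = centre(members)
        total += sum(distance(m, c) ** 2 for m in members)
    return math.sqrt(total / len(clusters))


def cdi(clusters):
    """Cluster Dispersion Indicator: compactness relative to centre spread."""
    centres = [centre(members) for members in clusters]
    spread = sum(within_set_distance(m) ** 2 for m in clusters) / len(clusters)
    return math.sqrt(spread) / within_set_distance(centres)


tight = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
loose = [[(0, 0), (0, 4)], [(10, 10), (10, 14)]]
print(mia(tight), mia(loose))
print(cdi(tight), cdi(loose))
```

Both indexes come out lower for the tight arrangement than for the loose one, matching the statement that lower values denote better solutions.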
UK specific data is used to generate average daily load profiles for each household which are clustered to provide a baseline for comparison. Selected clustering algorithms are applied to the data and validity indexes are used to produce a measure of the quality of the partitions found.

Next, the novel approach of identifying motifs within the data, and measuring the variability in timing of the motifs, is used to generate a new set of derived data using the same UK dataset. The same clustering algorithms and validity indexes are then applied to this dataset. In addition, the results are compared with the baseline obtained from the average daily load profiles in the first step.
To assess the consistency of clustering solutions, the different arrangements of households into clusters are compared. The consistency of the clusters obtained from the different clustering algorithms is used as a measure of the quality of the results with more consistency between the results suggesting a more useful method of identifying the clusters.

Measuring consistency across the clustering results using the different sets of data (load profiles and motifs) may be criticised as not necessarily providing a true measure of quality as clustering results may be consistent but not necessarily represent useful, “true” clusters within the data.

The Rand index compares the different pairs of samples (i.e., each possible pair of households) and assesses the number in which each pair are in the same partition in the 2 different clustering solutions, the number where each member of the pair are in different partitions in both solutions, and the case where the members are in the same partition in one solution but a different partition in the other solution. The corrected Rand index [25] builds on the original work but adjusts the calculated value for the expected matching that would occur in a random arrangement. The corrected Rand index ranges from -1 to 1 with a higher value signifying better agreement between the partitions and hence a better solution.
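The corrected Rand index can be computed from pair counts alone. The following is a minimal sketch of the standard Hubert and Arabie formulation; the function name and the toy label lists are illustrative only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Corrected (adjusted) Rand index between two labelings of the same items."""
    n = len(labels_a)
    # Pairs of items placed together in both partitions.
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values()
    )
    # Pairs placed together within each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)  # agreement by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both partitions
    return (together_both - expected) / (maximum - expected)


# Same grouping with different cluster labels: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Groupings that cut across each other: worse than chance.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The index depends only on which items are grouped together, not on the cluster labels, which is why it can compare the outputs of different clustering algorithms directly.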
A subset of the data is extracted for the peak period of 4pm to 8pm and for working days from Spring (March, April and May) 2011. Working days are weekdays excluding Scottish public holidays. Not all households have a full set of meter readings and those with fewer than 4 days of valid readings are excluded. The dataset has around 440,000 individual meter readings from 204 households.

The activities of interest within a household are related to switching appliances on or off (e.g., the use of electrical appliances in cooking) and it is the changes in the readings, rather than the absolute readings, that are of most interest and are used as the basis for analysis when using motifs.
The data for the evening peak period (4pm to 7.55pm) are averaged to create a representative load profile for each household. For example, all the readings for 4pm for the household are averaged to create a representative reading for 4pm, similarly for 4.05pm, etc. The 204 representative profiles, each with 48 attributes (one for each time point), are then normalised within the 0 to 1 range and used with a variety of clustering algorithms.
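A representative profile of this kind can be sketched as below. The sketch assumes min-max scaling applied per household to reach the 0 to 1 range, which is one plausible reading of the normalisation; the function name and the three-point toy days are hypothetical (the real profiles have 48 time points).

```python
def representative_profile(days):
    """Average each time point across days, then scale the result to [0, 1]."""
    # zip(*days) groups together the readings taken at the same time of day.
    means = [sum(column) / len(column) for column in zip(*days)]
    low, high = min(means), max(means)
    if high == low:
        return [0.0] * len(means)  # a flat profile has no shape to preserve
    return [(m - low) / (high - low) for m in means]


# Two toy days with three time points each (watts).
toy_days = [[100, 200, 300], [300, 400, 500]]
print(representative_profile(toy_days))  # [0.0, 0.5, 1.0]
```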
Various different measures of variability of behaviour within the household can be defined without the use of motifs (e.g., [26]) and two methods are considered.

One approach is to consider the time at which the maximum usage occurred on each day during the period of analysis. These times are then used to calculate the standard deviation of the time around the mean for each household.

A second approach is to consider the total usage during the peak period on each day during Spring 2011. The standard deviation of the total per day around the mean total per day also provides a measure of variability of behaviour. Each of the 2 measures is calculated and used as the basis of simple clustering using kmeans (k = 8). The households in each of the clusters are shown in Fig. 2.

Fig. 2. Results for alternative variability measures

There is little correspondence between the cluster assignments for the 2 methods. The corrected Rand index of 0.01 shows no correspondence beyond that expected by chance. Furthermore, there is little correspondence with the clusters obtained from the motif variability approach (detailed below). A Spearman’s rho value of 0.23 shows there is little correlation between the two different variability measures.

It is therefore concluded that neither of the non-motif measures gives a useful, consistent measure of the variability of each household.
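The two non-motif measures are simple to compute per household. The sketch below assumes each day is a list of 5 minute readings for the peak period, uses the sum of the readings as a stand-in for total usage, and uses the population standard deviation; the function names and toy values are hypothetical.

```python
import statistics


def peak_time_variability(days, step_minutes=5):
    """Std dev (minutes) of the time at which each day's maximum reading occurs."""
    peak_times = [day.index(max(day)) * step_minutes for day in days]
    return statistics.pstdev(peak_times)


def total_usage_variability(days):
    """Std dev of the per-day total of the readings in the period."""
    return statistics.pstdev([sum(day) for day in days])


# Two toy days: same total usage, but the peak falls 5 minutes later on day 2.
toy_days = [[1, 5, 2], [1, 2, 5]]
print(peak_time_variability(toy_days))    # 2.5
print(total_usage_variability(toy_days))  # 0.0
```

The toy household scores as variable on one measure and perfectly regular on the other, which illustrates how the two measures can disagree about the same household.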
This paper finds the motifs in the stream of meter data and then examines how the times of these repeating patterns vary from day to day within a household. Furthermore, the number of times a pattern repeats within a household is also used as an indication of the variability of behaviour of that household.

The motifs in the data are discovered and the attributes detailed in Section 2.4 are generated. The same clustering algorithms as used for the load profile clustering are then applied to produce 8 archetypal clusters.

Results
Various measures that represent the variability of behaviour can be constructed and this paper considers the variability in time of maximum usage and the variability in total usage. However, as each measure is intended to represent the same thing (i.e., the variability of behaviour), the fact that there is little correlation between the measures, or the membership of the clusters generated using the measures, means that they provide a poor representation of the characteristic.

Comparing the load profile results with the motif variability results, Table 1 shows, for each of the clustering algorithms used and for each set of data, the sizes of the partitions in the solution and the values for the MIA and CDI cluster validity indexes (lower is better).
Table 1.
Clustering Results and Validity indexes
Algorithm | Load profiles: cluster sizes | Load profiles: MIA | Load profiles: CDI | Motifs: cluster sizes | Motifs: MIA | Motifs: CDI
Kmeans | 10,16,19,20,20,27,44,48 | 0.593 | 1.34 | 2,5,7,26,29,37,41,57 | 0.445 | 0.641
Fuzzy | 14,17,20,23,23,23,35,49 | 0.679 | 2.14 | 12,15,19,26,28,30,34,40 | 0.551 | 2.084
SOM | 13,15,16,20,28,31,36,45 | 0.595 | 1.337 | 2,5,24,25,28,29,40,51 | 0.451 | 0.733
Hier | 9,10,13,20,22,37,43,50 | 0.61 | 1.386 | 2,3,5,26,31,34,40,63 | 0.46 | 0.64
RF | 14,18,19,28,29,29,30,37 | 0.794 | 1.131 | 18,18,19,21,25,32,35,36 | 0.628 | 1.34
The MIA and CDI values show that the kmeans and SOM techniques produce similar quality solutions using the load profiles. The hierarchical algorithm is slightly worse, with the Fuzzy Cmeans algorithm being significantly poorer. The random forest and pam combination provides a good result for the CDI measure but scores poorly on the compactness of the clusters (as measured by MIA).

When using the motif variability data, the kmeans, SOM and hierarchical algorithms produce similar quality results with the Fuzzy Cmeans algorithm again producing poorer results. The random forest and pam combination provides middling results.

The MIA and CDI validity index calculations are not comparable between datasets due to the different number of attributes used.

Table 2 gives information on the consistency of the cluster partitions as the clustering algorithm changes. The results for the Rand index show that the values are closer to 1 for the clusters built using motif variation in nine of the ten algorithm pairs, the exception being kmeans against hierarchical clustering. The mean values for the Rand index (after omission of the values on the diagonal) are 0.4549 for the load profiles and 0.5183 for the motif variability approach. This shows a more consistent set of partitions are created when using the motif variability than the partitions created using the load profile information.

Table 2.
Modified Rand index of clusters using different clustering algorithms

Profiles | Kmeans | Fuzzy | SOM | Hier | RF
Kmeans | 1 | 0.544 | 0.629 | 0.668 | 0.251
Fuzzy | 0.544 | 1 | 0.562 | 0.491 | 0.355
SOM | 0.629 | 0.562 | 1 | 0.49 | 0.287
Hier | 0.668 | 0.491 | 0.49 | 1 | 0.272
RF | 0.251 | 0.355 | 0.287 | 0.272 | 1

Motifs | Kmeans | Fuzzy | SOM | Hier | RF
Kmeans | 1 | 0.592 | 0.794 | 0.622 | 0.358
Fuzzy | 0.592 | 1 | 0.626 | 0.511 | 0.447
SOM | 0.794 | 0.626 | 1 | 0.591 | 0.33
Hier | 0.622 | 0.511 | 0.591 | 1 | 0.312
RF | 0.358 | 0.447 | 0.33 | 0.312 | 1
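The reported mean values can be reproduced directly from the Table 2 entries. The short check below averages the off-diagonal values of each matrix and counts the algorithm pairs for which the motif based clusters agree more closely; the variable and function names are illustrative.

```python
# Rows and columns are ordered Kmeans, Fuzzy, SOM, Hier, RF (values from Table 2).
PROFILES = [
    [1, 0.544, 0.629, 0.668, 0.251],
    [0.544, 1, 0.562, 0.491, 0.355],
    [0.629, 0.562, 1, 0.49, 0.287],
    [0.668, 0.491, 0.49, 1, 0.272],
    [0.251, 0.355, 0.287, 0.272, 1],
]
MOTIFS = [
    [1, 0.592, 0.794, 0.622, 0.358],
    [0.592, 1, 0.626, 0.511, 0.447],
    [0.794, 0.626, 1, 0.591, 0.33],
    [0.622, 0.511, 0.591, 1, 0.312],
    [0.358, 0.447, 0.33, 0.312, 1],
]


def mean_off_diagonal(matrix):
    """Mean of all entries excluding the diagonal of a square matrix."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


def pairs_where_higher(a, b):
    """Count distinct algorithm pairs where matrix `a` exceeds matrix `b`."""
    n = len(a)
    return sum(a[i][j] > b[i][j] for i in range(n) for j in range(i + 1, n))


print(round(mean_off_diagonal(PROFILES), 4))  # 0.4549
print(round(mean_off_diagonal(MOTIFS), 4))    # 0.5183
print(pairs_where_higher(MOTIFS, PROFILES))   # 9 of the 10 pairs
```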
The results from the kmeans algorithm using the motif variability data can be seen in Fig. 3. The cluster with 26 houses shows very little variability in the timing of their regular activities and can be assumed to be “creature of habit” households who may not respond well to an incentive to change behaviour. The 2 house and the 29 house clusters show many repeating activities and may be the best to target for interventions as there are likely to be many activities that often repeat and that may be modifiable.
Fig. 3. kmeans clusters using motif variability
Examining the 29 house cluster in more detail, Fig. 4 shows the motifs found for one of the houses and how the time of occurrence of the motif varies across the 4pm to 8pm period. In contrast, the motifs for one of the houses in the 26 house cluster are shown in Fig. 5 and the timings can be seen to be less variable.

As a comparison, the average load profiles for each of the households in the 29 house cluster are shown at Fig. 6. There is little similarity between the households and hence, using the load profile shapes as the basis, little likelihood of the households being clustered together. However, the variability in timing of the motifs can be used as a method for selecting appropriate households to target and allows groupings to be designated as high or low variability.

Fig. 4. Example house (high variability)
Fig. 5. Example house (low variability)
Fig. 6. Load profiles for high variability cluster
The ability to cost effectively partition domestic households into a few meaningful archetypes based on the household electricity usage is an important problem for the electricity industry. Identifying a few archetypal representations of households is essential for cost effective implementation of DSM techniques which itself is necessary to allow the electricity industry to meet the upcoming challenges. Producing more consistent and more descriptive archetypes than currently possible will allow the deployment of effective behaviour modification interventions.

Previous work does not incorporate any measure of the variability of regular behaviour when clustering households. The variability is an important characteristic as one of the major uses of the results is to target incentives for households to vary their behaviour to provide benefit to the electricity network.

The results presented show that the “variability in timing of motifs” approach produces more consistent clusters across different clustering algorithms compared to the consistency of clustering using just the daily load profiles.

The symbolisation technique is effective in detecting repeating patterns (motifs) that are approximately the same shape. Depending on the type of intervention planned for a subset of the households (for example, incentives to change overall electricity usage from day to night, or to influence short periods of usage during the peak period), different sizes of motifs may be used.

This work shows a novel approach to using electricity meter data to cluster households that enhances and complements the existing techniques based on the daily load profiles.

Acknowledgements
This work was possible thanks to RCUK Energy Programme and EPSRC grant references EP/I000496/1 and EP/G065802/1 and forms part of the Desimax project [27].

Thanks are due to Pavel Senin for providing R code implementing the SAX method.
References
1. DECC: Towards a Smarter Future, Government Response to the Consultation on Electricity and Gas Smart Metering. (2009)
2. Charles River Associates: Primer on demand-side management with an emphasis on price-responsive programs. Prepared for The World Bank, Tech. Rep. (2005)
3. Mooi, E., Sarstedt, M.: A concise guide to market research: The process, data, and methods using IBM SPSS statistics. Springer (2011)
4. Ramos, S., Figueiredo, V., Rodrigues, F., Pinhero, R., Vale, Z.: Knowledge extraction from medium voltage load diagrams to support the definition of electrical tariffs. International Journal of Engineering Intelligent Systems for Electrical Engineering and Communications (3) (2007) 143
5. Ellegård, K., Palm, J.: Visualizing energy consumption activities as a tool for making everyday life more sustainable. Applied Energy (5) (2011) 1920–1926
6. Chicco, G.: Overview and performance assessment of the clustering methods for electrical load pattern grouping. Energy Volume 42, Issue 1 (June 2012) 68–80
7. Ramos, S., Duarte, J., Soares, J., Vale, Z., Duarte, F.: Typical load profiles in the smart grid context - A clustering methods comparison. In: Power and Energy Society General Meeting, IEEE (2012) 1–8
8. Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery (2) (2007) 107–144
9. Shieh, J., Keogh, E.: iSAX: indexing and mining terabyte sized time series. In: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM (2008) 623–631
10. Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B.: Exact discovery of time series motifs. In: Proc. of 2009 SIAM International Conference on Data Mining. (2009) 1–12
11. Lines, J., Bagnall, A., Caiger-Smith, P., Anderson, S.: Classification of household devices by electricity usage profiles. Intelligent Data Engineering and Automated Learning (2011) 403–412
12. Craig, T., Polhill, J.G., Dent, I., Galan-Diaz, C., Heslop, S.: The North East Scotland Energy Monitoring Project: Exploring relationships between household occupants and energy usage. Energy and Buildings (2014)
13. Dent, I., Craig, T., Aickelin, U., Rodden, T.: A Method for Cleaning and Storing Electricity Meter Data for Flexible Analysis. In: BeHave 2012, Helsinki. (2012)
14. Elexon: The Electricity Trading Arrangements: A Beginners Guide. Technical report, Elexon (2012)
15. Jain, A., Dubes, R.: Algorithms for clustering data. Number 978-0130222787. Prentice Hall College Div (1988)
16. Bezdek, J.C.: Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers (1981)
17. Kohonen, T.: The self-organizing map. Proceedings of the IEEE (9) (2002) 1464–1480
18. Everitt, B.S., Landau, S., Leese, M., Stahl, D.: Cluster analysis. Edward Arnold, London (2001)
19. Ward Jr, J.H.: Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association (301) (1963) 236–244
20. Breiman, L.: Random forests. Machine Learning (1) (2001) 5–32
21. Liaw, A., Wiener, M.: Classification and Regression by randomForest. R News (3) (2002) 18–22
22. Electricity Association: Load profiles and their use in electricity settlement. UKERC (1997)
23. Figueiredo, V., Rodrigues, F., Vale, Z., Gouveia, J.: An electric energy consumer characterization framework based on data mining techniques. Power Systems, IEEE Transactions on (2) (2005) 596–602
24. Chicco, G., Napoli, R., Postolache, P., Scutariu, M., Toader, C.: Customer characterization options for improving the tariff offer. Power Systems, IEEE Transactions on (1) (2003) 381–387
25. Hubert, L., Arabie, P.: Comparing partitions. Journal of Classification 2