Understanding Electricity-Theft Behavior via Multi-Source Data

Abstract

Electricity theft, the behavior in which users conduct illegal operations on electrical meters to avoid paying their electricity bills, is a common phenomenon in developing countries. Considering its harm to both power grids and the public, several automated methods have been developed to recognize electricity-theft behaviors. However, these methods, which mainly assess users' electricity usage records, can be insufficient due to the diversity of theft tactics and the irregularity of user behaviors. In this paper, we propose to recognize electricity-theft behavior via multi-source data. In addition to users' electricity usage records, we analyze user behaviors by means of regional factors (non-technical loss) and climatic factors (temperature) in the corresponding transformer area. By conducting analytical experiments, we unearth several interesting patterns: for instance, electricity thieves are likely to consume much more electrical power than normal users, especially under extremely high or low temperatures. Motivated by these empirical observations, we further design a novel hierarchical framework for identifying electricity thieves. Experimental results based on a real-world dataset demonstrate that our proposed model achieves the best performance in electricity-theft detection (e.g., at least +3.0% in terms of F0.5) compared with several baselines. Last but not least, our work has been applied by the State Grid of China and used to successfully catch electricity thieves in Hangzhou with a precision of 15% (an improvement from 0% attained by several other models the company employed) during monthly on-site investigations.



Wenjie Hu

Zhejiang University

Yang Yang∗

Zhejiang University

Jianbo Wang

State Grid Power Supply Co.

Xuanwen Huang

Zhejiang University

Ziqiang Cheng

Zhejiang University


KEYWORDS

User modeling, electricity-theft detection, hierarchical recurrent neural network, power grids

ACM Reference Format:

Wenjie Hu, Yang Yang∗, Jianbo Wang, Xuanwen Huang, and Ziqiang Cheng. 2020. Understanding Electricity-Theft Behavior via Multi-Source Data. In Proceedings of The Web Conference 2020 (WWW ’20), April 20–24, 2020, Taipei, China.

ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3366423.3380291

∗ Corresponding author: Yang Yang.

The source code is published at https://github.com/zjunet/HEBR.

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.

WWW ’20, April 20–24, 2020, Taipei, China
© 2020 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-7023-3/20/04.
https://doi.org/10.1145/3366423.3380291

Figure 1: An example of electricity theft (diagram labels: area 1 and area 2, climate coziness vs. hotness, user, transformer area, electricity theft). The power is supplied to different transformer areas (area 1, area 2). The climate in area 2 is hotter than that in area 1; thus, the power consumption is higher. In order to avoid high electricity bills, several households in area 2 try to pilfer electricity.

Electrical power is an important national energy resource [4]. One of the barriers to the stable provision of electrical power is electricity theft. Formally speaking, electricity theft refers to the illegal operations by which users tamper with the electricity meter or wires without authorization to reduce or avoid consumption costs. Electricity theft not only results in severe economic losses for power suppliers, but also endangers the safety of electricity users, electrical systems, and even the public at large. As reported in [31] (a report conducted by Northeast Group LLC), electricity theft and other so-called “non-technical loss” result in a staggering $96 billion in losses globally per year; moreover, a shocking statistic is that in 2012, GDP in India was reported to drop 1.5% as a result of electricity theft, while Uttar Pradesh, the most populous state in India, lost 36% of its total electric power to theft of this kind [30].

Great efforts have been made to detect and prevent electricity theft. The most intuitive approach is to utilize hardware-driven methods: in short, to find out how the thieves are pilfering the power, then design and upgrade the meter structures accordingly [11, 15, 19]. For instance, Guo et al. [19] survey and summarize the most commonly used electricity-pilfering methods, which include changing the structure or wiring mode of the meter; they then propose countermeasures associated with the electrical meters, including installing a centralized or fully closed meter box for residents.
However, these hardware-driven methods have three major drawbacks: 1) they require expert domain knowledge to specify the techniques that electricity thieves use; 2) it is difficult to design a general meter structure, as different regions often have different pilfering tactics; 3) these methods lose their effectiveness once thieves change their tactics.

To address these problems, data-driven methodologies have been applied to the task of electricity-theft detection. Before reviewing the existing works on this topic, it is necessary to explain one associated technical term: non-technical loss (NTL) [5]. In practice, losses in a utility distribution grid are classified as technical and non-technical. Technical loss is mainly caused by unwanted physical effects (e.g., heating of resistive components, radiation, etc.) and is unavoidable. By contrast, non-technical loss is defined as the energy that is distributed but not billed; in other words, this type of loss is caused by issues in the meter-to-cash process. Although disparate issues may contribute to non-technical losses, a large proportion of NTL is related to electrical power pilfering and fraud [14]. It is therefore straightforward to detect electricity theft from abnormal NTL records [1, 35]. However, NTL only provides information at the level of the transformer area. To capture individual patterns, many existing works have utilized electricity usage records or NTL as input [12, 20, 21, 25, 29, 39], and applied various machine learning techniques (e.g., SVMs, CNNs, RNNs) to identify electricity theft. However, most of these methods have failed to obtain good performance, due to the diversity and irregularity of electricity usage behavior, which is almost impossible to fully understand using either NTL or electricity usage records alone.

Motivated by the abovementioned concerns, in this paper, we propose to recognize electricity-theft behavior by bridging three distinct levels of information: micro, meso, and macro.
At the micro- and meso-levels, we seek to capture users' abnormal behavior from electricity usage records and NTL, respectively. At the macro-level, moreover, we study how climatic conditions influence electricity-theft behavior. We then integrate these three levels of information into a uniform framework. Our work achieves significant progress in the real-world application of electricity-theft detection by deploying the proposed model in the State Grid of China (the state-owned electric utility of China, and the largest utility in the world) and improving its theft-detection performance.

To be more specific, we employ a multi-source dataset, which contains two years' worth of daily electrical power consumption records from 311K users in Zhejiang province, China, from June 2017 to April 2019, along with the corresponding daily NTL records (see https://en.wikipedia.org/wiki/Losses_in_electrical_systems for details on electrical losses) and climatic condition data in all transformer areas covered by those users. In addition, we have 4,501 (1.45%) electricity-theft labels representing 4,626 cases of electricity theft (note that a single user may pilfer electrical power several times within the relevant timespan), all of which were confirmed during several large-scale on-site investigations conducted by State Grid staff over the two years in question. Our empirical observations find that: 1) at the macro-level, climatic condition (or, more specifically, temperature, which is our main focus in this paper) affects users' electricity consumption to some extent, while users belonging to different groups may show
different levels of correlation between electricity usage and temperature; 2) as previous works have pointed out [1, 35], NTL is an important factor in detecting electricity thieves at the meso-level, and our detailed analysis shows that abnormal NTL patterns in the transformer areas are a strong signal of electricity theft; 3) at the micro-level, since electrical power consumption fluctuates over time, we can clearly see the temporal relationships within individual electricity usage, such that unusual patterns during a specific period may indicate abnormal user behavior. These findings are hierarchically illustrated in Figure 1, which indicates that users consume more electric power as the temperature gradually increases (due to, e.g., the usage of air conditioners). In order to avoid high electricity bills, some users may employ tactics to pilfer electricity, which leads to abnormal fluctuations in their electricity-usage sequences as recorded by the smart meters. At the same time, the NTL of the corresponding transformer area can reveal anomalies related to abnormal electricity usage.

Despite the interesting insights provided by our empirical observations, the question of how to integrate multi-source information into a uniform framework remains challenging. To capture the temporal and spatial correlations of multi-source sequences, one straightforward method involves first concatenating them at each temporal point, then adopting a single latent representation to capture the overall patterns, as in multiscale recurrent neural networks (MRNN) [18]. However, concatenating different sources widens the feature dimensions, which precludes capturing the significant influences of macro- or meso-level information on micro-level information (Section 3). Therefore, we propose a hierarchical framework, named Hierarchical Electricity-theft Behavior Recognition (HEBR), to extract features and fuse them step by step (Section 4).
Experimental results of the electricity-theft detection task on a real-world dataset demonstrate the effectiveness of the proposed HEBR method (Section 5).

Most excitingly, HEBR has been employed for real-time electricity-theft detection in the State Grid of China. During the monthly on-site investigation in August 2019, by checking the suspected users predicted by HEBR, we successfully caught electricity thieves in Hangzhou, China, who were punished immediately, representing an improvement in precision from 0% to 15% (Section 5.4). The success of the proposed model in this real-world application further illustrates its validity. Accordingly, the contributions of this paper can be summarized as follows:
• We analyze electricity-theft behaviors from three distinct levels of information based on multi-source data.
• Based on observational studies, we propose the Hierarchical Electricity-theft Behavior Recognition (HEBR) model, which identifies electricity theft by fusing different levels of information, and validate its effectiveness using a real-world dataset.
• We apply HEBR to catch electricity thieves on-site and achieve significant performance improvements.

Let $U$ be a set of users, and $A$ the set of corresponding transformer areas; that is, each user $u \in U$ belongs to a specific transformer area $a \in A$ based on its regional location. An area $a$ usually contains hundreds of users. Each user $u$ has electricity usage records with $T$ observations within a certain timespan, which we refer to as $X^e_u = \{x^e_1, \ldots, x^e_T\}$. Each area $a \in A$ has NTL records, denoted as $X^l_a = \{x^l_1, \ldots, x^l_T\}$, and an observation sequence of climatic conditions, written as $X^c_a = \{x^c_1, \ldots, x^c_T\}$. Each observation $x^*_t \in \mathbb{R}^{d_*}$ (with $* \in \{e, l, c\}$) is a feature vector whose dimension $d_*$ depends on the source (see details in Section 3.1). In light of the above, we can define the problem addressed in this paper as follows:

Definition 2.1.

Electricity-theft detection.

Given a specific user $u$ who belongs to the transformer area $a$, the goal is to estimate $P(Y \mid X^e_u, X^l_a, X^c_a)$, which is the probability that $u$ pilfers electrical power ($Y = 1$) or not ($Y = 0$).

Although existing data-driven electricity-pilfering-detection methods seek to capture the characteristics of users' electrical power consumption, in real-world applications it is rather difficult to efficiently catch electricity thieves by observing only the user's electrical power consumption records, due to the diversity, complexity and irregularity of electricity-theft behavior. Therefore, we compile a multi-source dataset, with additional NTL and temperature records for each transformer area, in order to analyze the traces of users' electricity usage. The associated observational findings and insights are presented in this section.
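To make the inputs of Definition 2.1 concrete, the following is a minimal Python sketch of assembling one input triple $(X^e_u, X^l_a, X^c_a)$ together with its label $Y$. It assumes the records have already been loaded into plain dictionaries keyed by user or area ID; the names `Sample`, `build_sample`, `Series`, and the dictionary layout are hypothetical and are not taken from the HEBR code release.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

Series = List[List[float]]  # T time steps, each a feature vector x_t (hypothetical layout)

@dataclass
class Sample:
    usage: Series    # X^e_u, e.g. [total, on-peak, off-peak] kWh per day
    ntl: Series      # X^l_a, daily NTL (kWh) of the user's transformer area
    climate: Series  # X^c_a, daily temperature features of the same area
    label: int       # Y = 1 for a confirmed thief, 0 otherwise

def build_sample(user: str, user_area: Dict[str, str], usage: Dict[str, Series],
                 ntl: Dict[str, Series], climate: Dict[str, Series],
                 thieves: Set[str]) -> Sample:
    """Gather the micro-, meso- and macro-level sequences for one user."""
    area = user_area[user]  # each user belongs to exactly one transformer area
    sample = Sample(usage[user], ntl[area], climate[area], int(user in thieves))
    if not len(sample.usage) == len(sample.ntl) == len(sample.climate):
        raise ValueError("all three sources must cover the same T days")
    return sample

user_area = {"u1": "a1", "u2": "a1"}
usage = {"u1": [[5.0, 3.0, 2.0]] * 3, "u2": [[2.0, 1.5, 0.5]] * 3}
ntl = {"a1": [[0.8], [1.1], [0.9]]}
climate = {"a1": [[31.0], [33.5], [29.0]]}
s = build_sample("u2", user_area, usage, ntl, climate, thieves={"u1"})
print(s.label, len(s.usage))  # 0 3
```

Because `u1` and `u2` sit in the same transformer area, they receive identical NTL and climate sequences; only the micro-level usage differs, which is why the area-level sources alone cannot tell the two users apart.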

Our dataset comprises three parts: the two sets of electricity-related records are provided by State Grid Zhejiang Power Supply Co. Ltd., while the temperature records are collected from an official weather website. The overall statistics are summarized in Table 1.

Electricity Usage Records.

This dataset covers the daily electrical power consumption records of 311K users in total, ranging from June 2017 to April 2019. For each user, we have the total, on-peak and off-peak electricity usage (kW·h) records for each day within the relevant timespan.

Non-Technical Loss Records.

This dataset contains the daily meso-level electrical records of the transformer areas covering all of these 311K users, and has the same time range as the usage dataset. More specifically, for each area, the daily amount of electrical power (kW·h) lost due to non-technical loss (NTL) is recorded.

Temperature Records.

We obtained the temperature records for all prefecture-level cities in Zhejiang during the same timespan as above from the Weather Radar (http://en.weather.com.cn). For each city, these records contain the maximum and minimum temperatures (℃) for each day.

Labels of Electricity Thieves.

Among all users, 4,501 were confirmed by State Grid staff to be electricity thieves during their on-site investigations; it should be noted that there are a total of 4,626 electricity-pilfering cases, since a single user may be caught committing theft several times during the two years. We then regard all remaining users (98.55%) as normal, i.e., as having engaged in no pilfering behavior over the entire timespan. It is possible that a few users who adopt subtle ways to pilfer electricity were never caught; such cases introduce noise but are very rare. We take these confirmed cases as ground truth and collect the timestamps when electricity thieves were caught for detailed analysis and experiments.

Figure 2: A study of micro-level factors and electricity theft. (a) A case of electricity usage, with the red bar indicating the time at which the thief was caught. (b) Statistics of all users' electricity usage (kW·h) before and after on-site checking. (c) PDF of the slopes of a linear regression on electricity usage, showing the usage trends. The behaviors of electricity thieves differ from those of normal users.

We first examine the micro-level electricity usage records to see whether abnormal patterns or characteristics in user behavior exist that indicate electricity theft.

To better understand pilfering behaviors, we first have to recognize the time periods during which the thieves were pilfering electricity. According to the timestamps of on-site investigations, the records for each electricity theft can be divided into two parts: before and after being caught. For convenience, we refer to this specific timestamp as a checkpoint. We further assume that a thief was stealing power during the last 30 days before the checkpoint, then returned to a normal state in the following 30 days after the checkpoint. We have chosen a month as the timespan based on domain knowledge: an electricity thief often steals power for a long time (probably longer than one month), and once caught and punished, he/she would immediately cease pilfering and behave normally. To give a concrete example, Figure 2a presents a case of an electricity thief who was caught in the middle of October 2018 (red bar). We can see an abnormal pattern in the middle of September: he had a sudden peak of electricity usage, while he had almost no off-peak electrical power usage during September and October. Moreover, in this case, there is no significant decreasing trend of electricity usage before he was caught, whereas in practice users would probably use less electricity in September and October due to the temperature drop. These clues may reveal the thief's abnormal state or behavior.

In light of the above assumption and case study, we provide an overview of the electricity consumption of both electricity thieves and normal users in Figures 2b and 2c. We draw the distributions of daily power usage at different times (before and after being caught) in Figure 2b; for each group of settings, we show the average value along with the standard error bar. We can clearly see that for electricity thieves, there is a significant increase in electricity usage after being caught compared with before (the two histograms on the left-hand side). This is reasonable, since pilfering behavior reduces the recorded amount of electricity that thieves consume.

Table 1: Overall statistics of the datasets (metric, statistics).
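The checkpoint assumption above (30 days of pilfering before the checkpoint, 30 days of normal behavior after it) can be sketched as a simple windowing step. This is an illustration under our own assumptions: daily usage is stored as a `date -> kWh` mapping, the checkpoint day itself is excluded from both windows, and missing days are skipped; `checkpoint_windows` is a hypothetical helper.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, List, Tuple

def checkpoint_windows(daily_usage: Dict[date, float], checkpoint: date,
                       before_days: int = 30,
                       after_days: int = 30) -> Tuple[List[float], List[float]]:
    """Return (assumed pilfering window, assumed normal window) in chronological order.

    The checkpoint day itself is excluded (an assumption), and days without
    a record are skipped.
    """
    before_dates = [checkpoint - timedelta(days=k) for k in range(before_days, 0, -1)]
    after_dates = [checkpoint + timedelta(days=k) for k in range(1, after_days + 1)]
    before = [daily_usage[d] for d in before_dates if d in daily_usage]
    after = [daily_usage[d] for d in after_dates if d in daily_usage]
    return before, after

# Toy thief: 4 kWh/day while pilfering, 9 kWh/day once caught on 2018-10-15.
cp = date(2018, 10, 15)
usage = {cp + timedelta(days=d): (4.0 if d < 0 else 9.0) for d in range(-40, 41)}
before, after = checkpoint_windows(usage, cp)
print(len(before), len(after), mean(before), mean(after))  # 30 30 4.0 9.0
```

Comparing `mean(before)` with `mean(after)` across all thieves gives the kind of before/after contrast plotted in Figure 2b; passing `before_days=50` gives the longer window used in the meso-level analysis.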
By contrast, normal users show no such change: their usage before and after the checkpoint seems unchanged; in other words, the electrical power consumption behaviors of normal users are relatively stable.

Another obvious observation from Figure 2b is that electricity thieves exhibit a much higher level of electricity usage than normal users, even though pilfering behavior cuts down the recorded value of their electrical power consumption. One reasonable explanation for this phenomenon is that people typically engage in risky behavior only when they expect a higher benefit. In this scenario, users whose electricity consumption is high are expected to have a far stronger motivation to steal power, as they would reduce their costs significantly through pilfering. By contrast, a user who uses a very small amount of electricity has no need to undertake such illegal operations, since this behavior would be economically foolish: once caught, the user would be fined heavily.

To further illustrate the differences in the stability of electricity usage between electricity thieves and normal users, we conduct a linear regression on electricity usage during the three months before the theft was identified. For better visualization and explanation, we restrict the checkpoint: in short, we only sample the thieves who were caught in October. Under these settings, users are expected to steadily reduce their electricity usage in late August and September compared with July due to the drop in temperature, such that the linear regression can capture this decreasing trend. We use the probability density function (PDF) [32] to compare the distributions of the regression slopes.
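The slope statistic behind Figure 2c can be sketched as an ordinary least-squares fit of daily usage against the day index, followed by the share of users with a negative slope (the text reports 80.30% for normal users). This is a minimal illustration: the function names are hypothetical, and each user's input is assumed to be one list of daily kWh values covering the three months before an October checkpoint.

```python
from typing import Sequence

def usage_slope(daily_kwh: Sequence[float]) -> float:
    """OLS slope of daily usage versus day index (kWh per day)."""
    n = len(daily_kwh)
    if n < 2:
        raise ValueError("need at least two days of usage")
    x_mean = (n - 1) / 2  # mean of 0, 1, ..., n-1
    y_mean = sum(daily_kwh) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_kwh))
    var = sum((x - x_mean) ** 2 for x in range(n))
    return cov / var

def negative_slope_share(users: Sequence[Sequence[float]]) -> float:
    """Fraction of users whose usage trend is decreasing."""
    return sum(usage_slope(u) < 0 for u in users) / len(users)

# Roughly July to September (92 days): a cooling-driven decline vs. a flat profile.
normal = [10.0 - 0.05 * d for d in range(92)]
flat = [12.0] * 92
print(round(usage_slope(normal), 3), usage_slope(flat))  # -0.05 0.0
print(negative_slope_share([normal, flat]))               # 0.5
```

The PDFs in Figure 2c would then be estimated from the slopes of each group (e.g., with a histogram or a kernel density estimate), a step this sketch leaves out.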
As Figure 2c shows, the gap between the PDF curves of electricity thieves and normal users represents their different trends in electricity usage. More specifically, and in line with our commonsense expectations, the usage of the majority of normal users (80.30%) exhibits a negative slope; by contrast, for electricity thieves, the peak of the PDF lies around 0, while nearly half of them even have a positive slope, which is entirely opposite to the common cases.

Combined with the analysis mentioned above, we can conclude that electricity thieves have a higher level of electrical power consumption with less stability.

Figure 3: The study of meso-level factors and electricity theft. (a) An electricity-usage case with NTL; the red bar indicates the time at which the user was caught as a thief. (b) The statistics of all users and transformer areas before and after on-site checking. Both illustrate that an abnormal increase in NTL can be caused by pilfering electricity.

Although we can observe several differences between electricity thieves and normal users in terms of electricity usage, user behaviors are very complicated and may be affected by many factors, so focusing only on these micro-level differences prevents us from efficiently identifying electricity thieves. Another intuitive source of knowledge is the transformer area, which records the overall electricity consumption of its users and thereby yields the non-technical loss (NTL). Hence, in order to reveal the meso-level characteristics of electricity thieves, we conduct an analysis combining regional information with individual electricity usage.

We begin with a real-world case study of electricity theft in Figure 3a. The user was caught as a thief in the middle of August (red bar). A clear and interesting observation is that the correlation between the user's electricity usage and NTL before he was caught is completely opposite to that after on-site checking was conducted. More specifically, before the user was caught, the less electricity usage his records showed (probably because a large part of the power he used was stolen), the higher the NTL of the transformer area was; after on-site checking, however, the NTL of the area became relatively stable.

We can confirm this finding on the whole dataset by observing the correlations between the averaged NTL of all transformer areas and the individual electricity usage of all users during the periods before and after on-site checking (Figure 3b). In more detail, we move a sliding window to retrieve the electricity usage of each thief 1) 50 days before and 2) 30 days after the checkpoint. We then sample the usage records of all normal users in the same transformer area and during the same period as the thief.
Again, we can see that the electricity usage of normal users (green line) is rather stable during the observed timespan; for electricity thieves (orange line), however, the averaged power consumption significantly increased once they were caught, while the NTL of their transformer areas dropped accordingly. This suggests that the NTL in a transformer area may signal whether there are electricity thieves in the area, and that capturing such correlations could be helpful in electricity-theft detection.

Figure 4: Number of electricity thieves in each month: Jan 193, Feb 99, Mar 48, Apr 68, May 74, Jun 88, Jul 119, Aug 142, Sep 112, Oct 83, Nov 168, Dec 184.
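One way to quantify the reversal seen in Figure 3a is to compare the Pearson correlation between a user's daily usage and the area's daily NTL before and after the checkpoint. This metric is our own illustrative choice (the analysis above compares averaged curves rather than reporting a correlation coefficient); `pearson` and `correlation_shift` are hypothetical helpers, and the default windows follow the 50-day/30-day split described above.

```python
from math import sqrt
from typing import Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant (an assumption)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def correlation_shift(usage: Sequence[float], ntl: Sequence[float], checkpoint: int,
                      before_days: int = 50, after_days: int = 30) -> Tuple[float, float]:
    """Correlation of usage and NTL before vs. after the checkpoint index."""
    pre = slice(max(0, checkpoint - before_days), checkpoint)
    post = slice(checkpoint + 1, checkpoint + 1 + after_days)
    return pearson(usage[pre], ntl[pre]), pearson(usage[post], ntl[post])

# Toy case: while pilfering, lower recorded usage coincides with higher NTL;
# after the checkpoint (index 5) the area's NTL is flat.
usage = [2.0, 3.0, 1.0, 4.0, 2.5, 6.0, 6.0, 6.5, 5.5, 6.2, 6.1]
ntl = [8.0, 7.0, 9.0, 6.0, 7.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
r_pre, r_post = correlation_shift(usage, ntl, checkpoint=5, before_days=5, after_days=5)
print(round(r_pre, 6), r_post)  # -1.0 0.0
```

A strongly negative correlation before the checkpoint that vanishes afterwards is the signature described for the case in Figure 3a.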

Previous work [23] has demonstrated the influence of climatic conditions on user behavior (taxi ordering). It would be interesting to determine whether some relationship between climate and electricity usage exists in our scenario, i.e., whether macro-level factors affect users' electrical power usage in a non-trivial way. Accordingly, we present the statistics of the seasonal effects on electricity-pilfering behavior (Figure 4), which reveal that most pilfering cases were caught in winter (Dec., Jan.) and summer (Jul., Aug.). This leads to the straightforward conclusion that climatic conditions influence users' electricity usage behavior, especially that of electricity thieves. Based on practical experience, temperature is the only climatic variable we consider in the present research, since the most obvious difference between winter and summer is temperature, which is believed to influence users' daily behavior in an intuitive way. We use the average of the daily maximum and minimum temperatures to represent the daily temperature; this setting is used throughout the paper unless otherwise indicated.

We first illustrate that a correlation does indeed exist between temperature and users' electricity usage, as shown in Figure 5a. The blue line indicates the averaged daily total electricity consumption of all users over one year, with error bars, while the orange line indicates the temperature on each day. Total electricity usage fluctuates with the temperature, as some sharp peaks and valleys coincide; the most obvious correlation between these two factors is that extremely high or low temperatures are associated with increased electricity consumption.
This would appear to be a commonsense observation, since extreme temperatures are of course associated with the increased usage of high-powered appliances such as air conditioners or heaters; here we verify this assumption by means of a brief visualization.

We next examine the relationships between temperature and electricity usage among different groups of users. In Figure 5b, we aggregate the daily total electricity usage based on the temperature of that day, then draw a boxplot with respect to our two user groups of interest, i.e., electricity thieves and normal users.

Figure 5: The study of macro-level factors and electricity theft. (a) The correlations between temperature and total electricity usage. (b) Electricity usage distribution at different temperatures (℃). (c) A case of an electricity thief. (d) A case of a normal user. (e) Distribution of the L2-loss function. Panel (a) presents the strong correlations between climatic conditions and user behaviors; (b)-(e) illustrate the irregularity of thieves' electricity usage under different temperature conditions. The floating numbers in (b) indicate the Wasserstein distance between the distributions of thieves and normal users.

In line with what we have observed in Section 3.2, we again find that electricity thieves have a much higher level of electrical power consumption regardless of the temperature. Furthermore, additional observation reveals the specific influence of temperature on electricity pilfering: the gap in daily total electricity usage between thieves and normal users is significantly larger under extremely high or low temperatures than under average temperatures. If we regard a temperature lower than 0℃ or higher than 30℃ as an extreme condition (as is commonly accepted by the public), the average gap in daily total electricity usage between thieves and normal users under extreme conditions is 4.51 kW·h, while that under non-extreme conditions (between 0 and 30℃) is 3.03 kW·h (a decrease of 32.8%); moreover, if we restrict the temperature span to between 10℃ and 25℃, the average gap becomes 2.70 kW·h. We further compute the Wasserstein distance ($d_w$) [9] of electricity usage distributions between thieves and normal users under different temperatures, and find that distances for extreme temperatures ($d_w = 4.5$ (> …)) … ($d_w = 2.1$ (19 − …)) … a normal user in the same transformer area as examples, then show the pairs of daily total electricity usage and temperature in the form of 2-D figures. For better visualization and illustration, we also draw the second-order regression curve in both Figure 5c and 5d.
Again, thetrends of the normal user’s curve is consistent with our experiencesuggesting that very high or low temperatures are associated withelevated electricity consumption, and the points are well fitted toa specific quadratic curve (Figure 5d); by contrast, the scatterplotfor the case of the electricity thief seems disordered (Figure 5c). Itis better to explore such correlations by merging users togetherwithin different groups; however, since the scale of each user’selectricity consumption may vary substantially, the scatterplot ofthe data for all normal users may not well fit to a quadratic curve.An alternative approach would to individually fit the points foreach user, then show the gap in the distributions of the fitting lossbetween normal users and electricity thieves (Figure 5e). The twoPDFs simply reveal the fact that scatters of normal users can befitted more easily to a quadratic curve than thieves.Summarizing from the abovementioned observations, we canconclude that the analysis based on the combination of electricityusage, NTL and temperature can yield extra information relatedto electricity-theft behavior, and can also give us strong insightsand motivations to consider the correlations among these threedifferent levels of factors when detecting electricity theft.

In this section, we integrate the insights gained from empirical observations (Section 3) into a hierarchical framework, named Hierarchical Electricity-theft Behavior Recognition (HEBR), to capture the behavioral patterns from multi-source observation sequences.
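To preview the data flow described in this section, here is a toy, untrained forward pass in plain Python. It mirrors only what the text states: one recurrent layer per source at Level 1 with directed fusions $h^c \rightarrow h^e$ and $h^l \rightarrow h^e$, recurrent layers over $h^{ec}$ and $h^{el}$ fused as $h^{ec} \rightarrow h^{el}$ at Level 2, and a fully connected softmax output. Everything else is an assumption for illustration: the Elman-style tanh cell, the fusion rule $f_t = \tanh(A s_t + B g_t)$, the Level 3 readout (one more recurrent layer whose last state feeds the output), the hidden size, and the random weights. `HEBRSketch` and its helpers are hypothetical names; the authors' actual implementation is published at https://github.com/zjunet/HEBR.

```python
import math
import random
from typing import List

Vec = List[float]
Mat = List[List[float]]

def rand_mat(rows: int, cols: int, rng: random.Random) -> Mat:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def vadd(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]

class Recurrent:
    """Assumed Elman cell: h_t = tanh(W x_t + U h_{t-1}), with h_0 = 0."""
    def __init__(self, d_in: int, d_h: int, rng: random.Random):
        self.w, self.u, self.d_h = rand_mat(d_h, d_in, rng), rand_mat(d_h, d_h, rng), d_h

    def __call__(self, xs: List[Vec]) -> List[Vec]:
        h, out = [0.0] * self.d_h, []
        for x in xs:
            h = [math.tanh(z) for z in vadd(matvec(self.w, x), matvec(self.u, h))]
            out.append(h)
        return out

class Fusion:
    """Assumed directed fusion source -> target: f_t = tanh(A s_t + B g_t)."""
    def __init__(self, d_h: int, rng: random.Random):
        self.a, self.b = rand_mat(d_h, d_h, rng), rand_mat(d_h, d_h, rng)

    def __call__(self, source: List[Vec], target: List[Vec]) -> List[Vec]:
        return [[math.tanh(z) for z in vadd(matvec(self.a, s), matvec(self.b, g))]
                for s, g in zip(source, target)]

def softmax(z: Vec) -> Vec:
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class HEBRSketch:
    def __init__(self, d_e: int, d_l: int, d_c: int, d_h: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Level 1: independent recurrent layers for user usage (e), area NTL (l), climate (c).
        self.rnn_e = Recurrent(d_e, d_h, rng)
        self.rnn_l = Recurrent(d_l, d_h, rng)
        self.rnn_c = Recurrent(d_c, d_h, rng)
        self.fuse_ce, self.fuse_le = Fusion(d_h, rng), Fusion(d_h, rng)
        # Level 2: recurrent layers over h^{ec} and h^{el}, then fusion h^{ec} -> h^{el}.
        self.rnn_ec, self.rnn_el = Recurrent(d_h, d_h, rng), Recurrent(d_h, d_h, rng)
        self.fuse_2 = Fusion(d_h, rng)
        # Assumed Level 3 readout: recurrent layer, last state, fully connected + softmax.
        self.rnn_3 = Recurrent(d_h, d_h, rng)
        self.fc = rand_mat(2, d_h, rng)

    def __call__(self, usage: List[Vec], ntl: List[Vec], climate: List[Vec]) -> Vec:
        h_e, h_l, h_c = self.rnn_e(usage), self.rnn_l(ntl), self.rnn_c(climate)
        h_ec = self.fuse_ce(h_c, h_e)  # climate -> user
        h_el = self.fuse_le(h_l, h_e)  # area -> user
        h_2 = self.fuse_2(self.rnn_ec(h_ec), self.rnn_el(h_el))  # h^{ec} -> h^{el}
        return softmax(matvec(self.fc, self.rnn_3(h_2)[-1]))  # [P(Y=0), P(Y=1)]

T = 5
model = HEBRSketch(d_e=3, d_l=1, d_c=1)
probs = model(usage=[[5.0, 3.0, 2.0]] * T, ntl=[[0.8]] * T, climate=[[31.0]] * T)
print(len(probs), round(sum(probs), 6))  # 2 1.0
```

Since the weights are random and untrained, the probabilities themselves are meaningless; the point is only the wiring, in which the area-level and climate-level sequences enter the user stream through directed fusions instead of being concatenated with the usage features at each time step.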

Overview.

Motivated by Section 3, we model user behaviors based on three distinct levels of information, as follows:
• Macro-level. We define $X^c_a$, $a \in A$, to represent the observation sequence of temperature, which reflects the climatic conditions that may influence users' electricity consumption patterns.
• Meso-level. We define $X^l_a$, $a \in A$, to represent the observation sequence of non-technical loss (NTL), which indicates the real-time status of the corresponding transformer area.
• Micro-level. We define $X^e_u$, $u \in U$, to represent the observation sequence of users' electricity usage, which traces individual electrical power consumption behaviors.

The empirical observations demonstrate that macro- and meso-level information influences micro-level behaviors to some extent. To integrate the multi-source information and capture the abnormal behavioral patterns of electricity thieves, we develop a hierarchical framework that extracts features from each source separately and fuses them step by step.

When modeling multi-source sequences, a straightforward baseline is to first concatenate the sources at each temporal point and then use a single latent representation to capture the overall patterns, as in MRNN [18]. However, concatenating different sources widens the feature dimensions, which may prevent the model from capturing significant correlations between different levels of information. More specifically, in our scenario, different users in the same transformer area may share the same observation sequence of NTL or temperature. Straightforward concatenation could therefore blur the distinction between individual user behaviors and regional or climatic factors. Accordingly, we opt to extract the independent features of each source separately and then conduct pairwise information fusion.

[Figure 6: inputs Climate (ºC), Transformer Area (kW·h) and User (kW·h); Level 1: recurrent layers and fusion layers (Climate → User, Area → User); Level 2: recurrent layers and fusion layer (Climate&User → Area&User); Level 3: recurrent layer, fully connected layer and softmax output.]
Figure 6: Architecture of HEBR. Given a multi-source observation sequence (climate, area and user), HEBR constructs a three-level framework: at each level, different sequences are fed into separate recurrent layers, and their latent representations are fused in pairs. Dashed lines indicate the direction of the fusion. The last layer outputs the probability of electricity theft.

In Figure 6, we present the architecture of the proposed HEBR framework. In addition to the input and output, HEBR contains three levels of feature extraction and hierarchical fusion, which aim to capture different levels of influence between data sources. Each level is described in more detail below.
• Level 1: Captures the temporal patterns in each observation sequence independently (i.e., the patterns in temperature ($h^c$), NTL ($h^l$) and the user's electricity usage ($h^e$)), and fuses them in pairs ($h^c \to h^e$, $h^l \to h^e$). It aims to model the influence of macro- and meso-level factors on user behaviors, respectively.
• Level 2: Captures the temporal patterns after the preliminary fusion at Level 1 (i.e., user-climate ($h^{ec}$) and user-area ($h^{el}$)) separately, and then fuses them as $h^{ec} \to h^{el}$. It aims to unify the influence of the macro and meso levels on user behaviors.
• Level 3: Captures the overall temporal patterns in the multi-source sequence ($h^{elc}$). The hierarchically fused information is integrated to capture behavioral patterns, which can be used to estimate the probability of electricity theft.

Note that we do not fuse the representations $h^c$ and $h^l$ at the first level. Our observational studies (Section 3) suggest that these two levels of information are uncorrelated with each other, although both are closely related to users' electricity usage. Here, we aim to capture how each of these two factors influences user electricity consumption behavior.

Based on this hierarchical construction, HEBR conducts feature extraction and fusion at each level and gradually integrates the information from multiple sources. For the operations at each level, we define a uniform formulation: given the sequential input $I^{(k)}_t$ and another source $I'^{(k)}_t$ at the $k$-th level, the feature extraction and hierarchical fusion can be formulated as follows:

$$h^{(k)}_t = F_{\text{recurrent}}\big(W_{I\to h} \cdot I^{(k)}_t,\; h^{(k)}_{t-1}\big),$$
$$h'^{(k)}_t = F_{\text{recurrent}}\big(W_{I'\to h'} \cdot I'^{(k)}_t,\; h'^{(k)}_{t-1}\big),$$
$$I^{(k+1)}_t = \alpha_t \cdot F_{\text{act}}\Big(F_{\text{fuse}}\big(h^{(k)}_t,\; W_{h'\to h} \cdot h'^{(k)}_t\big)\Big) \qquad (1)$$

Here $h^{(k)}_t$ denotes the latent representation of $I^{(k)}_t$ at the $k$-th level and temporal point $t$. It is updated by the function $F_{\text{recurrent}}$ based on its previous memory $h^{(k)}_{t-1}$ and the current input $I^{(k)}_t$. The prime ($'$) marks the corresponding quantities for the sequence from the other source. $W_*$ are trainable weight matrices. The function $F_{\text{fuse}}$ fuses the information from $I'^{(k)}_t$ into the current sequence $I^{(k)}_t$ and outputs the intermediate representations through an activation function $F_{\text{act}}$. Moreover, $\alpha_t$ denotes the attention coefficient from $h'_t$ to $h_t$, which learns the attention weights from $I'$ to $I$ automatically in an end-to-end manner.

[Figure 7: the inputs of the current source ($I_{t-1}, I_t, I_{t+1}$) and of another source ($I'_{t-1}, I'_t, I'_{t+1}$) pass through separate recurrent layers (hidden states $h_{t-1}, h_t, h_{t+1}$ and $h'_{t-1}, h'_t, h'_{t+1}$), followed by a multi-step fusion & attention layer and the output.]
Figure 7: The architecture of the recurrent and fusion layers on multi-source sequences. The sequential inputs of another source $I'$ are integrated into the current ones $I$ by means of this architecture, which is a uniform component in Figure 6.

We will introduce the details of model inference and learning for detecting electricity thieves in the next section.

Herein, we adopt neural networks to implement HEBR, whose parameters are learned by minimizing a specific loss. To capture the temporal patterns, several existing methods can be used to implement the recurrent layer $F_{\text{recurrent}}$ (see Table 6 in Section 5.3). For the fusion function $F_{\text{fuse}}$, we propose a new hierarchical fusion mechanism, containing multi-step fusion and attention operations, to effectively bridge the information from different sources. More details are introduced in the next paragraph.

Hierarchical fusion mechanism.

To capture the correlations between different levels of information, we propose a multi-step fusion mechanism. The intuition is that the influence between two distinct levels of sequences may be time-delayed. For instance, in addition to being affected by today's temperature, a user's electricity usage is probably also related to yesterday's weather. One concrete example: if the previous day was very hot, people tend to run the air conditioning for longer, even if today is cooler. Therefore, we should fuse information over a temporal interval rather than only at the current temporal point. More specifically, as shown in Figure 7, the current latent representation $h_t \in \mathbb{R}^{d_h}$ and that from another data source $h'_t \in \mathbb{R}^{d_{h'}}$ are fused via the following formulation:

$$F_{\text{fuse}}(h_t, h'_t) = \big(h_t \odot W_{h'\to h}\, h'_{t-1}\big) \oplus \big(h_t \odot W_{h'\to h}\, h'_t\big) \qquad (2)$$

where $\oplus$ denotes the concatenation operator and $\odot$ is the pooling operator. For the $t$-th time step, $\big(h_t \odot W_{h'\to h}\, h'_{t-1}\big)$ and $\big(h_t \odot W_{h'\to h}\, h'_t\big)$ capture how $h'_{t-1}$ and $h'_t$, respectively, influence $h_t$.

However, the fused information at different time steps is unlikely to be equally important to behavioral patterns. For example, someone who consumes little electrical power in summer or winter is probably an electricity thief. The same may not hold in autumn or spring, since users typically consume less electricity during these months due to seasonal effects. Liu et al. [27] suggest that models should try to measure such significant information at different temporal points.
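The multi-step fusion of Eq 2 can be sketched in plain Python as below. The text leaves two details open, so the sketch makes two assumptions. First, it takes the operator $\odot$ (described only as a pooling operator) to be an element-wise product between $h_t$ and the projected $h'$. Second, it treats $h'_{t-1}$ at the first time step as a zero vector. The names `matvec` and `multi_step_fuse` are hypothetical, and in a real model $W_{h'\to h}$ would be a trainable parameter inside a neural-network framework.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def matvec(w: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product: w has shape (d_h, d_h'), v has length d_h'."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def multi_step_fuse(h: Sequence[Sequence[float]],
                    h_other: Sequence[Sequence[float]],
                    w: Matrix) -> List[Vector]:
    """For each t, return (h_t * W h'_{t-1}) concatenated with (h_t * W h'_t).

    Assumptions: '*' is an element-wise (Hadamard) product standing in for the
    paper's pooling operator, and h'_{-1} is a zero vector.
    """
    if len(h) != len(h_other):
        raise ValueError("both sources must cover the same time steps")
    prev = [0.0] * len(w[0])  # zero padding for h'_{t-1} at t = 0
    fused = []
    for h_t, h_other_t in zip(h, h_other):
        lagged = matvec(w, prev)         # W_{h'->h} h'_{t-1}
        current = matvec(w, h_other_t)   # W_{h'->h} h'_t
        fused.append([a * b for a, b in zip(h_t, lagged)] +
                     [a * b for a, b in zip(h_t, current)])
        prev = list(h_other_t)
    return fused


# d_h = 2 (e.g. user usage), d_h' = 1 (e.g. temperature), T = 2 steps.
h = [[1.0, 2.0], [3.0, 4.0]]
h_temp = [[1.0], [2.0]]
w = [[1.0], [10.0]]
print(multi_step_fuse(h, h_temp, w))
# [[0.0, 0.0, 1.0, 20.0], [3.0, 40.0, 6.0, 80.0]]
```

Each fused vector has length $2 d_h$: one half carries the lagged influence of $h'_{t-1}$, the other the current influence of $h'_t$. The attention weighting described next then re-weights this fused sequence across time steps.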
Hence, we design an attention mechanism to model the varying significance of the fused information at different time steps. Given the current fusion level $k$, the input of the next level $I^{(k+1)} \in \mathbb{R}^{T \times d_{h^{(k)}}}$ is computed from the intermediate representations at each time step, weighted by a score vector $\alpha^{(k+1)} \in \mathbb{R}^{T}$:

$$I^{(k+1)} = \sum_{t=1}^{T} \alpha^{(k+1)}_t \cdot \tanh\Big(F_{\text{fuse}}\big(h^{(k)}_t, h'^{(k)}_t\big)\Big), \qquad \alpha^{(k+1)} = \operatorname{softmax}\Big(W_{h\to\alpha} \cdot \sum_{t=1}^{T} F_{\text{fuse}}\big(h^{(k)}_t, h'^{(k)}_t\big)\Big) \qquad (3)$$

where $\sum$ here denotes concatenation over the $T$ time steps (not summation), and $W_{h\to\alpha}$ is a trainable weight matrix shared by all temporal points. The activation function tanh is applied to the intermediate representations.

Model formulation.

With the above definitions, we can now instantiate Eq 1 for HEBR. The complete mathematical formulation is as follows:

$$I^e = X^e_u,\quad I^l = X^l_a,\quad I^c = X^c_a,\qquad u \in U,\ a \in A \qquad (4a)$$

$$\begin{bmatrix} h^e_t \\ h^l_t \\ h^c_t \end{bmatrix} = \begin{bmatrix} F_{\text{recurrent}}(I^e_t, h^e_{t-1}) \\ F_{\text{recurrent}}(I^l_t, h^l_{t-1}) \\ F_{\text{recurrent}}(I^c_t, h^c_{t-1}) \end{bmatrix}, \qquad \begin{bmatrix} I^{el}_t \\ I^{ec}_t \end{bmatrix} = \begin{bmatrix} \alpha^{el}_t \\ \alpha^{ec}_t \end{bmatrix} \cdot \tanh\begin{bmatrix} F_{\text{fuse}}(h^e_t, h^l_t) \\ F_{\text{fuse}}(h^e_t, h^c_t) \end{bmatrix} \qquad (4b)$$

$$\begin{bmatrix} h^{ec}_t \\ h^{el}_t \end{bmatrix} = \begin{bmatrix} F_{\text{recurrent}}(I^{ec}_t, h^{ec}_{t-1}) \\ F_{\text{recurrent}}(I^{el}_t, h^{el}_{t-1}) \end{bmatrix}, \qquad I^{elc}_t = \alpha^{elc}_t \cdot \tanh\big(F_{\text{fuse}}(h^{ec}_t, h^{el}_t)\big) \qquad (4c)$$

$$h^{elc}_t = F_{\text{recurrent}}\big(I^{elc}_t, h^{elc}_{t-1}\big) \qquad (4d)$$

$$H^{elc}_u = \prod_{t=1}^{T} h^{elc}_t \qquad (4e)$$

The multi-source sequences ($X^e_u$, $X^l_a$, $X^c_a$) initialize the inputs $I^*$, after which three levels of feature extraction and hierarchical fusion (Eqs 4b, 4c and 4d) are conducted. $\prod$ denotes the pooling operator that aggregates the fused representation $h^{elc}$ over all temporal points, and the final output is the behavior embedding $H^{elc}_u$ of each user $u$. In our experiments, we use mean pooling as the pooling operator. To estimate the probability of each user being an electricity thief, we define a mapping function $\Psi$ that maps the feature embedding into a binary vector. The softmax function then turns it into a probability between 0 and 1:

$$P\big(Y \mid X^e_u, X^l_a, X^c_a\big) = \operatorname{softmax}\big(\Psi(H^{elc}_u)\big) \qquad (5)$$

$\Psi$ can be implemented with fully connected networks or other well-known classifiers.

The whole hierarchical framework can also be implemented by HBRNN [13], which was proposed for recognizing skeleton-based actions by combining multi-source time series data.
The main difference from our approach is that HBRNN implements the fusion functions by directly concatenating two representations at the same temporal point. By contrast, we design a multi-step fusion mechanism for HEBR. We validate the effectiveness of this architecture by comparing both methods in our experiments (Section 5.2) and in additional ablation studies (Section 5.3).

Learning.

We use the Adam optimizer [24] for parameter learning, with the binary cross-entropy as the objective function:

$$\mathcal{L} = -\sum_{u \in U} \Big[\hat{Y}_u \log P\big(Y \mid X^e_u, X^l_a, X^c_a\big) + \big(1 - \hat{Y}_u\big) \log\Big(1 - P\big(Y \mid X^e_u, X^l_a, X^c_a\big)\Big)\Big] \qquad (6)$$

where $\hat{Y}_u \in \{0, 1\}$ is the ground truth of whether user $u$ is an electricity thief, as confirmed by on-site investigations, and $P(Y \mid X^e_u, X^l_a, X^c_a)$ is computed by Eq 5.

In this section, we conduct experiments on a real-world dataset (introduced in Section 3.1) to answer the following three questions:
• Q1: How does HEBR perform on electricity-theft detection tasks compared with state-of-the-art baselines?
• Q2: How does the multi-source information contribute to the detection task?
• Q3: Can the proposed hierarchical fusion mechanism effectively bridge the information from different sources?

Baselines.

We validate the effectiveness of HEBR by comparing it with several types of baselines. The first type comprises classification methods based on handcrafted features, which are commonly used in existing work on electricity-theft detection [11, 19, 21, 35, 36]. We list the handcrafted features we consider in Table 2 and employ the following classifiers: logistic regression (LR) [17], support vector machine (SVM) [38], random forest (RF) [16] and extreme gradient boosting (XGB) [6].

The second type of baseline comprises time series classification methods, including:
• Nearest Neighbor: This method determines whether a user $u$ pilfers electrical power with reference to other users close to $u$. In particular, we consider the following metrics to calculate the distance between two users' time series in our experiment: Euclidean Distance (NN-ED), Dynamic Time Warping (NN-DTW) [3] and Complexity Invariant Distance (NN-CID) [2].
• Fast Shapelets (FS) [34]: This approach extracts shapelets, the representative segments of time series, as features for classification.
• Time Series Forest (TSF) [10]: A tree-ensemble method for time series classification.

As the third type of baseline, we consider the following competitive deep learning methods:
• MRNN [18]: A multiscale recurrent neural network that takes the concatenated multi-source sequences ($X^e_u \oplus X^l_a \oplus X^c_a$) as input.
• HBRNN [13]: A hierarchical recurrent neural network on multi-source sequences, proposed to recognize skeleton-based actions. For the fusion layer, it concatenates the latent representations at the same time point ($F_{\text{fuse}}(h_t, h'_t) = h_t \oplus h'_t$).
• WDCNN [39]: A wide & deep convolutional neural network for detecting electricity theft that focuses on capturing periodic patterns of users' electricity usage.
• HEBR: The proposed method. We empirically set the dimensions of $h^e$, $h^l$ and $h^c$ in the first layer to 32, 8 and 8, respectively. We set the learning rate to 0.01 and reduce it by a factor of 10 every 20 iterations. We implement $F_{\text{recurrent}}$ with LSTM and study how different implementations influence performance in Section 5.3.

Table 2: List of handcrafted features related to electricity theft.
Feature | Description
Power usage | Mean, variance and slope of total, on-peak and off-peak electricity usage $X^e_u$
NTL | Mean, variance and slope of non-technical loss $X^l_a$
Temperature | Mean, variance and slope of maximum and minimum temperature $X^c_a$
Usage vs. NTL | $\sqrt{\sum (X^e_u - X^l_a)^2}$, Euclidean distance between total usage and NTL
Usage vs. NTL | $\mathrm{DTW}(X^e_u, X^l_a)$, DTW distance between total usage and NTL
Usage vs. Temperature | $\sqrt{\sum (X^e_u - X^c_a)^2}$, Euclidean distance between total usage and temperatures
Usage vs. Temperature | $\mathrm{DTW}(X^e_u, X^c_a)$, DTW distance between total usage and temperatures

Comparison metrics.

We use precision, recall and two F-measures (F1, F0.5) as metrics. The F-measure summarizes a test's accuracy and is defined as the weighted harmonic mean of its precision and recall:

$$F_\beta = \frac{(1 + \beta^2)\cdot \text{precision} \times \text{recall}}{\beta^2 \times \text{precision} + \text{recall}}$$

We prefer F0.5 as the metric for electricity-theft detection, since precision matters more than recall in real applications; with $\beta = 0.5$, precision is weighted more heavily than recall.
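As a quick check of the formula, the sketch below computes $F_\beta$ from precision and recall expressed in percent. Applied to the HEBR row of Table 4 (precision 22.54, recall 34.19), it should reproduce the reported F1 of 27.17 and F0.5 of 24.19 up to rounding. The helper name `f_beta` is hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall (the F-beta score).

    beta < 1 weights precision more heavily; beta = 1 gives the usual F1.
    Inputs may be fractions or percentages; the output uses the same scale.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# HEBR row of Table 4 (values in %).
p, r = 22.54, 34.19
print(f"F1   = {f_beta(p, r, 1.0):.2f}")  # F1   = 27.17
print(f"F0.5 = {f_beta(p, r, 0.5):.2f}")  # F0.5 = 24.19
```

Since F0.5 sits closer to precision than F1 does, a model that flags fewer but more reliable suspects scores better under it. This matches the cost of dispatching staff for on-site checks.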

Implementation details.

To meet the demands of the application scenario, the input of all methods is the historical observation sequence spanning six months. The output is the probability of electricity theft, which can be validated in the following month. The first 80% of samples, ordered by time, are used for training, and we test the different methods on the remaining samples. We also hold out 10% of the training samples as a validation set to avoid overfitting. For baselines that require a classifier, we use XGB [6] with a batch size of 2000. To address class imbalance, we assign a larger weight to positive samples (e.g., the ratio of negative to positive samples). All experiments are run on a single Nvidia GTX 1080Ti GPU.
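The data protocol above can be sketched as follows. The sketch uses a chronological 80/20 train/test split, a 10% validation hold-out from the training portion, and a positive-sample weight equal to the negative-to-positive ratio. The sample schema `(month index, label)` and the function names are hypothetical. The choice of the *most recent* 10% of the training portion as the validation set is also an assumption, since the text does not say how those samples are chosen.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[int, int]  # (month index, label); hypothetical schema, 1 = confirmed thief


def time_ordered_split(samples: Sequence[Sample], train_frac: float = 0.8,
                       val_frac: float = 0.1) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Chronological train/validation/test split.

    The earliest train_frac of samples form the training portion and the rest the
    test set. Assumption: the validation set is the most recent val_frac of the
    training portion (the text only says 10% of the training set is held out).
    """
    ordered = sorted(samples, key=lambda s: s[0])
    n_train = int(len(ordered) * train_frac)
    train_all, test = ordered[:n_train], ordered[n_train:]
    n_val = int(len(train_all) * val_frac)
    cut = len(train_all) - n_val
    return train_all[:cut], train_all[cut:], test


def positive_class_weight(labels: Sequence[int]) -> float:
    """Weight for positive samples: #negatives / #positives."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("no positive samples")
    return (len(labels) - n_pos) / n_pos


# 100 toy samples, listed newest first, with a thief label every 20 steps.
samples = [(t, 1 if t % 20 == 0 else 0) for t in reversed(range(100))]
train, val, test = time_ordered_split(samples)
print(len(train), len(val), len(test))               # 72 8 20
print(positive_class_weight([y for _, y in train]))  # 17.0
```

Splitting by time rather than at random keeps future months out of training, which mirrors predicting theft that is validated in the following month. For XGBoost, a ratio computed this way is the usual setting for its `scale_pos_weight` parameter. For the neural models, one option would be to scale the positive term of the loss in Eq 6 by the same factor.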

Table 3: Comparison of classification performance (%). Bold indicates the best performance among all methods.
Methods | Precision | Recall | F1 | F0.5 | Variation
Handcrafted features: LR | 9.52 | 7.12 | 8.14 | 8.92 | ± …
…

We first compare the experimental results of HEBR with the baselines to answer Q1. As shown in Table 3, all handcrafted-feature methods perform poorly, as they capture only a limited number of patterns. Relatively speaking, the ensemble methods (i.e., XGB and RF) perform better (an average of +7% in F0.5). By automatically capturing temporal features, time series classification methods achieve further improvements, especially in recall. However, these methods cannot effectively handle multi-source time series and therefore suffer in terms of precision. A similar phenomenon can be observed in the neural network results. In particular, when all multi-source data are simply concatenated and fed into a recurrent neural network (MRNN), the model identifies all samples as instances of electricity theft. This suggests that improperly handling multi-source data introduces more noise, which hurts performance. WDCNN tries to capture users' abnormal non-periodic behaviors, which improves performance; however, its performance is unstable (around 1.31 variation). As expected, models with a hierarchical structure, such as HBRNN and HEBR, handle multi-source data better and consequently outperform the other methods. Moreover, HEBR outperforms HBRNN by +3% in terms of F0.5. Careful investigation shows that, with the help of the multi-step fusion and the attention operator, HEBR both bridges the multi-source information better and offers superior interpretability, as detailed in later sections.

Effectiveness of multi-source information (Q2).

We study whether multi-source information is effective for electricity-theft detection. To do this, we remove the input sequences of temperature and NTL, respectively, from HEBR. Note that after each sequence is removed, the number of HEBR's levels decreases; once both sequences are removed, HEBR reduces to a single recurrent neural network with users' electricity usage records as input. From Table 4, we can see that multi-source information is significant: the performance drops substantially when NTL and temperature are removed simultaneously (-12.88% of F0.5). Notably, performance is slightly less sensitive to temperature than to NTL (+1.97% of F0.5).

Table 4: Effect of multi-source information (%).
Removed component | Precision | Recall | F1 | F0.5
Temperature | 20.65 | 28.58 | 23.98 | 21.86
NTL | 19.01 | 22.33 | 20.54 | 19.59
Both | 10.46 | 16.79 | 12.89 | 11.31
HEBR | 22.54 | 34.19 | 27.17 | 24.19

Table 5: Effect of fusion and attention operators (%).
Removed component | Precision | Recall | F1 | F0.5
Multi-step fusion | 19.18 | 32.05 | 23.99 | 20.85
Attention | 20.91 | 30.27 | 24.73 | 22.29
Both | 18.66 | 29.47 | 22.85 | 20.14
HEBR | 22.54 | 34.19 | 27.17 | 24.19

Table 6: Effect of temporal modeling (%).
Implementation | Precision | Recall | F1 | F0.5
Average pooling | 15.96 | 32.11 | 21.32 | 17.75
Linear RNN | 19.53 | 30.24 | 23.73 | 21.02
GRU | 21.04 | … | … | …
…

Effectiveness of hierarchical fusion mechanism (Q3).

The proposed hierarchical fusion mechanism includes the multi-step fusion operator and the attention operator. How do these elements contribute to bridging the information from different data sources? To answer this question, we remove each operator in turn and assess the resulting impact on performance. Note that after removing the fusion operator, we adopt simple concatenation for the fusion layer: $F_{\text{fuse}} = h_t \oplus h'_t$.

Results are presented in Table 5. The performance clearly drops when either the multi-step fusion operator or the attention operator is removed (-2.62% of F0.5 on average). Moreover, removing both affects performance more significantly, suggesting that the two operators work well together to bridge the information from different sources. Later, we demonstrate the effectiveness of the proposed hierarchical fusion mechanism qualitatively through a specific application case.

Implementations of recurrent layers.

Finally, we study how different implementations of the recurrent layers in our proposed model influence performance. To do this, we use several common methods: average pooling, linear RNN [28], GRU [7] and LSTM [22]. As shown in Table 6, the gated networks (GRU, LSTM) outperform the linear methods (average pooling, linear RNN), which illustrates that appropriate temporal modeling is important for improving performance.

In the past, State Grid staff employed several data-driven models to detect electricity theft. However, these models proved ineffective in practice: in 2018, for example, none of the electricity thieves in Zhejiang were caught by these models. Meanwhile, catching thieves through large-scale on-site investigations is very costly and time-consuming.

[Figure 8: upper rows show Temperature, NTL and Electricity Consumption from Jun 2018 through Aug; lower rows show the attention scores of the fusion layers ec, el and elc over the same period as a heat map.]
Figure 8: A case study of the attention operator shown together with the different fusion layers ($\alpha^{ec}$, $\alpha^{el}$, $\alpha^{elc}$). The upper rows present the multi-source observation sequences, while the red bar indicates the time at which the electricity theft was detected. The lower rows present the score vectors of the attention operator as a heat map.

Accordingly, to improve the accuracy of on-site investigation and validate the effectiveness of our model, we employed HEBR for monthly on-site investigation in cooperation with State Grid Hangzhou Power Supply Co. Ltd. More specifically, HEBR flagged 20 high-risk users at the beginning of August 2019 and suggested that State Grid staff investigate them and collect evidence. Our approach successfully caught three electricity thieves out of these 20, a precision of 15% in practice. This represents a significant improvement in on-site investigation precision (from 0%). Moreover, staff in Hangzhou strongly suspected another six of the remaining 17 users of being electricity thieves, although no clear evidence was found during the investigation. These six users likely pilfered electrical power in June or July and then restored their electrical meters to their previous condition before the on-site check.

We next present a specific case identified by HEBR to demonstrate its effectiveness in real-world applications. In Figure 8, for a specific user, we present, from top to bottom, the temperature, the NTL, the user's electricity usage record, and three score vectors ($\alpha^{ec}$, $\alpha^{el}$, $\alpha^{elc}$) from our model's fusion layers. The high-scoring positions (the brighter areas of the heat map) coincide with low electricity usage, hot weather and high NTL. This finding is consistent with the earlier empirical observations (Section 3): when a user consumes little electricity in hot weather while the NTL increases abnormally, he or she is likely to be pilfering electricity.
Moreover, moving through the fusion layers from top to bottom, the brighter areas in the heat map (Figure 8) become increasingly distinct, suggesting that HEBR captures increasingly accurate patterns of multi-source information. We can also see that users do not pilfer electrical power all the time, and that the attention score drops back to a low level after the on-site check (red bar). From the case study and the performance comparison in Table 3, we can see that the attention operator not only improves electricity-theft detection performance but also provides superior interpretability.

Additional related works on electricity theft.

Electricity theft, or pilfering, is a common phenomenon in developing countries, and substantial effort has been expended to prevent or detect it. As mentioned in Section 1, there are two main avenues of work on electricity-theft detection or prevention. The first comprises hardware-driven methods, of which we discuss several representative works here. Fennell [15] proposed a special pilfer-proofing system incorporating a plug-in terminal block set and a meter box cover; Depuru et al. [11] first concluded that electricity theft makes up a significant proportion of non-technical loss (NTL). The second avenue, rather than attempting to improve the hardware of electricity meters, comprises data-driven methodologies that focus on analyzing electrical power consumption records. In addition to the works referenced in Section 1, Zheng et al. [39] propose a framework based on wide & deep CNNs that aims to accurately identify the periodicity and non-periodicity of electricity usage from 2-D electrical power consumption data. Moreover, Costa et al. [8] apply a knowledge-discovery-in-databases process based on artificial neural networks to electricity-theft detection. However, the problem of how to use multi-source data (including electricity consumption records and other related information) for electricity-theft detection remains unstudied.

Time series modeling.

A basic but important characteristic of power consumption data is that these records are time series. Time series modeling has been widely studied over the past decades. One traditional avenue is to extract effective features from the original data and train a classifier on them, as in TSF [10] and shapelets [26, 37]. Another avenue focuses on deep learning, such as RNNs and their variants (LSTM [22], GRU [18], etc.). In addition, time series features may be high-dimensional and are likely to exhibit different levels of correlation. Hierarchical LSTM-based models have therefore been proposed to learn these hierarchical relationships (e.g., HBRNN [13]).

User behavior modeling.

Another domain related to this work is user behavior analysis. One typical case is web search: for example, Radinsky et al. [33] develop a temporal modeling framework to predict user behavior using smoothing and trends. Although these works provide various insights relevant to human behavior modeling, fully understanding complicated user behaviors remains difficult, so the analysis may need to be tailored to the specific scenario in question.

In this paper, we study the problem of electricity-theft detection and analyze the influence of macro-level (climate) and meso-level (non-technical loss) factors on users' electricity usage behavior. We propose a hierarchical framework, HEBR, that encodes the correlations between different levels of information step by step. When evaluated on a real-world dataset, the proposed method not only achieves significantly better results than the baselines but has also helped to catch electricity thieves in practice. In future work, we plan to explore the following aspects: 1) investigating more factors from different sources to improve electricity-theft detection accuracy; 2) extending HEBR to other similar scenarios.

Acknowledgments.

The work is supported by NSFC (61702447), the Fundamental Research Funds for the Central Universities, and research funding from the State Grid of China.

REFERENCES
[1] Tanveer Ahmad. 2017. Non-technical loss analysis and prevention using smart meters. Renewable and Sustainable Energy Reviews 72 (2017), 573–589.
[2] Gustavo EAPA Batista, Xiaoyue Wang, and Eamonn J Keogh. 2011. A complexity-invariant distance measure for time series. SDM (2011), 699–710.
[3] Donald J Berndt and James Clifford. 1994. Using dynamic time warping to find patterns in time series. SIGKDD (1994), 359–370.
[4] Hung-Po Chao and Stephen Peck. 1996. A market mechanism for electric power transmission. Journal of Regulatory Economics 10, 1 (1996), 25–59.
[5] Abhishek Chauhan and Saurabh Rajvanshi. 2013. Non-technical losses in power system: A review. In International Conference on Power, Energy and Control (ICPEC). IEEE, 558–561.
[6] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. SIGKDD (2016), 785–794.
[7] Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2015. Gated Feedback Recurrent Neural Networks. ICML (2015), 2067–2075.
[8] Breno C Costa, Bruno LA Alberto, André M Portela, W Maduro, and Esdras O Eler. 2013. Fraud detection in electric power distribution networks using an ANN-based knowledge-discovery process. International Journal of Artificial Intelligence & Applications 4, 6 (2013), 17.
[9] Marco Cuturi and Arnaud Doucet. 2014. Fast Computation of Wasserstein Barycenters. ICML (2014), 685–693.
[10] Houtao Deng, George Runger, Eugene Tuv, and Martyanov Vladimir. 2013. A time series forest for classification and feature extraction. Information Sciences 239 (2013), 142–153.
[11] Soma Shekara Sreenadh Reddy Depuru, Lingfeng Wang, and Vijay Devabhaktuni. 2011. Electricity theft: Overview, issues, prevention and a smart meter based approach to control theft. Energy Policy 39, 2 (2011), 1007–1015.
[12] Soma Shekara Sreenadh Reddy Depuru, Lingfeng Wang, and Vijay Devabhaktuni. 2011. Support vector machine based data classification for detection of electricity theft. In Power Systems Conference and Exposition (PES). IEEE, 1–8.
[13] Yong Du, Wei Wang, and Liang Wang. 2015. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
… arXiv: Machine Learning (2015).
[17] Steven L Gortmaker, David W Hosmer, and Stanley Lemeshow. 1994. Applied Logistic Regression. Contemporary Sociology 23, 1 (1994), 159.
[18] Alex Graves, Abdelrahman Mohamed, and Geoffrey E Hinton. [n.d.]. Speech recognition with deep recurrent neural networks. International Conference on Acoustics, Speech, and Signal Processing (ICASSP) ([n.d.]).
[19] Li-Cai Guo, Zhi-Wei Peng, and Qiang Fan. 2010. A survey of electric energy metering and countermeasures to electric power stealing [J]. High Voltage Apparatus 46, 5 (2010), 86–88.
[20] Songlin Han, Dezu Shang, J Gao, et al. 2002. Talking about Products of Guard against Pilfering Electricity and Pilfering Electricity on the Contrary. Electrical Measurement & Instrumentation 39, 9 (2002), 10–15.
[21] Hu Hanmei and Wang Yuanlong. 2004. The analysis and discussion of electricity-stealing and precaution method. Electrical Measurement & Instrumentation …
… Neural Computation 9, 8 (1997), 1735–1780.
[23] Camille Kamga, M Anil Yazici, and Abhishek Singhal. 2013. Hailing in the rain: Temporal and weather-related variations in taxi ridership and taxi demand-supply equilibrium. In Transportation Research Board 92nd Annual Meeting.
[24] Diederik P Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. ICLR (2015).
[25] Blazakis Konstantinos and Stavrakakis Georgios. 2019. Efficient Power Theft Detection for Residential Consumers Using Mean Shift Data Mining Knowledge Discovery Process. International Journal of Artificial Intelligence and Applications (IJAIA) 10, 1 (2019).
[26] Jason Lines, Luke M Davis, Jon Hills, and Anthony Bagnall. 2012. A shapelet transform for time series classification. In SIGKDD. ACM, 289–297.
[27] Zongtao Liu, Yang Yang, Wei Huang, Zhongyi Tang, Ning Li, and Fei Wu. 2019. How Do Your Neighbors Disclose Your Information: Social-Aware Time Series Imputation. In The World Wide Web Conference (WWW).
[28] Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Černocký, and Sanjeev Khudanpur. [n.d.]. Recurrent neural network based language model. Conference of the International Speech Communication Association (INTERSPEECH) ([n.d.]).
[29] Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, Syed Khaleel Ahmed, and Malik Mohamad. 2009. Nontechnical loss detection for metered customers in power utility using support vector machines. Transactions on Power Delivery …
… Annals of Mathematical Statistics 33, 3 (1962), 1065–1076.
[33] Kira Radinsky, Krysta Svore, Susan Dumais, Jaime Teevan, Alex Bocharov, and Eric Horvitz. 2012. Modeling and predicting behavioral dynamics on the web. In The World Wide Web Conference (WWW). ACM, 599–608.
[34] Thanawin Rakthanmanon and Eamonn Keogh. 2013. Fast shapelets: A scalable algorithm for discovering time series shapelets. ICDM (2013), 668–676.
[35] Joaquim L Viegas, Paulo R Esteves, R Melício, VMF Mendes, and Susana M Vieira. 2017. Solutions for detection of non-technical losses in the electricity grid: a review. Renewable and Sustainable Energy Reviews 80 (2017), 1256–1268.
[36] Wu Xiaomei. 2003. A brief view of preventing the electricity power stealing. Electrical Measurement & Instrumentation …
… DMKD 22, 1 (2011), 149–182.
[38] Ji Zheng and Baoliang Lu. 2011. A support vector machine classifier with automatic confidence and its application to gender classification. Neurocomputing …