Electricity Theft Detection using Machine Learning
Declaration

I herewith certify that all material in this report which is not my own work has been properly acknowledged.

Niklas Dahringer

Abstract
Non-technical losses (NTL) in electric power grids arise through electricity theft, broken electric meters or billing errors. They can harm the power supplier as well as the whole economy of a country through losses of up to 40% of the total power distribution. For NTL detection, researchers use artificial intelligence to analyse data. This work is about extracting more meaningful features from a data set; with these features, the prediction quality will increase.

Acknowledgements
I would like to thank my home university, the Karlsruhe University of Applied Sciences, and my host university, the University of Luxembourg, for making this work possible. Also, it is my pleasure to thank CHOICE Technologies Holding Sàrl for providing the interesting subject and the data. Furthermore, I very much appreciate the support of my supervisor, Professor Dr. Norbert Link. Additionally, I want to thank Patrick Oliver Glauner for his professional support.
Chapter 1
Introduction
This work is created in cooperation with the Karlsruhe University of Applied Sciences (HsKa) and with the Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg. The focus of this work is to improve the quality of the detection of non-technical losses in electric power grids.
The loss of electricity is still a problem for electricity suppliers. Losses have different reasons: there are technical reasons, like the internal resistance of power grid components, as happens in transformers, generators and transmission lines. However, these are not the only drawbacks for the power suppliers, as there is also non-technical loss (NTL). NTLs can arise from electricity meter manipulation, bypassing meters, or bribing meter readers to record a lower power consumption. Other possible causes of NTL are broken or faulty meters, unmetered supply, and technical and human errors in meter readings, data processing, and billing.

Mainly, NTLs cause a drop in revenue for the power suppliers. Also, uncontrolled power deprivation leads to a decline in the stability of electric power grids. This instability can cause disadvantages for the whole economy of a country. In some countries like Brazil, India, Malaysia or Lebanon, NTLs account for up to 40% of the total power distribution. Even in developed countries, NTL is a problem.

Two approaches exist to detect NTL. The first is to determine the total power consumption for the whole power grid including its components; problems appear through changes or faults in the network, and it would also be necessary to record the status of all elements in the power grid. A much better approach, which this work employs, is to use machine learning to detect irregularities in consumption data and, based on the results, to decide whether to perform an inspection or not. [Gla+17]
One step among many others in this work is the use of machine learning. Machine learning is when a computer acts without being directly programmed for this task, according to Andrew Ng [Ng17]. In other words, machine learning is used to make decisions. It is used to find a solution without a concrete implementation, which is compensated for with a larger data set. The operator does not directly implement the final decision process of the program; it is learned from the data. More specifically, the idea is to extract characteristics from the data and to train the program parameters with these features.

With this trained program, it is possible to make predictions about similar data, albeit not comprehensively. Machine learning finds patterns in the training data and uses these models to make approximations for future data possible. An intelligent program should be able to react to changes very easily. It would be a huge disadvantage to reimplement the whole program because of small shifts in the data. With machine learning and its potential to retrain to obtain better precision, it becomes needless to provide a concrete solution process to train the program. Because of this behaviour, to learn even from changing data, machine learning belongs to artificial intelligence.

Besides the machine learning model parameters, the so-called hyperparameters are another set of parameters that describe complex properties of the model and are given before the actual training begins, for example, the maximum depth of the trees of a random forest or the limit on the maximum number of possible leaf nodes. The machine learning model parameters adjust during the training to fit the patterns found in the training data.
However, it is also possible to choose the best-fitting hyperparameters automatically with a random search function plus a cross-validation; for a detailed description see Section 6.1.

It is important to be aware that these parameters, learned with the help of patterns, will and should never entirely fit the training data, because the test data sets seen after the training have different characteristics; this also includes outliers. If the parameters precisely fit the training data, it could cause problems for the later use of the model: the decision making on the test set would fail because the parameters would only fit the training data. This issue is known as overfitting; to avoid it, it is necessary that the parameters do not fit the training set too closely. Another problem would be underfitting, meaning that too many characteristics from the training data are ignored, leading to an oversimplified model.

Machine learning can be used for diverse types of data like database entries, pictures, audio files, etc. [Alp10, pp. 41-44][GS17].
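The random search idea mentioned above can be sketched as follows; this is a hedged illustration with synthetic data and assumed parameter ranges, not the configuration actually used in this work (which is described in Section 6.1):

```python
# Sketch: automatic hyperparameter choice via random search plus
# cross-validation, here with scikit-learn's RandomizedSearchCV.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.preprocessing import scale

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X = scale(X)  # standardize the feature distributions

# Candidate values for the two hyperparameters named in the text:
# maximum tree depth and the limit on the number of leaf nodes.
param_dist = {"max_depth": [3, 5, 8, None],
              "max_leaf_nodes": [16, 32, 64, None]}

search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_dist, n_iter=8, cv=5, scoring="roc_auc", random_state=0)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Each of the `n_iter` random parameter combinations is evaluated with a 5-fold cross-validation, so the best score is an average over held-out folds rather than a fit to the training data.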
After the training, the machine learning model can perform decisions like classification. This is about recognizing patterns in the data and matching the data to a distinct class. The classes are the possible results, e.g., the different existing categories. Another use case, as opposed to the classification used in this work, is regression, which is used to describe a relation between the classes rather than matching the data to a distinct class; therefore, floating-point numbers are often used to express the regression. Both these functions, regression and classification, are part of supervised learning [Alp10, pp. 45-49].

1.2.2 Classifiers
The random forest algorithm belongs to the category of ensemble learning methods and consists of weaker learners. The random forest classifier is known for its fast execution. An interesting part is the training of these algorithms: every tree of the forest is trained separately on a random subset of the training data. So, every single tree can predict a part of the data well, and this helps to avoid overfitting. It is common to use randomization to obtain a good distribution of the subsets of the training data. Randomization is also used for the hyperparameter calculation, see Section 6.1. The combination of the classification results from every tree leads to a result that is less affected by biases. [Bro13; Ho95]
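The training scheme described above can be sketched with scikit-learn; this is a minimal illustration on synthetic data with assumed hyperparameter values, not the project's actual code:

```python
# Sketch: a random forest where each of the 100 trees is fitted on a
# bootstrap sample of the training data, and the forest combines their votes.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100,   # number of trees
                             max_depth=5,        # the hyperparameters
                             max_leaf_nodes=32,  # mentioned earlier
                             random_state=0)
clf.fit(X_train, y_train)
print(round(clf.score(X_test, y_test), 2))
```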
A decision tree is used for classification as well as for regression. Regarding classification, decision trees are known for their speed and accuracy. Well-known implementations of decision trees are ID3 and C4.5. During the training, a decision tree grows until it makes a decision. While doing so, it is important to adjust the training set to avoid overfitting [Bro13; Sal94].
A support vector machine (SVM) is a supervised machine learning model that is used for classification as well as for regression. The idea is to separate classes with the help of hyperplanes, which are determined by the support vectors [Bam17]. An SVM is able to map data into a higher dimension to handle complex separations [Vap00, p. 138]. However, the execution with large data sets slows it down; therefore, linear implementations of SVMs are often used [Ben+08]. Furthermore, an SVM is more robust against overfitting than other classifiers like neural networks [CT03].

1.2.2.4 Gradient-boosted tree
The gradient-boosted tree is a supervised machine learning model. Additionally, it is an ensemble algorithm that consists of weaker learners like decision trees. Trees are added one by one, and each new tree is fitted to reduce a loss function [CG16; Bro16].
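The two models above can be sketched with scikit-learn; this is a hedged illustration on synthetic data with assumed settings, not the project's code:

```python
# Sketch: a linear SVM (the fast variant recommended for large data sets)
# and a gradient-boosted ensemble of shallow trees, added one by one.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear SVM: separates the classes with a hyperplane.
svm = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)

# Gradient boosting: each added tree reduces the remaining loss.
gbt = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                 learning_rate=0.1,
                                 random_state=0).fit(X_train, y_train)
print(svm.score(X_test, y_test), gbt.score(X_test, y_test))
```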
The goal of this project is to get as many high-quality features as possible out of the consumption time series. Therefore, attempts are made to gain more meaningful features from the data, taking into account the noise in the provided real-world data. Additionally, this work focuses on the improvement of feature extraction and selection. To verify the result, different classifiers are trained with the gained features and eventually compared against each other. For all of this, the tsfresh library and additional Python modules are used.

Chapter 2
Related work
The SEDAN research group from the Interdisciplinary Centre for Security, Reliability and Trust (SnT) of the University of Luxembourg and CHOICE Technologies Holding Sàrl have worked together for a long time on the topic of NTL detection. They have published diverse papers about this domain. A highly informative paper is
"The Challenge of Non-Technical Loss Detection Using Artificial Intelligence: A Survey" [Gla+17], which gives an overview of different topics, namely different types of features for NTL detection: monthly consumption, smart meter consumption, master data, which could be described as metadata because it consists of the name and address of the customer, the feeder voltage, and perhaps climate data. Another feature type would be the creditworthiness, which ranks the payment morale or the income of a customer. Also, the survey covers different detection approaches like expert systems, fuzzy systems, neural networks, support vector machines and others. In addition, the paper assesses other papers about NTL detection and their quality; the size of the data sets and the scoring methods for the NTL detection quality are criticized.

A data set of circa 22K customers is used in [C+13] for training a neural network. It uses the average consumption of the previous 12 months and other customer features such as location, type of customer, voltage and whether there are meter reading notes during that period. On the test set, an accuracy of 0.8717, a precision of 0.6503 and a recall of 0.2947 are reported.

Consumption profiles of 5K Brazilian industrial customers are analyzed in [R+12]. Each customer profile contains 10 features including the demand billed, maximum demand, installed power, etc. In this setting, an SVM slightly outperforms k-nearest neighbors (KNN) and a neural network, for which test accuracies of 0.9628, 0.9620 and 0.9448, respectively, are reported.

Glauner et al. [Gla+16] compare different classifiers on a real-world data set of 100,000 customers. Furthermore, they change the imbalances to check the behaviour of the classifiers.
As a feature they used the daily average consumption from the data and therewith reached a prediction quality slightly above a random guess. This shows that it must be possible to improve the prediction quality by enhancing the quality of the used features.

Detecting NTL with provider-independent data is the domain of a paper by Meira et al. [Mei+17]. Therein they compute different features with due consideration of temporal, local and similarity criteria. Subsequently, they train three classifiers and evaluate the results. Of course, using metadata helps to improve the prediction performance, but this work will focus on improving features from the consumption time series.

Chapter 3
Technologies used

The programming language used for this project is Python. Python is an interpreted high-level language. It has been developed by Guido van Rossum [Kuh13, p. 12] and is maintained by the Python Software Foundation [Fou17a].

A special characteristic compared to other programming languages is the separation of code blocks. Separations are used for inner functions or commands which contain other operations, like the if operator or diverse kinds of loops. Very common in other languages is to use curly brackets at the begin and end of a block, or to use a specific keyword for this separation task. In Python, indentation is used as the separation, which is readable for humans and for the interpreter. Such an indentation is composed of four spaces. [Den09]

In the year 2008, version 3.0 was released, which brought explicit changes in the syntax of Python [Pek17]. Because it is not easy to migrate every code base to 3.X, version 2.7 got an extended maintenance [Pet08]. This leads to the fact that now both versions are in use. For Python, numerous packages exist which extend the functionality of the original programming language, for example the packages used in this work: NumPy, pandas and Dask.

3.1.1 NumPy
A well-known package in Python is NumPy, which is advertised for scientific work. NumPy is designed for mathematical tasks; it provides multidimensional arrays, sorting, basic statistical operations and diverse other functions for different purposes [com17]. Furthermore, it offers many functions regarding the multidimensional arrays. Often, during the evaluation of the results of this work, the random number generator was used. [dev17; Dal14]
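The facilities just mentioned can be sketched in a few lines; this is a generic illustration, not code from this work:

```python
import numpy as np

# Sketch: a multidimensional array, sorting, a basic statistic and the
# seeded random number generator used during the evaluation.
rng = np.random.RandomState(42)        # seeded for reproducible results
a = rng.randint(0, 100, size=(3, 4))   # a 3x4 multidimensional array
print(a.shape)
print(np.sort(a, axis=1))              # sort each row
print(float(a.mean()))                 # basic statistical operation
```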
During the whole preprocessing and in the library tsfresh itself, the Python package pandas is used. Pandas is conceived for the manipulation and analysis of time series and data tables. It is able to work with named and unnamed data which are acquired from SQL data sets, Excel spreadsheets or other data sources. The pandas package orients itself towards the NumPy package.

Pandas has two main structures, Series and DataFrame; the first one has one dimension, the second one has two dimensions. This package can handle special data formats: for example, sometimes unreadable values occur in a DataFrame, and such values are represented as NaN (Not a Number); pandas can also deal with the NumPy data types inf (infinity) and -inf. Very useful is the groupby function, which groups entries by identical values. On this basis, it is possible to perform a variety of different operations on the data; in pandas, this is also called split-apply-combine operations. Also important is the ability to slice DataFrames and Series, which works for horizontal and vertical slicing, oriented on the label, the column number, the index position or the index number. For merging and joining different DataFrames, a lot of parameters are provided to perform different types of joins, which also were used to merge DataFrames in this work. [Tea17]
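The groupby (split-apply-combine) and merge operations named above can be sketched as follows; the toy column names are invented for illustration:

```python
import pandas as pd

# Sketch: per-customer aggregation and an inner-join merge, the two
# pandas operations used most in this work.
consumptions = pd.DataFrame({
    "ID_UC": [1, 1, 2, 2],
    "date": pd.to_datetime(["2016-01-01", "2016-02-01",
                            "2016-01-01", "2016-02-01"]),
    "kwh": [120.0, 130.0, float("nan"), 95.0],  # NaN: unreadable value
})
inspections = pd.DataFrame({"ID_UC": [1], "result": ["NTL"]})

# split-apply-combine: mean consumption per customer (NaN is skipped)
means = consumptions.groupby("ID_UC")["kwh"].mean()
print(means.loc[1])   # 125.0

# inner join keeps only customers present in both tables
merged = consumptions.merge(inspections, on="ID_UC", how="inner")
print(len(merged))    # 2
```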
Dask is a library for parallel computing, available as a Python package. The library consists of two components: one is for the dynamic task scheduling, and the other part contains the different collections, like constructs from pandas and NumPy. This is the main advantage: the NumPy- and pandas-like structures and functions make it easier to parallelize the source code, because they are familiar from the common NumPy and pandas packages. [Das15a]

Furthermore, Dask provides a large set of parameters for the optimization of the parallelization and the adaptation to different systems. It is possible to load external data directly as a Dask DataFrame or to convert a pandas DataFrame to a Dask DataFrame; for this, some metadata are required which describe the original DataFrame. The conversion from a pandas to a Dask structure, or the converse process, is an overhead in this library; however, it is necessary for some DataFrame operations missing in Dask. [Das15b]

Also, Dask offers the chance to parallelize code which is not covered by the Dask functions. For this, the delayed function exists, which wraps the modified original function. [Das15c]

scikit-learn [Ped+11] is a Python library for machine learning, published as open source. The library is qualified for data mining and data analysis; therefore, it contains different functions for classification, regression, clustering, dimensionality reduction, model selection and preprocessing. To realise this functionality, it uses different Python libraries like NumPy, SciPy and matplotlib. [dev16a]

The project launch of scikit-learn was in 2007 during a Google Summer of Code. Subsequently, Matthieu Brucher joined David Cournapeau's project. [dev16b] From this Python package, multiple tools are used for this work, like the different classifiers and the model selection functions.
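The delayed function mentioned above can be sketched as follows; this is a generic illustration, not the wrapping actually done in this work:

```python
import dask

# Sketch: dask.delayed wraps ordinary Python functions so that the task
# scheduler can run them in parallel.
@dask.delayed
def check_series(values):
    # stand-in for any per-chunk check
    return all(v >= 0 for v in values)

tasks = [check_series(chunk) for chunk in ([1, 2], [3, -1], [5, 6])]
results = dask.compute(*tasks)   # executes the whole task graph
print(results)                   # (True, False, True)
```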
The motivation of this work is to improve the later obtained classification results for the NTL classification with the aid of a different approach in the feature processing. The consensus of the team members was to process the data set differently than usual, with the library tsfresh.

tsfresh is delivered as a Python package and is published under the MIT licence [Max16a]. It is developed by Maximilian Christ and Michael Feindt from Blue Yonder GmbH and Andreas W. Kempa-Liehr from the University of Auckland [CKF16, p. 1]. The main task of the library tsfresh is the calculation of a huge number of features extracted from time series. Moreover, the library is able to evaluate the calculated features and select the most important of them. It is possible to start working with this library with little knowledge about the data and the objectives to be achieved. [CKF16, p. 1]

Principally, there are two sub-packages which are the most important ones: the first package is called feature_extraction, and its purpose is to calculate the characteristics of the fed-in data. The other package performs a probability calculation for every feature; after this, the Benjamini-Hochberg procedure is used for the selection of the most important features. [BB17]

tsfresh has been developed for large data sets, which often occur in Industry 4.0, in IoT (Internet of Things) or, as in this work, in machine learning. In such cases, a useful ability is the scalability of the FRESH (FeatuRe Extraction based on Scalable Hypothesis tests) algorithm for ample data sets. [CKF16, pp. 2-3]

tsfresh brings an extensive set of algorithms especially for the feature extraction of time series; afterwards, the most meaningful features are selected. [CKF16, p. 3] These computed features represent a reduced dimensionality of the time series, which is used for the training of diverse classifiers. A different approach to time series classification would be different types of shape-based methods for the training. [CKF16, pp. 4-5]

For the feature calculation in tsfresh, the developers used around 55 algorithms from the collections of Fulcher and Jones [FJ14] and Nun et al. [Nun+15]. The selection of different algorithms for the feature calculation comprises algorithms for:

• Summary statistics, such as maximum, variance or kurtosis.
• Characteristics from the sample distribution, such as absolute energy, whether a distribution is symmetric or the number of data points above the median.
• Observed dynamics, such as fast Fourier transformation coefficients, autocorrelation lags or the mean value of the second derivative.

The complete list of standard functions is available in the tsfresh module description [Max16b].
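Three of the feature types listed above can be hand-rolled in a few lines; these sketches illustrate the idea only, the real implementations live in tsfresh's feature_calculators module:

```python
import numpy as np

def absolute_energy(x):
    """Sum of squared values of the series."""
    x = np.asarray(x, dtype=float)
    return float(np.dot(x, x))

def count_above_median(x):
    """Number of data points strictly above the median."""
    x = np.asarray(x, dtype=float)
    return int(np.sum(x > np.median(x)))

def autocorrelation(x, lag):
    """Sample autocorrelation at the given lag."""
    x = np.asarray(x, dtype=float)
    mu, var = x.mean(), x.var()
    return float(np.mean((x[:-lag] - mu) * (x[lag:] - mu)) / var)

series = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]
print(absolute_energy(series))       # 144.0
print(count_above_median(series))    # 2
print(autocorrelation(series, 4))    # 1.0: the series repeats every 4 steps
```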
The second main step of the feature processing in tsfresh is the feature filtering. It is important that features have a good ratio of robustness, which is helpful against outliers and other effects, and meaningfulness. To identify meaningful features, the library uses several hypothesis tests to calculate a p-value for every feature. The advantage of these hypothesis tests is the robustness of the whole procedure. In contrast, another method would be the principal component analysis (PCA); however, real-world data sets are noisy, which can lead to poor performance of PCA [Fu11]. Therefore, PCA was not used in this work.

The hypothesis tests are applied on every previously extracted feature; the resulting p-values show how relevant a specific feature is for the final prediction of the target label y. The usage of multiple hypothesis tests offers a better result. Different hypothesis tests are available in the library: Fisher's exact test, the Kolmogorov-Smirnov test for binary and non-binary features, and the Kendall rank test. [CKF16, pp. 5-8]

Subsequently, the following task is to choose which features to adopt and which to reject. This happens with the aid of the Benjamini-Hochberg procedure: based on the p-values and depending on the false discovery rate, this algorithm selects which features are relevant for the prediction.

Chapter 4
Data

For this work, the data sets are provided by CHOICE Technologies Holding Sàrl, which creates solutions for NTL prediction for their customers. The collaboration between the University of Luxembourg and this company has the goal to improve their prediction quality for their customers. One of these customers is a Brazilian energy company.
The data come from this power company and are used for this work. These data encompass the monthly electricity meter readings from the whole region, about 197 million entries, which contain the user ID, the date of recording, an entry for the recorded power consumption, an entry for the charged power consumption and many other records. This data set provides all the power consumption entries for this work. The consumption is measured in kWh. A second data set about customer inspections contains about 800 thousand entries with different values like the date of inspection, the user ID, the result of the inspection and many others.
Applying the tsfresh library on the raw data set is impossible, due to the different structure of the data set compared with the needs of the library. Also, preprocessing is required to provide the coveted data for the later feature extraction. Therefore, the tasks of this work are to preprocess the data set, to apply tsfresh on the selected data, to transform the results of tsfresh to a convenient data format, and finally, to evaluate the received results. For the last topic, different settings were used to obtain various results and to evaluate those.
For the preprocessing, the processing itself, and the evaluation, diverse Python modules exist. One of the most important modules for preprocessing is modi_data: as a result, the module returns all consumptions per customer in a row for a specific time range. Additionally, the latest consumptions of the time series are matched with the latest inspection of the corresponding customer. Only the last inspection, and only customers that have ever been inspected, will be used for this preprocessing. The entries in the inspection data set for the result of the realized inspection will later represent the target vector y, which indicates whether NTLs have happened or not. The reason for using the latest inspection for every customer is to use the most up-to-date data and the longest possible time range. At this stage, there is a time series for the latest inspection of every inspected customer.

The subsequent step is to ensure that all time series have the submitted length and are sustained. This particularly means that no skip in the time series is permitted; the consecutive order is unalterable. Incongruous time series will be removed; this means, of course, that the appertaining target vector y as well as the important customer IDs and the dates of the consumption series will be modified. The compiled dates are important for later sorting purposes. The module modi_data needs the assistance of the module processData, which is able to load and save data.

The next step is performed with tsfreshwrapper, which transforms the previous results to a tsfresh-specific format and starts the execution. With slice_extracted_features, it is possible to make different combinations of the calculated features for later evaluation purposes. At the end, different classifiers are trained with the help of main_cv and will point out the quality of the features. This was the simplified basic logic of the diverse modules. In the following, these will be explained in detail.
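The "latest inspection per customer" matching described above can be sketched with pandas; the column names here are invented stand-ins, not the real schema:

```python
import pandas as pd

# Sketch: pick each customer's most recent inspection; its result
# becomes the target vector y (1 = NTL found).
inspections = pd.DataFrame({
    "ID_UC": [1, 1, 2],
    "insp_date": pd.to_datetime(["2015-06-01", "2016-03-01", "2016-05-01"]),
    "result": [0, 1, 0],
})

# idxmax on the date gives the row index of the latest inspection per group
latest = inspections.loc[inspections.groupby("ID_UC")["insp_date"].idxmax()]
y = latest.set_index("ID_UC")["result"]
print(y.to_dict())
```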
The realisation of the previously made plan for the preprocessing of the data requires some additional explanation. In the following, important steps in the script are clarified.

The module modi_data is responsible for performing a major part of the data set preprocessing. The module contains only one function, which accepts as parameters the length of the prospective time series and a boolean, which indicates whether to save the results; this is useful for a subsequent processing of the results. Another parameter is consumptions_column, with which the different recording types, see Section 4.1, can be selected. The last parameter of the signature is drop_all_zero_consumptions; this boolean is responsible for removing time series whose entries are all zeros from the output.

At first, see Figure 1, the data, like the user ID, the date of the record, the chosen consumption type and the inspection result, are read in with the help of the module processData.

Figure 1: Read data

The next step is to drop the rows with unusable values and the rows with wrong entries in the inspection result. Afterwards, the read data is converted to the correct data type; see Figure 2.

Figure 2: Convert and drop data

Following this, the latest inspection is determined and merged as an inner join with the consumptions. As a result of this, many time series are of insufficient length or have gaps in their rows. To check for consecutive time series, the column with the dates is copied and shifted by one. This makes it easy to compare the current date with the next one, as depicted by Figure 3.

Figure 3: Merge, shift and check sequence

Next, a pandas groupBy object is created, which contains every time series as a group. One interesting step is to check these time series, as this happens parallelized with a Python list. The chunk size is calculated from the number of groups divided by the number of processor threads minus one.
It is so calculated because one list element must contain the remainder. By extracting the groups from the groupBy object, concatenating them to a new, smaller groupBy object and appending this as a new element to the list, the following code is executed:

    data_parts = []
    for jumz in range(0, len(gr_ob), chunk_size):
        data_parts.append(
            pd.concat([gr_ob.get_group(group)
                       for i, group in enumerate(gr_ob.groups)
                       if (i >= jumz) and (i < jumz + chunk_size)])
            .groupby('ID_UC'))
    print(len(data_parts))

Source Code 4.1: Filling the list for parallelized computing

Afterwards, an enumerable list is available for the parallelized execution. This way is chosen for the parallelization because it is hard to use other parallelization methods. For example, the Pool function requires a picklable data object. The Python pickle module helps to transform an object to a byte stream [Fou17b]. This byte stream is easy to share between different platforms and to save, and its properties are important for many parallelization methods. However, the method used in this work requires the data to be split into individual groups, because the pandas groupBy objects are not picklable.

The actual parallelization is performed with pipes: every pipe applies the inner_group function on the time series from the list elements. This internal function checks the correct length and the consecutive numbering of the dates of the time series. The retained time series, and the other values like the user ID, are inserted in a shared queue object, which provides the possibility of multiple access [Fou17c]. After the processing, the time series and the other entries are pulled out of the queue and put together to new NumPy arrays. This procedure will not lead to any problems because every group element is an atomic unit.
A different order of the single time series does not make any difference to the overall outcome. The last step is to return and optionally save the results.

A first revision of the code, with the objective to increase the execution performance, was realised with the help of the Dask collections. Thereby, many data types and functions were replaced in this code, and the code itself was modified to fit the new package. These modifications result in a partially five percentage points higher execution speed than before.
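The date-shift consecutiveness check described above can be sketched as follows; the column names are hypothetical, not the project's exact code:

```python
import pandas as pd

# Sketch: keep only customers whose monthly readings are consecutive,
# by comparing each date with a shifted copy of the date column.
df = pd.DataFrame({
    "ID_UC": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2016-01-01", "2016-02-01", "2016-03-01",
                            "2016-01-01", "2016-03-01", "2016-04-01"]),
})

def is_consecutive(group):
    # A month index turns "one month later" into a difference of exactly 1.
    months = group["date"].dt.year * 12 + group["date"].dt.month
    gaps = months.diff().dropna()   # diff against the shifted copy
    return bool((gaps == 1).all())

valid = df.groupby("ID_UC").filter(is_consecutive)
print(sorted(valid["ID_UC"].unique()))  # customer 2 has a gap and is dropped
```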
One very helpful file is processData, which is responsible for loading the different data files and saving the results. First, it is important to mention that the loading functions always expect the data files with the same name and location. However, a different user name in the file system represents no problem.

A function that is used in this work is get_raw_consumptions. This Python function reads, from the files with the meter readings, the two columns for the user ID and the date of the recording. Also, it is possible to pass additional column labels with the parameter extra_columns, for example to discriminate between the measured and the charged electrical power consumption. During the feed-in of the data, the columns with the consumption dates are converted into a pandas date format. The return value of this read-in function is a pandas DataFrame.

get_raw_inspections works in the same way; its task is to read the entries from the file with the inspections of the customers. Besides the user ID and the date of inspection, it also reads the inspection result. The return value is a pandas DataFrame, too.

This Python module also owns functions to save the computed results. The function save_assessed_dataframe has the job of saving resulting DataFrames as a simple CSV file, where every entry is separated by a semicolon. Besides the DataFrame itself, it accepts an optional string that is added to the file name to describe the file. In addition, a unique time stamp is concatenated to later discriminate the different files.

During the data preprocessing, NumPy arrays emerge that contain different results like the consumptions, the target vector, the IDs and the dates of the consumptions. These have to be saved, and therefore save_result_np exists. This function takes, of course, the array and a parameter called fmt, which represents the data format, such as integers or floats with a different length. The last parameter is responsible for the name of the file; a time stamp is added here as well. The saved file corresponds to the CSV file type.

4.2.1.3 Tsfreshwrapper
After the preprocessing with modi_data , tsfreshwrapper follows, which is a wrap-per function for the actual tsfresh library. This function transforms the standardformat of the data, one time series per line, to a tsfresh-acceptable format. Af-terwards, applying tsfresh on the data, tsfreshwrapper is able to save the resultsbefore and after the feature filtering, which is useful for later purposes. In this work, diverse types of features calculation algorithms are used. Some tsfreshitself provides, the in this work called standard features or generic time series fea-tures (GTS). The daily average (AVG) and the features that are especially createdfor the NTL detection, and the difference features (DIF), for further informationsee Section 5.1. To compare the gained features from each of these algorithms, it isnecessary to split them. Since all of these different feature types are computed atonce with tsfresh, and the result is a
DataFrame with all the features, is it impor-tant to have the ability to separate the different feature types for the evaluationsfollowing afterwards.For this work, the module slice_extracted_features is created. The signature ofthe primary function slice_features accepts, besides the file name, the lengthof the time series and the number of processes, a variable amount of enums. Forevery feature type an enum exists, such as
DIFFERENCE_FEATURES , DAILY_AVERAGE and
PURE_TSFRESH . With these enums it is possible to name every arbitrary com-bination of features and compute these. Furthermore, it is feasible to perform afeature selection after calculating the combinations. For every combination a new
DataFrame will be saved.
At the current state, the module can only handle these three feature types. An improvement would be to read the different feature calculation algorithms in from the feature_calculators file, which contains all the algorithms in tsfresh. In addition, it would be useful to use different Python decorators for the different feature types. That would have the advantage of supporting an arbitrary number of different feature types.

4.2.1.5 Main_cv
To evaluate the different types of features with diverse classifiers, it is important to train them first and to test them afterwards. However, training the machine learning model itself is not enough; it is also important to determine the hyperparameters of every model. Therefore, the function
RandomizedSearchCV from scikit-learn is used. For a more detailed description of the model training, see Section 6.1.
The function determine_classifier_parameter receives a file with the previously calculated features and a file with the target label. After converting the data into a NumPy matrix, the preprocessing.scale() command standardizes the shape of the distribution [dev16c].
With these data, the function run_randomsearch performs a training for every classifier. This function determines the best hyperparameters for the classifiers as well as the mean ROC-AUC (see Section 6.2) and its standard deviation. This outcome results after 1,000 iterations, which are composed of the number of iterations for
RandomizedSearchCV and the size of the k-fold cross-validation, which triggers the corresponding number of iterations per fold. The whole training and parameter ranking takes over four days; the training of the gradient boosted classifier takes the longest time. For a future productive application of this module, only the best performing classifier needs to be trained, which will reduce the whole training time.

Chapter 5
Detection of non-technical losses
The tsfresh library is designed for the extraction and filtering of features from different kinds of data, in particular time series. Therefore, its developers promote it as a useful tool for every case where domain knowledge is lacking.
The cooperation between the SEDAN research group and the partner company has already produced considerable knowledge in the field of NTL detection, and they have published several papers. A helpful paper is about information extraction from provider-independent data [Mei+17]. The feature calculation functions presented in that paper were developed specifically for the time series classification problem in the domain of NTL detection. This fits together excellently with this work, because the tsfresh library provides the possibility to implement user-specific feature calculation algorithms. This offers the opportunity to implement the extra features from Meira et al. in the designated file feature_calculators of the library.
For the following algorithms it is assumed that the consumption time series is consecutive, has a length of N months and is described as

C^{(m)} = [C^{(m)}_0, \ldots, C^{(m)}_{N-1}],    (5.1)

where C^{(m)}_{N-1} is the latest meter reading before the inspection.
The first of the three difference functions is called fixed interval. Together with K ∈ {3, 6, 12}, it computes the difference between the current consumption and the mean consumption in a period directly before a meter reading:

fixed_interval^{(m)}_d = C^{(m)}_d - \frac{1}{K} \sum_{k=d-K}^{d-1} C^{(m)}_k.    (5.2)

For each value of K, applying the function generates N - K features for the data set. Another algorithm for the calculation of features is the intra-year difference

intra_year^{(m)}_d = C^{(m)}_d - C^{(m)}_{d-K},    (5.3)

with K = 12, which is the difference of the current consumption to the consumption in the same month of the previous year.
Overall, the function returns N - 12 different features.
The intra-year seasonal difference

intra_year_seasonal^{(m)}_d = C^{(m)}_d - \frac{1}{3} \sum_{k=d-K-1}^{d-K+1} C^{(m)}_k,    (5.4)

for K = 12, is the change of the current consumption with respect to the mean of three months in the previous year. Altogether, the intra-year seasonal difference delivers N - 13 features.
The previously used daily average feature, which also serves as baseline feature for purposes of comparison, is, for month d and customer m, in kWh:

daily_avg^{(m)}_d = \frac{C^{(m)}_d}{R^{(m)}_d - R^{(m)}_{d-1}}.    (5.5)

C^{(m)}_d is the consumption between the meter readings R^{(m)}_d and R^{(m)}_{d-1}, where d is the current month and d - 1 the month before. The difference between R^{(m)}_d and R^{(m)}_{d-1} gives the number of days between the two meter readings. Concerning this calculation, the function returns only N - 1 features. This feature is common in the field of NTL detection [Nag+08; Nag+10; Nag+11].

5.2 Modification of tsfresh

The previously used daily average function as well as the three new functions were implemented in feature_calculators. tsfresh provides three different use cases for the implementation; these are represented by three different decorators. A Python decorator is related to annotations in Java, which also start with the @ symbol. In general, their task is to modify elements of the language, for example functions or whole classes [Eck08]. In the case of tsfresh, the decorators distinguish between the three deployment types.
To calculate a single feature without any parameter, the aggregate feature without parameters is suggested. The second function type, aggregate features with parameter, uses the parameters provided in the settings file. The last one, and the type used for this work, is called apply; it is able to calculate multiple features at the same time for different parameters. It is also possible to utilize parameters for the calculations.
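Before wiring them into tsfresh as apply-type calculators, the three difference functions can be prototyped as standalone NumPy functions. The following is an illustrative sketch of equations (5.2)-(5.4) only; the function names and signatures are mine, not the tsfresh API:

```python
# Standalone sketch of the difference features (eqs. 5.2-5.4), assuming a
# consecutive monthly consumption series C[0..N-1] as a NumPy array.
import numpy as np

def fixed_interval(C, K):
    # C[d] minus the mean of the K months directly before d (eq. 5.2).
    return np.array([C[d] - C[d - K:d].mean() for d in range(K, len(C))])

def intra_year(C, K=12):
    # Difference to the same month of the previous year (eq. 5.3).
    return np.array([C[d] - C[d - K] for d in range(K, len(C))])

def intra_year_seasonal(C, K=12):
    # Difference to the mean of the three months around month d-K (eq. 5.4).
    return np.array([C[d] - C[d - K - 1:d - K + 2].mean()
                     for d in range(K + 1, len(C))])

# A 24-month toy series, mimicking the N = 24 time series used in this work.
C = np.array([10.0, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17,
              11, 13, 12, 14, 13, 15, 14, 16, 15, 17, 16, 18])
print(len(fixed_interval(C, 3)))    # N - K = 21 features for K = 3
print(len(intra_year(C)))           # N - 12 = 12 features
print(len(intra_year_seasonal(C)))  # N - 13 = 11 features
```

The feature counts printed at the end match, for a single customer, the counts stated in Section 5.1.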
The parameters are provided in the settings file. Inside the constructor of the class
FeatureExtractionSettings there is a dictionary, name_to_param.update, which contains the parameters for the functions that actually require parameters [Max17]. The implementations of the custom feature calculators are provided in feature_calculators.
However, implementing the daily average function required a bit more effort. As already mentioned in the explanation of the daily average function, the consumption C^{(m)}_d is divided by the number of days between both meter readings; to compute these days, the specific dates for every consumption are required. Since tsfresh uses the optional parameter column_sort only to sort the data, it drops the column with the dates. This leads to the problem that tsfresh does not pass the column with the consumption dates on to the daily average function. As a result, no dates are available to calculate the divisor. One option would be to use the business month of 30 days as divisor, but in order to avoid inaccuracies caused by the different lengths of the months or by leap years, the more precise way is to use the original dates. To reach this objective, it is necessary to modify tsfresh itself.
The important function flow regarding the column with the dates is the following: after passing the parameters to the main extraction function extract_features in extraction.py, the normalization function normalize_input_to_internal_representation follows, which changes the input data into a uniform structure and removes all unnecessary columns except the columns for value and ID. The column with the dates is only used to sort the data and is then dropped, see Source Code 5.1.

kind_to_df_map[kind] = kind_to_df_map[kind].sort_values(column_sort).drop(column_sort, axis=1)

Source Code 5.1: Former implementation

Subsequently, the internal function _extract_features_for_one_time_series, which is responsible for extracting the features from the data frame, calls the function get_apply_functions. This latter function creates a list with all feature calculation algorithms, the column prefix and the parameters belonging to the decorator apply. This list is returned to the calling function, and every extraction function is applied to the data. After some further steps, tsfresh presents the results of the feature extraction.
To ensure that the dates are passed through to the daily average function, some modifications to tsfresh are needed. In order to prevent the normalization function from dropping the dates, the suffix .drop(column_sort, axis=1) has to be removed. Then get_apply_functions must be modified, which is called by the internal feature extraction function _extract_features_for_one_time_series. This function gathers the different feature calculation algorithms from the feature_calculators file, the column prefix and the parameters from settings, and additionally receives the name of the feature calculation algorithm. It returns the lists to its caller, the internal feature calculator _extract_features_for_one_time_series. There, a distinction is made between the normal functions and the daily average function: the daily average function additionally receives the previously retained column_sort data and is applied to them. Thus, the daily average function can use the dates to calculate the days for the divisor.
All these modifications made to tsfresh are available as a fork of the original tsfresh project under: https://github.com/hargorde/tsfresh .
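The point of the modification can be illustrated with a standalone sketch of the daily average from equation (5.5): once the reading dates are passed through, the divisor is the actual number of days between two meter readings instead of a fixed 30-day business month. The function below is illustrative only and is not the modified tsfresh code:

```python
# Sketch of the daily average feature (eq. 5.5): consumption between two meter
# readings divided by the number of days between the reading dates.
import numpy as np

def daily_avg(consumption, reading_dates):
    # consumption[d] is the consumption between reading_dates[d-1] and
    # reading_dates[d]; consumption[0] has no preceding reading and is unused.
    dates = np.asarray(reading_dates, dtype="datetime64[D]")
    days = np.diff(dates).astype(float)                      # days between readings
    return np.asarray(consumption[1:], dtype=float) / days   # N - 1 features

dates = ["2016-01-05", "2016-02-03", "2016-03-04", "2016-04-05"]
cons = [0.0, 145.0, 150.0, 160.0]  # kWh
print(daily_avg(cons, dates))      # divisors are 29, 30 and 32 days
```

With a fixed 30-day divisor, the first and last values in this toy example would be distorted by several percent, which is exactly the inaccuracy the modification avoids.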
For later modifications, a more general approach would be to use a new decorator for all functions that require the dates of the consumptions.

Chapter 6
Evaluation

For the evaluation of the different features, the responsible module main_cv, see Section 4.2.1.5, uses
RandomizedSearchCV with a cross-validation (CV) of the k-fold type. Both are described in detail in the following.
Basically, to train a machine learning model, the provided data set is split into a training data set and a test data set. The training set is used for training and the test set verifies the model afterwards. In order to avoid overfitting to a specific set, it is important to use different sets.
For the determination of the hyperparameters, an extra set is required: the validation set. Within the library scikit-learn, the hyperparameters must be set before the training with the help of the constructor. This means the training is still performed with the training set; the validation set then checks whether the hyperparameters fit; and once they do, the verification with the test data set follows.
However, splitting the data into three different sets reduces the amount of available data. This can adversely influence the machine learning model; to avoid this effect, cross-validation (CV) is predestined for such issues.
A common type of cross-validation is the k-fold, which uses a shared data set for training and validation; only for the test an extra set is required. In scikit-learn, the k-fold method divides the training set into k smaller sets, and the size of k is set by the parameter n_splits. From these subsets, k - 1 are used for the training of the machine learning model. The remaining fold is used for the validation. This happens k times: for each combination of the subsets, a training and a validation is performed. Although the k-fold cross-validation needs some computational power, it is useful for small sets. Besides the k-fold cross-validators, others exist, such as Leave One Out and Leave P Out. A cross-validation function provided by the library is cross_val_score. This function takes the model, the data set, the target set, the number of folds, and optionally a scoring function to evaluate the training results. In this work, the scoring method used is AUC. [dev16d]

The randomized parameter optimization optimizes the hyperparameters.
Thereby, a dictionary with a set of different hyperparameters is defined for this procedure and passed to the function
RandomizedSearchCV. The function creates a set with randomly picked hyperparameters. The number of different combinations is determined by the parameter n_iter. If n_iter is increased, the quality of the hyperparameter search improves, but the calculation time rises as well. Optionally, it is possible to pass in a cross-validation method such as k-fold. At the end, the best-fitting hyperparameters are retained. [dev16e]
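A minimal sketch of this procedure with scikit-learn, using synthetic data and a made-up parameter grid (the thesis runs the search on the real NTL feature sets with far more iterations):

```python
# RandomizedSearchCV with k-fold CV and ROC-AUC scoring, mirroring the setup
# described above. Data and parameter ranges are placeholders for this sketch.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] + 0.1 * rng.randn(200) > 0.5).astype(int)  # noisy synthetic labels

param_dist = {                  # hypothetical search space
    "n_estimators": [10, 50, 100],
    "max_depth": [2, 4, None],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=5,                   # randomly picked parameter combinations
    cv=3,                       # 3-fold cross-validation
    scoring="roc_auc",          # the evaluation function used in this work
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

Increasing n_iter widens the search at the cost of runtime, which is exactly the trade-off described above; after the search, search.best_estimator_ holds a model refit with the best parameters.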
The data set used for this work is, as already mentioned, a real world data set; this means the data contain noise and outliers. A special challenge is also the handling of the irregularities themselves, because they are imbalanced [Gla+17, p. 8]. This becomes clear after a first processing of the data for the measured consumption: there are 150,700 entries that refer to non-NTL and 50,229 entries that are NTL. The outcome confirms that the classes NTL and non-NTL are imbalanced, which is clearly shown by the ratio of NTL to non-NTL entries of 33.3%.
Normally, to rate the results of a prediction, a performance measure such as accuracy or precision is used. At first glance, measures like accuracy and precision might seem informative, but they have problems with imbalanced data sets.
The accuracy is the true positives (TP) plus the true negatives (TN) over the size of the whole set:
ACC = \frac{TP + TN}{TP + TN + FP + FN}.

The precision is

P = \frac{TP}{TP + FP},

and another function is the recall, which is

R = \frac{TP}{TP + FN}

[Gla+17; Gla+16; dev16f].
The following example from a survey about NTL detection [Gla+17, p. 8] shows very clearly the disadvantage of these scores. Consider an example data set with 100 customers of which 99 are non-NTL. If a classifier predicts non-NTL for all the entries, the resulting accuracy would be 99%. In contrast, if the classifier always predicted NTL, the recall would be 100%. In the first case, the classifier will never detect NTL despite a high accuracy. The second case will find the NTL, but forces too many inspections to be carried out by inspectors. This behaviour leads to a rise in costs. These results show that it is important to choose another evaluation function for the trained classifiers, one that considers the imbalance of both classes [Gla+17].
A better evaluation of the output of the different classifiers is to plot the recall (the true positive rate) against the false positive rate. This results in a receiver operating characteristic (ROC) curve. The area under the curve (AUC) of the ROC curve is scored from 0 to 1, where 1 means every prediction is correct and 0 means that no prediction is correct. Every score over 0.5 is better than random guessing. In this work, the function used for the AUC is:
AUC = \frac{Recall + Specificity}{2}

[Gla+16].

6.3 Results

At the end, the program was applied to the data set for the measured and billed consumption: first transforming and collating the data, then matching the entries and checking the consumption time series. Subsequently, the tsfresh library, with its extra features, was applied to the primed data and extracted many different features. After the calculation of the features, the different feature types were separated with the slice_features function and diverse combinations of them were created. A first machine learning model training with scoring started with these unfiltered features. The features were then filtered with the help of tsfresh regarding their importance for the target label y. After this, a model training with an evaluation of the prediction results followed.
Table 1 shows the number of gained features per feature type: the daily average (AVG), the generic time series (GTS), and the fixed interval, intra-year difference and intra-year seasonal difference, which together are called difference features (DIF).

Table 1: Number of features before and after selection.
The first column stands for the number of features before the feature selection. The other column, with its sub-labels for the measured and the actually billed consumption, contains the number of different features after the feature filtering.

After the feature extraction, tsfresh returns 304 features for both consumption types, because the library applies all feature calculation algorithms with all parameters to the data. The subsequent filtering retains 237 features for the measured consumption and 283 features for the billed consumption. It turns out that the features that were made specifically for NTL detection are retained best. Only two fixed interval features, with a window of K = 3, were dropped from the measured consumption; this probably relates to the short time range.
The common daily average features for NTL detection are completely retained for the billed data, but of the features for the measured data, only 18 out of 23 are kept. An interesting fact is that the five dropped features represent the six oldest months of a time series with a total length of 24 months. It seems that only the latest 18 months are important for the feature selection.
A comparison of both consumption types shows that many more features are retained for the billed data; a reason could be the larger number of entries. During the preprocessing of the measured data, around 135 thousand consumption time series were dropped because they were complete zero sequences. The different size and the different data also have an impact on the generic time series algorithms, where 73% and 91% of the features are retained for the measured and billed data, respectively. Another reason is that the generic time series features are not made for NTL detection and thereby produce features of lower quality.
The next table, Table 2, was created for the measured consumption. It compares the different classifiers and the feature type combinations. To evaluate the work, different classifiers are trained with diverse features, namely the AVG, GTS and DIF features and combinations of them. The best prediction results are achieved by the random forest for all the different feature combinations. The best feature set consists of a selection of all three feature types and gains an AUC of 0.65977. A comparison of the AUC between the unfiltered and the filtered features yields 27 significant differences. In 15 of 27 cases the filtered features perform superior to the unfiltered features.
For the billed consumption, see Table 3, the random forest likewise performs most precisely of all machine learning models. The best training result is reached

Table 2: Test Performance of Classifiers on Features from Measured Consumption Data.
       GTS               AVG               DIF               GTS+AVG
Clf.   X_all    X_ret    X_all    X_ret    X_all    X_ret    X_all    X_ret
DT     0.64544  0.64625  0.64037  0.63985  0.63730  0.63792  0.64712  0.64705
RF     —^c      —^c      —^c      —^c      —^c      —^c      —^c      —^c
GBT    0.63149  0.63125  0.63234  0.63186  0.62869  0.63019  0.63262  0.63322
LSVM   0.63696  0.63656  0.54982  0.54933  0.55749  0.55843  0.63725  0.63689

       GTS+DIF           AVG+DIF           GTS+AVG+DIF
Clf.   X_all    X_ret    X_all    X_ret    X_all    X_ret
DT     0.64638  0.64647  0.64348  0.64312  0.64646  —^f
RF     —^c      —^c      —^c      —^c      —^c      0.65977^{cf}
GBT    0.63319  —^f      —^f      —        —        —
LSVM   —        —        —        —        —        —

Test AUC for combinations of decision tree (DT), random forest (RF), gradient boosted tree (GBT) and linear support vector machine (LSVM) classifiers trained on sets composed of general time series (GTS), daily average (AVG) and difference (DIF) features. The best overall combination of classifier and feature set is highlighted. Per combination of classifier and feature set, the better result on either the full feature set (X_all) or the retained feature set (X_ret) is highlighted. ^c denotes the best classifier per feature set. ^f denotes the best feature set per classifier.

after a filtering of the daily average, generic time series and difference features, with an AUC of 0.67356.
In most cases, the filtered features lead to better prediction results. A relevant difference between the full features and the retained features appears in 26 cases; in 19 of these 26 cases, the filtered features delivered a better AUC. Overall, it is worth mentioning that the AUC rises with more feature sets; this applies to both test series. The prediction quality for single feature sets is the

Table 3: Test Performance of Classifiers on Features from Billed Consumption Data.

       GTS               AVG               DIF               GTS+AVG
Clf.   X_all    X_ret    X_all    X_ret    X_all    X_ret    X_all    X_ret
DT     0.65901  0.65936  0.65626  0.65654  0.64220  0.64169  0.66040  0.66088
RF     —^c      —^c      —^c      —^c      —^c      —^c      —^c      —^c
GBT    0.65487  0.65479  0.65526  0.65594  0.64044  0.64160  0.66016  0.66021
LSVM   0.64481  0.64484  0.60558  0.60512  0.60512  0.60512  0.64530  0.64533
       GTS+DIF           AVG+DIF           GTS+AVG+DIF
Clf.   X_all    X_ret    X_all    X_ret    X_all    X_ret
DT     0.66213  0.66230  0.65982  0.65953  0.66436  —^f
RF     —^c      —^c      —^c      —^c      —^c      0.67356^{cf}
GBT    0.66110  0.66110  0.66384  0.66359  0.66503  —^f
LSVM   0.64511  0.64520  0.61979  0.62032  —^f      —

The best overall combination of classifier and feature set is highlighted. Per combination of classifier and feature set, the better result on either the full feature set (X_all) or the retained feature set (X_ret) is highlighted. If a classifier performs the same on both feature sets, none is highlighted. ^c denotes the best classifier per feature set. ^f denotes the best feature set per classifier.

lowest; combinations with two sets lead to a slight increase, and the best prediction quality is reached with a combination of all three.

6.4 Discussion

This work shows that the newly extracted features from the data lead to a better prediction quality than the former daily average features. The AUC with the combination of all the gained features is just slightly better than the AUC of the daily average features. A quite important cause for this behaviour is that the provided data set is a real world data set, which contains noise, outliers, or even errors introduced during data entry. However, even a small improvement in NTL detection would lead to a reduction of the costs of carrying out inspections of the customers. Furthermore, the energy company would be able to increase the stability of the electrical power grid by decreasing the NTL cases. Also, the partner company of this project would have the benefit of raising its equity through the improvement of its products.
Surprisingly, the gradient boosted tree always delivered poor classification results. In contrast, in other papers this classifier showed good prediction quality [Roe+05; CG16]. This could be explained by the "no free lunch" (NFL) theorem, which points out that the best machine learning model does not exist [Wol96].

Chapter 7
Conclusion
Regarding the previously defined objectives of this work, the goal has been achieved. With the preprocessing, the tsfresh library and the machine learning model training, it was possible to obtain more from the consumption time series. This project helped to achieve more meaningful features, also with the help of the extra added features (DIF). The model training points out the best combination of the feature types and the best classifier.
A comparison between the formerly used daily average features and the now obtained features shows a slight improvement in the quality of the features. However, the reachable quality is limited because real world data were used in this work. Besides the improvement of the NTL prediction, this work delivered a framework for feature extraction that is able to incorporate further algorithms, especially for NTL detection.

List of Tables

List of Figures

List of Source Codes

Bibliography

[Gla+17] Patrick Glauner et al. "The Challenge of Non-Technical Loss Detection Using Artificial Intelligence: A Survey". In:
International Journal of Computational Intelligence Systems.
Machine Learning. Coursera Inc. 2017. url: (visited on 06/14/2017).
[Alp10] Ethem Alpaydin. Introduction to Machine Learning. Second Edition. 2010. (Visited on 06/18/2017).
[GS17] Daniel Geng and Shannon Shih.
Machine Learning Crash Course: Part 4 - The Bias-Variance Dilemma, Underfitting. Machine Learning at Berkeley. July 13, 2017. url: https://ml.berkeley.edu/blog/2017/07/13/tutorial-4/ (visited on 08/06/2017).
[Bro13] Jason Brownlee. A Tour of Machine Learning Algorithms. Machine Learning Mastery. Nov. 25, 2013. url: http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/ (visited on 07/12/2017).
[Ho95] Tin Kam Ho. "Random decision forests". In: Document Analysis and Recognition, 1995. Proceedings of the Third International Conference on. Vol. 1. IEEE, 1995, pp. 278-282.
[Sal94] Steven L. Salzberg. "C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993". In:
Machine Learning. issn: 1573-0565. url: https://doi.org/10.1007/BF00993309.
[Bam17] Noel Bambrick, AYLIEN. Support Vector Machines: A Simple Explanation. KDnuggets. 2017. url: (visited on 08/06/2017).
[Vap00] Vladimir Vapnik. The nature of statistical learning theory. Springer, 2000.
[Ben+08] Asa Ben-Hur et al. "Support Vector Machines and Kernels for Computational Biology". In:
PLOS Computational Biology. url: https://doi.org/10.1371/journal.pcbi.1000173.
[CT03] Li-Juan Cao and Francis Eng Hock Tay. "Support vector machine with adaptive parameters in financial time series forecasting". In: IEEE Transactions on Neural Networks.
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 785-794.
[Bro16] Jason Brownlee.
A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning. Machine Learning Mastery. Nov. 9, 2016. url: http://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/ (visited on 08/08/2017).
[C+13] Breno C. Costa, Bruno L. A. Alberto, André M. Portela, et al. "Fraud detection in electric power distribution networks using an ANN-based knowledge-discovery process". In: International Journal of Artificial Intelligence & Applications.
Industry Applications (INDUSCON), 2012 10th IEEE/IAS International Conference on. IEEE, 2012, pp. 1-6.
[Gla+16] Patrick Glauner et al. "Large-scale detection of non-technical losses in imbalanced data sets". In:
Innovative Smart Grid Technologies Conference (ISGT), 2016 IEEE Power & Energy Society. IEEE, 2016, pp. 1-5.
[Mei+17] Jorge Augusto Meira et al. "Distilling provider-independent data for general detection of non-technical losses". In:
Power and Energy Conference at Illinois (PECI), 2017 IEEE. IEEE, 2017, pp. 1-5.
[Kuh13] Dave Kuhlman.
A Python Book: Beginning Python, Advanced Python, and Python Exercises. Dave Kuhlman. Dec. 15, 2013. url: (visited on 07/12/2017).
[Fou17a] Python Software Foundation. Python Software Foundation. 2017. url: (visited on 07/12/2017).
[Den09] Jim Dennis. Why separate sections by indentation instead of by brackets or 'end'. The Python Wiki. Apr. 5, 2009. url: https://wiki.python.org/moin/Why%20separate%20sections%20by%20indentation%20instead%20of%20by%20brackets%20or%20%27end%27 (visited on 07/12/2017).
[Pek17] Berker Peksag. Should I use Python 2 or Python 3 for my development activity?
The Python Wiki. Jan. 15, 2017. url: https://wiki.python.org/moin/Python2orPython3 (visited on 07/12/2017).
[Pet08] Benjamin Peterson. PEP 373 - Python 2.7 Release Schedule. Python Software Foundation. Nov. 3, 2008. url: http://legacy.python.org/dev/peps/pep-0373/ (visited on 07/12/2017).
[com17] The Scipy community. What is NumPy?
The Scipy community. Mar. 15, 2017. url: http://numpy.readthedocs.io/en/latest/user/whatisnumpy.html (visited on 07/12/2017).
[dev17] NumPy developers. NumPy. NumPy Developers. 2017. url: (visited on 07/12/2017).
[Dal14] DaleAthanasias. NumPy. The Python Wiki. May 14, 2014. url: https://wiki.python.org/moin/NumPy (visited on 07/12/2017).
[Tea17] Wes McKinney & PyData Development Team. pandas: powerful Python data analysis toolkit. July 7, 2017. url: http://pandas.pydata.org/pandas-docs/stable/ (visited on 07/12/2017).
[Das15a] Dask Development Team. Dask. Continuum Analytics. 2015. url: https://dask.pydata.org/en/latest/ (visited on 07/12/2017).
[Das15b] Dask Development Team. Create and Store Dask DataFrames. Continuum Analytics. 2015. url: https://dask.pydata.org/en/latest/dataframe-create.html (visited on 07/12/2017).
[Das15c] Dask Development Team. Overview: Motivation and Example. Continuum Analytics. 2015. url: https://dask.pydata.org/en/latest/delayed-overview.html (visited on 07/12/2017).
[Ped+11] F. Pedregosa et al. "Scikit-learn: Machine Learning in Python". In: Journal of Machine Learning Research
12 (2011), pp. 2825-2830.
[dev16a] scikit-learn developers. scikit-learn: Machine Learning in Python. 2016. url: http://scikit-learn.org/stable/ (visited on 07/28/2017).
[dev16b] scikit-learn developers. About us: History. 2016. url: http://scikit-learn.org/stable/about.html (visited on 07/28/2017).
[Max16a] Maximilian Christ, Blue Yonder GmbH. License. 2016. url: http://tsfresh.readthedocs.io/en/latest/license.html (visited on 06/30/2017).
[CKF16] Maximilian Christ, Andreas W. Kempa-Liehr, and Michael Feindt. "Distributed and parallel time series feature extraction for industrial big data applications". In: arXiv preprint arXiv:1610.07717 (2016).
[BB17] Thibault de Boissiere and Nils Braun. Feature filtering. GitHub, Inc. Mar. 31, 2017. url: https://github.com/blue-yonder/tsfresh/blob/master/docs/text/feature_filtering.rst (visited on 06/30/2017).
[FJ14] Ben D. Fulcher and Nick S. Jones. "Highly Comparative Feature-Based Time-Series Classification". In: IEEE Transactions on, 26.12 (2014), pp. 3026-3037. url: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6786425 (visited on 07/27/2017).
[Nun+15] Isadora Nun et al. "FATS: Feature Analysis for Time Series". In: arXiv preprint arXiv:1506.00010 (2015).
[Max16b] Maximilian Christ, Blue Yonder GmbH. tsfresh.feature_extraction package. tsfresh. Oct. 26, 2016. url: http://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_extraction.html (visited on 07/04/2017).
[Fu11] Tak-chung Fu. "A review on time series data mining". In: Engineering Applications of Artificial Intelligence.
Python Software Foundation. July 27, 2017. url: https://docs.python.org/3/library/pickle.html (visited on 07/27/2017).
[Fou17c] Python Software Foundation. July 26, 2017. url: https://docs.python.org/2/library/multiprocessing.html (visited on 07/26/2017).
[dev16c] scikit-learn developers. 2016. url: http://scikit-learn.org/stable/modules/preprocessing.html (visited on 08/02/2017).
[Nag+08] Jea Nagi et al. "Detection of abnormalities and electricity theft using genetic support vector machines". In: TENCON 2008 - 2008 IEEE Region 10 Conference. IEEE, 2008, pp. 1-6.
[Nag+10] Jawad Nagi et al. "Nontechnical loss detection for metered customers in power utility using support vector machines". In:
IEEE Transactions on Power Delivery.
IEEE Transactions on Power Delivery.
Computing Thoughts: Decorators I: Introduction to Python Decorators. Artima, Inc. Oct. 18, 2008. url: (visited on 07/17/2017).
[Max17] Maximilian Christ, Blue Yonder GmbH. How to add a custom feature. tsfresh. May 28, 2017. url: http://tsfresh.readthedocs.io/en/latest/text/how_to_add_custom_feature.html (visited on 07/17/2017).
[dev16d] scikit-learn developers. 2016. url: http://scikit-learn.org/stable/modules/cross_validation.html (visited on 07/30/2017).
[dev16e] scikit-learn developers. 2016. url: http://scikit-learn.org/stable/modules/grid_search.html (visited on 07/30/2017).
[dev16f] scikit-learn developers. Precision-Recall. 2016. url: http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html (visited on 07/30/2017).
[Roe+05] Byron P. Roe et al. "Boosted decision trees as an alternative to artificial neural networks for particle identification". In: Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment