Driving with Data in the Motor City: Mining and Modeling Vehicle Fleet Maintenance Data
Josh Gardner∗, Jawad Mroueh†, Natalia Jenuwine†, Noah Weaverdyck†, Samuel Krassenstein‡, Arya Farahi†, Danai Koutra†
∗University of Washington; [email protected]
†University of Michigan; [jmroueh, najenu, nweaverd, aryaf, dkoutra]@umich.edu
‡City of Detroit - Operations and Infrastructure Group
Abstract—The City of Detroit maintains an active fleet of over 2500 vehicles, spending an annual average of over $5 million on purchases and over $7.7 million on maintenance. Modeling patterns and trends in this data is of particular importance to a variety of stakeholders, particularly as Detroit emerges from Chapter 9 bankruptcy, but the structure in such data is complex, and the city lacks dedicated resources for in-depth analysis. The City of Detroit's Operations and Infrastructure Group and the University of Michigan initiated a collaboration which seeks to address this unmet need by analyzing data from the City of Detroit's vehicle fleet. This work presents a case study and provides the first data-driven benchmark, demonstrating a suite of methods to aid in data understanding and prediction for large vehicle maintenance datasets. We present analyses to address three key questions raised by the stakeholders, related to discovering multivariate maintenance patterns over time; predicting maintenance; and predicting vehicle- and fleet-level costs. We present a novel algorithm, PRISM, for automating multivariate sequential data analyses using tensor decomposition. This work is a first of its kind that presents both methodologies and insights to guide future civic data research.

I. INTRODUCTION
On July 18, 2013, the City of Detroit (hereafter, simply Detroit) filed for Chapter 9 bankruptcy and initiated a recovery plan. The recovery plan includes major investments to update the police, fire, and emergency medical services departments and their fleets. Under this plan, the city is investing approximately $447M over the next 10 years for the replacement and modernization of vehicle fleets and facilities. Detroit manages and maintains a fleet consisting of over 2500 active vehicles, with four shops, six fuel sites, and 70 technicians to maintain the fleet. These vehicles are particularly critical to service delivery in the city, which has its population of over 672,000 spread over 139 square miles—an area larger than the City of Philadelphia with less than half of the population density.

Detroit spent an annual average of $7.7M on maintenance and over $5M on new vehicle purchases between 2010 and 2017. (These figures are based on the data used in this work. For reproducibility details, including hyperparameter settings, software, and links to complete PARAFAC results, see the arXiv version of this paper.) Historical maintenance and purchase data can be utilized to efficiently allocate resources during the recovery effort. However, Detroit, like most municipalities, struggles with insufficient financial resources and capacity to analyze historical data and provide data-driven insights for decision-makers. To fill this gap, the University of Michigan partnered with Detroit's Operations and Infrastructure Group. This collaboration has the dual goal of providing methods for data understanding and prediction, driven by three key research questions: (RQ1)

Fig. 1: Vehicle fleet maintenance in Detroit.
How can we uncover, validate, and interpret complex, multivariate patterns from fleet maintenance records? (RQ2) Can we predict required vehicle maintenance? (RQ3) Can we predict vehicle- and fleet-level maintenance costs?

Answering these questions provides methods and interpretable algorithmic insights which will allow the city to better navigate the complex logistical and financial decisions all municipal governments face, including: optimizing the allocation of existing resources; improving service delivery; reducing costs, fraud, and erroneous data; and making informed decisions about maintenance scheduling and future investments. For instance, when a vehicle is being repaired, it is unavailable for use and is a stranded asset that reduces the city's capacity to deliver services. To ensure that the necessary types of vehicles are available when needed, the city must always maintain a surplus of vehicles, which results in added cost. The analyses in this work can address these issues: a multivariate analysis identifies common system repair patterns over time, which assists technicians and analysts in understanding the fleet, informs technician hiring and allocation, and guides future vehicle deployment and procurement decisions; a predictive maintenance model proactively identifies necessary maintenance and can be used to optimize vehicle downtime, fleet availability, and job allocation across technicians and garages; and finally, a cost forecasting model informs budgeting, resource allocation, and investment decisions.

We address our research questions by developing and applying algorithms for multidimensional pattern extraction. Our main contributions are summarized as follows:

• Novel Study: Vehicle maintenance data has not been evaluated in prior published data mining research. Our study sets a precedent for future research in this domain and provides the first data-driven approach.

• Descriptive Analysis: We use tensor decomposition and differential sequence mining, including the novel PRISM algorithm, which presents a unified Bayesian approach to these tasks, to discover complex vehicle-system-time repair patterns and their characteristic subsequences (§ III). PRISM is the first algorithm to explicitly leverage the sequential nature of data modeled using the parallel factors decomposition (PARAFAC).

• Predictive Analysis: We leverage sequence neural networks to predict police vehicle maintenance and perform time series modeling to forecast vehicle- and fleet-level cost (§ IV).

• Guidelines & Reproducibility: We describe the challenges of data and analysis in real-world public-sector contexts and conclude with the lessons learned from our partnership (§ VI). While a non-disclosure agreement with the City of Detroit prevents us from making the data publicly available, we release our code publicly so other municipalities and researchers can reproduce this work with their own data: https://github.com/jpgard/driving-with-data-detroit/.

II. DATASET
We analyze a comprehensive dataset of the entire Detroit-owned vehicle fleet and their maintenance jobs, provided by the Operations and Infrastructure Group in the City of Detroit. The records contain a mix of data transferred from prior paper records (with the oldest vehicle records dating to 1944) and those entered by new electronic record-keeping systems. Data entry is performed by several stakeholders, including maintenance technicians, managers, and analysts. The data consists of two tabular data sources.

The vehicles table (Table I) consists of records, one per vehicle, representing every known vehicle currently or previously owned by Detroit. The table has information about each vehicle's manufacture, purchase, and use. It tracks data for police cars, garbage trucks, freight trucks, ambulances, boats, motorcycles, mowers, and other vehicles. The maintenance table (Table II) consists of job-level records for every individual maintenance job performed on any vehicles owned by Detroit. It includes everything from routine inspections, tire changes, and preventive maintenance to major collision repairs, glass work, and engine replacements.

Together, these tables form a detailed, job-level dataset of maintenance on Detroit's entire vehicle fleet across 87 different departments, such as police, airport, fire, and solid waste. The records in each table are entirely complete (no fields are missing in any record). The data is, however, prone to noise, as it is often manually recorded by vehicle technicians at maintenance time (e.g., odometer readings fluctuated and sometimes even decreased between repairs), as are "lifetime to date" statistics such as fuel consumption; hence there are potential concerns about the accuracy of some data due to human data entry, job categorization errors, or data omitted from the electronic records. To minimize the impact of these uncertainties and utilize the most reliable data, following the recommendation of experts who are familiar with the data, we limit our analysis to maintenance records from 2010 or later, as Detroit's fleet data collection practices changed in 2010 (new electronic record-keeping system). This represents 1,087 active vehicles and over 25,000 maintenance records.

TABLE I: Description of the vehicles table. [Most rows of this table were lost in extraction; it lists each field with a description and an example, including the Unit field (vehicle identifier).]

TABLE II: Description of the maintenance table.

Field              Description                        Example
Job ID             Unique identifier for job          847956
Year Completed     Year of completion                 2017
Unit No            Vehicle identifier                 067602
Work Order No      Unique identifier for work order   635864
Open Date          Work Order Open                    2017-01-17
Completed Date     Work Order Completion              2017-01-17
Work Order Loc.    Location of work order             CODRF
Job Open Date      Job Open                           2017-01-17
Job Reason         Job reason code                    B
Job Reason Desc    Job reason description             BREAKDOWN/REPAIR
Completed Date     Date Job Completed                 2017-01-17
Job Code           Job ID                             24-13-000
Job Description    Detailed description of job        REPAIR Brakes
Labor Hours        Hours of labor completed on job    6.35
Actual Labor Cost  Total cost of labor for job        $348.16
Commercial Cost    Commercial (non-city) labor        $0
Part Cost          Cost of parts for job              $57.55
Primary Meter      Odometer at repair time (mi)       48250
Job Status         Status code; DON = Done            DON
Job WAC            Job type code                      24
WAC Description    Job type description               REPAIR
Job System         Code for vehicle system repaired   13
Syst. Descr.       Vehicle system repaired            Brakes
Job Location       Location of job completion         CODRF

III. AUTOMATED MULTIVARIATE SEQUENCE ANALYSIS WITH PRISM

We begin by addressing (RQ1): how can we uncover, validate, and interpret complex, multivariate patterns from fleet maintenance records?
Our aim is to identify meaningful multivariate maintenance patterns in the Detroit vehicle fleet, and to do so in a way that requires minimal human input and tuning so as to enable ongoing, automated analysis of maintenance event streams. We carefully design an algorithm that satisfies the following conditions: (i) the model is capable of extracting meaningful patterns from the fleet data with minimal tuning, (ii) the output is interpretable for a layperson, and (iii) the practitioners in the city can readily run the model when new data become available, needing minimal user intervention. To meet these requirements, we utilize PARAFAC as the foundation of this analysis, and then develop a novel algorithm, PARAFAC-Informed Sequence Mining (PRISM), to identify "characteristic subsequences" unique to multivariate groupings identified by PARAFAC. PRISM assists in making the multidimensional patterns revealed by PARAFAC interpretable and actionable when applied to sequential data.
A. Methodology
1) Data Model:
Our goal is to encode the information of the entire fleet into a single dataset that will enable the discovery of meaningful fleet-level patterns. The multidimensional data described in § II can be naturally represented as tensors, or n-way arrays [1]. Specifically, we model the Detroit vehicle maintenance dataset as vehicle × system × time data tensors. An illustration of a resulting 3-way tensor is shown in Figure 2, where the vertical axis (the first mode) represents each different vehicle, sorted by year and unit number; the horizontal axis (the second mode) represents each distinct vehicle system (see "System Description" in Table II); and the depth (third mode) represents time in months or years. The value at any given [vehicle, system, time] entry in the tensor is the count of maintenance jobs for that particular vehicle, system, and time.

We note that in our data representation we do not attempt to separate different vehicle types and analyze them independently, as this type of user intervention drifts away from our goal of a fully automated data analysis pipeline. Most importantly, by grouping vehicles, there could be loss of information at the fleet level. A well-behaved algorithm should be able to find patterns at both the type- and fleet-level. In the following subsections, we demonstrate that both kinds of patterns are discovered through PARAFAC + PRISM.
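The construction of this count tensor can be sketched in a few lines of numpy. The records and index maps below are illustrative stand-ins, not the paper's actual schema; in the real data the three coordinates would come from the maintenance table's Unit No, Job System, and completion-date fields.

```python
import numpy as np

# Hypothetical job records: (vehicle_id, system_code, month_index).
jobs = [
    ("V1", "Brakes", 0), ("V1", "Brakes", 0), ("V1", "Engine", 3),
    ("V2", "Brakes", 1), ("V2", "Tires", 3),
]

# Build an index for each mode of the tensor.
vehicles = sorted({v for v, _, _ in jobs})
systems = sorted({s for _, s, _ in jobs})
n_months = 1 + max(t for _, _, t in jobs)
v_idx = {v: i for i, v in enumerate(vehicles)}
s_idx = {s: i for i, s in enumerate(systems)}

# Entry [i, j, k] counts maintenance jobs for vehicle i,
# system j, during month k.
X = np.zeros((len(vehicles), len(systems), n_months))
for v, s, t in jobs:
    X[v_idx[v], s_idx[s], t] += 1
```

The same construction works for the lifetime-based tensor by replacing the absolute month index with years since purchase.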
2) PARAFAC Decomposition:
The PARallel FACtors (PARAFAC) decomposition is a higher-dimensional analog of the SVD, used for tensors in more than two dimensions [1]. PARAFAC decomposes a tensor into a sum of component rank-one tensors which best reconstruct the original tensor. For example, given a 3-way tensor X ∈ R^(I×J×K), PARAFAC decomposes the tensor as X ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r, where a_r ∈ R^I, b_r ∈ R^J, c_r ∈ R^K for r = 1, ..., R, and "∘" represents the vector outer product. The PARAFAC decomposition can be written compactly as the combination of three loading matrices A, B, C: X ≈ [A_(I×R), B_(J×R), C_(K×R)], in which the r-th columns correspond to the vectors a_r, b_r, and c_r, respectively. These encode the most "important" relationships between different dimensions (or modes) of the tensor. For more information about PARAFAC, see [1], [2]; details on our PARAFAC experiments are given in § C-A.

Fig. 2: PARAFAC decomposes a vehicle × system × time tensor into products of vehicle, system, and time factor vectors.

The key aspect of the PARAFAC decomposition that makes it useful for understanding the Detroit vehicle-maintenance dataset is that it identifies R groupings (factors) of different vehicles, systems, and times, as well as factor loading vectors a_r, b_r, and c_r which identify how strongly each vehicle, system, and time contributes to each factor.

Limitations of PARAFAC:
There are several limitations to using PARAFAC alone to identify multivariate patterns:

(a) PARAFAC does not identify the individual observations in each factor. PARAFAC only yields R multivariate loading vectors a_r, b_r, and c_r indicating the degree to which each factor correlates with each index along each mode of the data. It is not clear how to utilize this information in downstream analysis beyond visualization of these vectors directly, as in Figures 3 and 4. As a result of this limitation, we cannot answer the question: to which [vehicle, system, time] observations does factor r apply (or not apply)? This prevents, for example, searching for vehicles or maintenance records falling under a specific factor. We therefore cannot provide technicians with a list of vehicles in a specific PARAFAC factor for further inspection or repair, nor can we compute the total cost of maintenance within a given PARAFAC factor to share with fleet managers or policymakers.

While sparsity-inducing PARAFAC decomposition algorithms exist, in this application we do not actually have prior knowledge that the underlying structural relationships are indeed sparse. Imposing sparsity constraints may lead to incorrect conclusions. Vehicle maintenance data reflects complex relationships between vehicles, systems, and time, which may not match the assumptions of a sparsity-inducing PARAFAC. Instead, we desire a solution which imposes minimal assumptions on the data while still allowing for inference about the in- and out-groups in each resulting factor component for downstream analysis.

(b) PARAFAC does not directly leverage the sequential nature of the data. PARAFAC only uses the frequency of [vehicle, system, time] triplets in the data tensor. Due to this limitation, we cannot identify the specific sequences from the underlying data that give rise to the high loadings in each factor r, and cannot answer the question "what observed maintenance subsequences in the original data give rise to factor r?" As an example, the PARAFAC loading vectors would not differentiate between the sequences "Accident, Brakes, Brakes" and "Brakes, Brakes, Accident", but these sequences lead to different hypotheses about underlying fleet maintenance issues in a factor grouping (the first implies accidents frequently result in brake damage; the second implies brake issues frequently precede accidents).

Extracting these sequences requires manual interpretation of the results, which can be both labor-intensive and ad hoc: users must attempt to discern which vehicles, systems, and times each factor applies to (using three-way plots), and then undertake a separate analysis of the repair sequences for those vehicle-system-time combinations. There is no existing methodology to address this limitation of PARAFAC for sequential data, despite the fact that many previous applications of PARAFAC also evaluate data which is sequential in nature (e.g. text [3] and discourse [4] data).
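Although our experiments use an off-the-shelf PARAFAC implementation (§ C-A), the decomposition itself can be illustrated with a minimal alternating-least-squares sketch in numpy. This toy version is for exposition only; it omits the normalization, initialization strategies, and convergence checks of production solvers, and the function names are ours.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product: row (j*K + k) equals U[j] * V[k]."""
    R = U.shape[1]
    return np.einsum("jr,kr->jkr", U, V).reshape(-1, R)

def parafac_als(X, R, n_iter=50, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((d, R)) for d in (I, J, K))
    # Mode unfoldings consistent with numpy's row-major reshape.
    X1 = X.reshape(I, J * K)                     # X1 ~ A @ khatri_rao(B, C).T
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # X2 ~ B @ khatri_rao(A, C).T
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # X3 ~ C @ khatri_rao(A, B).T
    for _ in range(n_iter):
        # Each update is the least-squares solution for one factor matrix
        # with the other two held fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C
```

On an exactly low-rank tensor this recovers the factors (up to the usual scaling and permutation ambiguity) within a few sweeps.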
3) Differential Sequence Mining (DSM):
Limitation (b) of PARAFAC could be addressed via differential sequence mining (DSM), which identifies differences in sequences between two groups. Existing methods for DSM rely on computing frequent sequences in a group of interest (which we refer to as the "in-group"), and comparing their frequency to another group (the "out-group") using statistical tests. (In the original work [5], the "in-group" and "out-group" are referred to as left and right groups, respectively, but the meaning here is the same.) A common method for DSM computes the i-ratio, |InGroup| / |OutGroup|, and uses a t-test to determine whether the observed i-ratio is statistically significant [5].

However, several limitations of existing DSM methods make them ineffective for the current application. First, DSM is only useful if the first limitation of PARAFAC is solved: the i-ratio requires a binary identification of whether each observation is "in" or "out" of a given PARAFAC factor. As mentioned above, the only methods to do so would require imposing sparsity constraints on the resulting decomposition, which we seek to avoid. Second, the frequent pattern search algorithm used in DSM is based on overall frequency, without regard to the "uniqueness" of those sequences to the in-group, and so yields little additional information. Third, its use of frequency yields results which are biased toward shorter subsequences. Finally, the extensive use of frequentist statistical significance testing in DSM [5], where a t-test is applied to every subsequence evaluated, can lead to spurious results and "statistically significant" results which merely reflect large sample sizes, not large effect sizes [6]. This is the case even when most commonly-used corrections for multiple hypothesis testing (e.g. Bonferroni, Benjamini & Hochberg) are applied, as these are only appropriate for small numbers of tests [7], while thousands of subsequences are commonly evaluated in tasks such as our case study below. In the context of large-scale data analysis, where many subsequences (e.g. all n-grams up to a maximum length) may be evaluated to compare many different subgroups, the Type I Error rate of such tests breaks down [7].

Algorithm 1 PRISM: executed on the factor loading matrices of each of the R PARAFAC factors.
  Input: a_r, b_r, c_r: loading vectors for factor r; seqs: list of vehicle maintenance sequences; and priors: γ, the BDPT prior, and the ROPE.
  Output: Δθ_seq: posterior difference in proportions; and P(Δθ_seq ∉ ROPE): probability of a practical difference in proportions, for all frequent sequences in the in-group.
  /* In practice the algorithm is sensitive neither to the parameters of the BGMM (e.g. γ) nor to the choice of prior in the BDPT, as long as a weak, uninformative prior is used and γ is not near the extremes of [0, 1]. */
  initialization (k = 2; γ; ROPE)
  /* S1: Determine in-group observations per mode {a, b, c} using a Bayesian Gaussian Mixture Model (BGMM). */
  for all LoadingMatrix in {a, b, c} do
      InGroup_i ← BGMM(LoadingMatrix_r, γ)
  /* S2: Find high-frequency sequences for the in-group vehicles. */
  InGroupSeqs ← Filter(seqs, InGroup_a)
  OutGroupSeqs ← Filter(seqs, ¬InGroup_a)
  m ← |InGroupSeqs|; n ← |OutGroupSeqs|
  InGroupFreqSeqs ← FindFreqSeqs(InGroupSeqs, InGroup_a)
  /* S3: Conduct the Bayesian Difference in Proportions Test (BDPT). */
  for all seq in InGroupFreqSeqs do
      InGroupSupp ← Σ 1[s = seq] for s in InGroupSeqs
      OutGroupSupp ← Σ 1[s = seq] for s in OutGroupSeqs
      [Δθ_seq, P(θ ∉ ROPE)_seq] ← BDPT(InGroupSupp, OutGroupSupp, m, n)

PARAFAC-Informed Sequence Mining (PRISM):
Motivated by our observations in § III-A2 and § III-A3, we present an algorithm, PARAFAC-Informed Sequence Mining (PRISM), which jointly resolves the existing limitations of prior DSM algorithms and includes the first unified, automated approach to link DSM to the results of a PARAFAC analysis. We give its pseudocode in Algorithm 1. At a high level, it consists of the following steps for each PARAFAC component r = 1, ..., R:

S1 A Bayesian Gaussian Mixture Model (BGMM) is used to identify the "in-group" vehicles, systems, and time points for a factor r (those to which this factor applies). We use a standard finite mixture model with k = 2 components, a Dirichlet distribution, and a standard weight concentration prior γ, fit separately to each factor loading vector. The in-group for each dimension is the mixture component with the larger posterior mean. In practice, this procedure separates observations with near-zero and non-zero entries in a_r, b_r, and c_r quite effectively, without much sensitivity to γ. We give more details in App. B-A.

S2 Compute frequent sequences for the in-group vehicle-system-time set using a standard frequent sequence mining algorithm [8], and only keep sequences which contain at least one in-group system. Normalize frequencies by the total size of each group (i.e., the total number of n-grams in the in-group and out-group, respectively) to produce a proportion.

S3 Conduct a Bayesian difference-in-proportions test (BDPT) using a non-informative prior (e.g., Beta(1, 1), the weakest form of the conjugate prior for a binomial proportion) to determine the posterior probability of whether the proportion of the observed subsequences in each group is the same. The resulting subsequences for which the posterior probability of a large difference in proportions between in-group and out-group vehicles exceeds some predetermined threshold are the "characteristic subsequences" of that factor. Replication details are given in App. B-B.

PRISM thus jointly resolves the limitations of PARAFAC described above. S1 determines, for every [vehicle, system, time] maintenance record, whether it is "in" factor r or not. Then, S2 mines the "in-group" for factor r to determine which maintenance sequences, for those [vehicle, system, time] records in the factor, are most unique to factor r. S3 ensures the identified sequences are both statistically significant and practically important by ensuring that the posterior probability that the difference in proportions is larger than the ROPE is high, according to the BDPT.

PRISM provides a unified method for leveraging the valuable data provided by the PARAFAC factor loading matrices A, B, C via sequence mining in order to identify "characteristic subsequences" specific to the multidimensional loadings of each factor r. This information is not given by PARAFAC alone. Furthermore, using a Bayesian framework for both the clustering and, in particular, the statistical analysis of subsequences in DSM alleviates concerns about multiple hypothesis testing, as each iteration is simply estimating the posterior probability of a difference in relative frequency between the in- and out-groups, not the probability that we would observe the data due to random chance under H_0, which would require controlling for Type I Error [9]. Additionally, instead of simply evaluating a point hypothesis (typically H_0: θ_in = θ_out), the Bayesian test allows us to estimate the probability that the difference in frequencies is outside of a "region of practical equivalence", or ROPE [10], which excludes what might otherwise be "statistically significant", but practically useless, results in the case of small but genuine differences in frequency of occurrence. We discuss uses of such sequences in Section III-B.

B. Findings and Impact
1) PARAFAC:
Setup.
There is no explicit methodology of which we are aware for selecting R. In our analysis we set R = 25, but the results that we report are largely robust to different values of R. Our choice is consistent with the literature (see § V) and also leads to a manageable number of 3-way plots (one per factor) that can be easily inspected by a civic data scientist. Details on the objective function, algorithm, and convergence of the PARAFAC model used here are given in § C-A.

First, we seek to identify multivariate vehicle-system-time relationships in the Detroit dataset in a way that is automated and interpretable, even for non-technical domain experts and city stakeholders. To this end, we generate "3-way" plots of the three factor matrices from the PARAFAC decomposition [11] using the tensor toolkit provided by [12], [13], as shown in Figures 3-4 (top, white panels). Each plot visualizes the vectors a_r, b_r, and c_r, which show the different modes (vehicle, system, time) participating in the r-th factor. We explore two different representations of time in the data tensors: one which uses absolute time (month and year) in Figure 3, and another using vehicle lifetime (by year, starting with the vehicle's purchase year) in Figure 4. The absolute time analysis allows us to model seasonality and other real-time trends in fleet maintenance, and could be more useful in forecasting future maintenance. On the other hand, the vehicle lifetime analysis allows us to measure trends and changes in vehicles' maintenance over the course of their lifetime in the Detroit fleet, and could be useful for vehicle reliability analyses.

Findings.
Examples of the results from the absolute time analysis are shown in Figure 3. These results demonstrate clear patterns across vehicles, systems under repair, and time, underscoring the importance of this multivariate approach. For example, fire trucks and ambulances (the Terrastar Horton in the left column of Figure 3 and the Smeal SST Pumper in the center column of Figure 3, respectively) both show strong evidence of patterns in their maintenance, but with very different groups of systems and across different time bands. The riding mower shown in the right column of Figure 3, however, displays an entirely different maintenance pattern, with a focus on only two systems (mowing blades and tires/tubes/liners) and strong seasonality, which reflects the seasonal use of mowers in a northern city such as Detroit.

Examples of the results from the PARAFAC vehicle lifetime analysis are shown in Figure 4. This analysis demonstrates a different set of patterns: those across the lifetime of vehicles, beginning when they are purchased. Note that the right columns of Figures 3 and 4 identify a nearly identical set of vehicles but highlight different patterns, illustrating the different insights gained from absolute time vs. lifetime analyses. Additionally, the center and right columns of Figure 4 are examples of vehicle-level maintenance patterns, while the left column of Figure 4 is an example of a fleet-level maintenance pattern which is common across the entire fleet. This example illustrates that PARAFAC is indeed capable of automatically discovering patterns at both the vehicle and fleet level, as desired (§ III-A1).

Fig. 3: Top white panel: PARAFAC 3-way plot of absolute-time analysis. Patterns involving the highlighted vehicles (top row) going under specific types of repairs (middle row) over select times (bottom row) are shown. Left column: Ambulance 2014 Terrastar Horton vehicles involved in Body (B), Cab/Sheet Metal, Engine and Motor (EMS), and Preventive Maintenance (PM) services after 2014. Center column: Repair to specific systems of the Smeal SST Pumper (fire truck), from late 2015 through 2016. Right column: System and time patterns for riding mowers, with repairs to mower blades and tires/tubes/liners/valves (TTLV) during seasons of high usage. Bottom gray panel: A subset of the characteristic maintenance subsequences discovered via PRISM applied to the corresponding factor vectors.

Fig. 4: Top white panel: PARAFAC 3-way plot of vehicle lifetime analysis. Left column: Simple pattern common to almost all vehicles: tires/tubes/valves/liners (TTLV) replacement during the second year of lifetime. Center column: The 2012 Freightliner M2112V, a garbage truck, has increased maintenance in years 2-4 after purchase, focusing on hydraulics, lighting (LS), gauges and warning devices, and cooling systems. Right column: Patterns primarily for the 2013 Hustler Z 60 (a riding mower), which have mowing blades (M) serviced frequently in the second and third years of their lifetime. Bottom gray panel: A subset of the characteristic maintenance subsequences discovered via PRISM applied to the corresponding factor vectors.

Figures 3 and 4 show how patterns specific to certain departments are automatically uncovered by PARAFAC, even though departmental data was not provided in the input data to PARAFAC. We later also learned that the factors in Figure 3 relating to ambulances and fire trucks were actually indicative of specialist technicians working on those vehicles; again, PARAFAC revealed these unique multidimensional patterns without preexisting knowledge.
2) PRISM:
Setup.
The PRISM algorithm allows us to leverage the PARAFAC loadings to extract further insight about each group, by mining sequences which specifically represent the vehicle/system/time observations captured in each factor's loading vectors a_r, b_r, c_r. This analysis uses a fixed ROPE: PRISM searches for subsequences whose normalized frequencies have a high posterior probability of differing by more than the ROPE width between the in- and out-groups of any given factor, according to the BDPT (in most cases, the observed difference is much larger). In Figures 3 and 4, we add a subset of the characteristic maintenance subsequences discovered by applying PRISM to the corresponding factor vectors. These are shown in the bottom gray panel below each three-way plot. The specific characteristic sequences presented here were selected from a larger set of overall PRISM results for each factor.

Findings.
The sequences identify concrete vehicle repair sequences which are uniquely common to the vehicle/system/time grouping in each factor. For example, we might use the characteristic sequences to recommend brake service (B) whenever preventive maintenance (PM) is performed for the vehicles in the factors in the left and center columns of Figure 3 (mostly ambulances and fire trucks), or to recommend lighting system repairs when PM is performed for vehicles in Figure 4b (garbage truck). Furthermore, PRISM provides validation of the PARAFAC loadings, confirming that there are significant differences in the occurrence of maintenance patterns across the vehicle/system/time groups identified via PARAFAC.

3) Impact: The PARAFAC + PRISM analysis demonstrates the variety of insights that can be gained from using tensor decomposition to understand multidimensional data. The analysis above uncovers multidimensional patterns across the entire Detroit vehicle fleet, as well as unique trends specific to certain vehicles, systems, and times. Additionally, the use of two different measures of time—month/year, and vehicle lifetime—allows us to demonstrate two different modes of time-bound patterns in the data. These results suggest several potential actions for Detroit, including potential seasonal allocation of resources and technicians (e.g., for mower system repair during the summer, as shown in the right column of Figure 3), and point to future efforts in detailed analyses of such data for other purposes, such as anomaly detection and automated fleet maintenance recommendation or scheduling systems.

The PRISM algorithm provides, to our knowledge, the first principled method to automatically extract interpretable information from the results of PARAFAC and utilize it for sequence analysis. It has the potential to apply more broadly to a variety of sequence mining tasks where the unsupervised identification of groups and their defining sequential patterns is desired. PRISM can specifically inform future work on predicting vehicle maintenance, availability, and labor, parts, and other costs due to maintenance. It could also potentially lead to changes in the city's fleet maintenance operations by providing interpretable visualizations to policymakers and vehicle mechanics, as well as providing suggested maintenance "bundles" for individual vehicles or groups of vehicles while they are in for repair, which could lead to economies of scale and improved cost efficiency as the city works to emerge from its bankruptcy. Moreover, our methodology generalizes to other domains where multidimensional, sequential data abound, including tasks to which PARAFAC has been previously applied (see § V).

IV. FORECASTING MAINTENANCE PATTERNS
Our results in § III demonstrate the existence of vehicle-system-time maintenance patterns which could be exploited by appropriate sequence models to address additional needs. Our task in this section is to leverage these patterns to build a set of predictive models for a specific type of vehicle, unlike § III, where our task was to uncover sequential maintenance patterns from the entire dataset. Specifically, we address (RQ2), Can we predict vehicle maintenance?, and (RQ3), Can we predict vehicle- and fleet-level maintenance costs? (RQ2) deals with the low-level details of maintenance prediction, and (RQ3) is a high-level prediction task that is critical for budgeting in large, financially strained municipalities such as Detroit.

To address these questions, we construct two models, one for each task, that predict the next item (maintenance job or maintenance costs) in a time series for vehicles in the fleet, given a set of previous items. We illustrate that simple, standard models achieve good performance, implying that these tasks are highly amenable to data mining.
Data.
Per our stakeholders' request, in this section we focus on Detroit's police vehicles, consisting of Dodge Chargers, Chevrolet Impalas, and Ford Crown Victorias. Police vehicles, particularly in a large and budget-strained city such as Detroit, are critical to the city's capacity to deliver services, and represent a substantial portion of vehicle usage, maintenance, and procurement costs. Using these vehicles as a case study allows us to focus on identifying, modeling, and interpreting patterns specific to police vehicles, while also demonstrating our methods' broader potential to answer the specified questions for other vehicles in future analyses, or, via our open-source code, to support analysis of other domains.
A. Methodology

1) Maintenance Sequence Forecasting:
We implement a sequential model to predict vehicle maintenance using the sequential structure of maintenance patterns (§ III), which can be useful for resource allocation, technician hiring, or the preparation of a data-driven budget proposal. Specifically, we utilize the Long Short-Term Memory (LSTM) neural network [14], a well-established model that reads over a sequence, one item at a time, and computes probabilities of the possible values for the next item in the sequence. In theory, an LSTM is capable of learning long-distance dependencies across a sequence [15].
Data Setup.
From the raw data, we assemble a dataset consisting of the complete sequence of system repairs for each vehicle. Each vehicle's sequence is considered a separate observation. To assemble training, validation, and testing datasets for the model, we use all data from the three vehicle models predominantly used as police cars in the Detroit fleet. Ideally, a model would be fit on only a single vehicle type; however, due to the relatively small number of vehicles available for training (329 total police vehicles), it was necessary to combine multiple make/models. We train on a random subset of 50% of vehicles, using 25% for model validation and 25% for testing.
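The setup above can be sketched as follows. The record layout (`vehicle_id`, date, system repaired) and function name are hypothetical stand-ins for the raw maintenance table, not the paper's actual schema:

```python
import random
from collections import defaultdict

def build_sequence_datasets(records, seed=0):
    """Group repair records into one chronological sequence per vehicle,
    then split vehicles 50/25/25 into train/validation/test sets."""
    by_vehicle = defaultdict(list)
    for vehicle_id, date, system in records:
        by_vehicle[vehicle_id].append((date, system))
    # Sort each vehicle's jobs by date; keep only the repaired-system labels.
    sequences = {v: [sys for _, sys in sorted(jobs)]
                 for v, jobs in by_vehicle.items()}

    vehicles = sorted(sequences)
    random.Random(seed).shuffle(vehicles)
    n = len(vehicles)
    train = {v: sequences[v] for v in vehicles[: n // 2]}
    val = {v: sequences[v] for v in vehicles[n // 2 : 3 * n // 4]}
    test = {v: sequences[v] for v in vehicles[3 * n // 4 :]}
    return train, val, test
```

Each vehicle (not each job) is assigned to exactly one split, so no vehicle's history leaks between training and evaluation.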
Evaluation.
An effective model assigns high probability to unseen data and low probability to a repair job that does not happen. Hence, we assess the performance of our model using average per-item perplexity, a common evaluation metric for sequence models which evaluates the probability assigned to entire test sequences: exp(−(1/N) Σ_{i=1}^{N} ln(p_target_i)) = e^loss, where N is the total number of observations and p_target_i is the probability assigned to item i. Assigning a high probability to true, unseen data is equivalent to achieving low perplexity.

Baselines.
We compare the LSTM model to a baseline that we call the frequency-matched model. In this model, we first compute the frequency of item i over all sequences in the training data. Then we use this frequency to assign a probability to each target observation in the test sample, p_target_i, and compute the perplexity score. Because there are no other maintenance prediction models in prior published work, we also provide the perplexity score of our model on two external datasets. These results, along with the results of our model, are shown in Figure 5.
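Both the perplexity metric and the frequency-matched baseline are straightforward to compute. A minimal sketch (function names hypothetical), which assumes every test item also appears in training (otherwise smoothing would be needed to avoid zero probabilities):

```python
import math
from collections import Counter

def perplexity(target_probs):
    """Average per-item perplexity: exp of the mean negative
    log-probability assigned to the true next items."""
    return math.exp(-sum(math.log(p) for p in target_probs) / len(target_probs))

def frequency_matched_perplexity(train_seqs, test_seqs):
    """Baseline: score each test item with its empirical training frequency,
    then compute the resulting perplexity."""
    counts = Counter(item for seq in train_seqs for item in seq)
    total = sum(counts.values())
    probs = [counts[item] / total for seq in test_seqs for item in seq]
    return perplexity(probs)
```

As a sanity check, a model that assigns a uniform 1/k probability over k candidate items achieves perplexity exactly k.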
Fig. 5: Performance of our model in predicting the probability of the next maintenance job in a sequence (green) vs. a frequency-matched model (red), plus the performance of our model on external datasets (orange and yellow).
Model.
We implement the well-known LSTM architecture originally used in [14] because of its ability to model complex sequences while avoiding overfitting. The model is a 2-layer LSTM which reads over maintenance sequences in temporal order, maintaining a window size of at most 20 observations. Detailed training hyperparameters are given in § C-B.
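The 20-observation context window corresponds to slicing each vehicle's sequence into (context, next item) training pairs, roughly:

```python
def make_windows(sequence, max_window=20):
    """Split one repair sequence into (context, next_item) pairs, where the
    context holds at most the previous `max_window` jobs."""
    return [(sequence[max(0, i - max_window):i], sequence[i])
            for i in range(1, len(sequence))]
```

The function name is illustrative; early items in a sequence naturally get shorter contexts than the maximum window.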
2) Maintenance Cost Forecasting:
We forecast maintenance costs for active police vehicles using an autoregressive integrated moving average (ARIMA) model. Recent work has demonstrated that ARIMA performs well even in comparison with highly complex machine learning methods for time series data [16]. Moreover, its well-known theoretical properties and interpretability make it ideal for our analysis.
Data Setup.
All of our forecasts are in terms of average monthly cost per vehicle. The cost data includes frequent fluctuations caused by decommissioning and acquiring vehicles (see Figure 6), which makes the prediction task challenging. We use a monthly timescale as a balance between aggregating enough data per time period to be sufficiently stable and detecting variation on smaller timescales (e.g., seasonality).
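The target series (average monthly cost per active vehicle) can be assembled roughly as follows; the (month, vehicle, cost) record format is a hypothetical stand-in for the city's maintenance ledger:

```python
from collections import defaultdict

def monthly_cost_per_vehicle(records):
    """Aggregate (month, vehicle_id, cost) records into a time series of
    average monthly cost per vehicle: total cost in a month divided by the
    number of distinct vehicles with activity that month."""
    costs = defaultdict(float)
    vehicles = defaultdict(set)
    for month, vehicle_id, cost in records:
        costs[month] += cost
        vehicles[month].add(vehicle_id)
    return {m: costs[m] / len(vehicles[m]) for m in sorted(costs)}
```

Note this counts a vehicle as "active" only if it incurred a cost that month; a production version would need the fleet roster to normalize by all active vehicles, including those with zero maintenance.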
Evaluation.
The forecast model is evaluated using predictions of costs one and six months into the future. We evaluate the model using its root mean squared error (RMSE), but we also monitor AIC and BIC during model fitting in order to select hyperparameters.
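The expanding-window evaluation can be sketched with a simple AR(1) model standing in for the full ARIMA (fit on the first months of data, forecast h steps ahead, extend the window with the next true value, refit); all names are illustrative, not the paper's implementation:

```python
import math

def ar1_fit(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} (an AR(1) stand-in
    for ARIMA, used here only to illustrate the evaluation regime)."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    phi = (sum((a - mx) * (b - my) for a, b in zip(x, y))
           / sum((a - mx) ** 2 for a in x))
    return my - phi * mx, phi

def rolling_rmse(series, train_months=24, horizon=1):
    """Fit on an expanding window, forecast `horizon` months ahead, then
    extend the window with the next observation and repeat; return RMSE."""
    errs = []
    for t in range(train_months, len(series) - horizon + 1):
        c, phi = ar1_fit(series[:t])
        y = series[t - 1]
        for _ in range(horizon):  # iterate the one-step recursion h times
            y = c + phi * y
        errs.append((y - series[t + horizon - 1]) ** 2)
    return math.sqrt(sum(errs) / len(errs))
```

The same loop structure applies with a real ARIMA fit (e.g., via statsmodels) substituted for `ar1_fit`.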
Model.
Our models predict the average cost of an entire department (police), or the average cost of a specific make/model (Dodge Charger, Crown Victoria). Each ARIMA model is trained on data from the first 24 months, and generates predictions of the average cost per vehicle. Predictions are made one month and six months into the future. The model is then updated with the true average cost per vehicle from the 25th month, and generates the next pair of forecasts. This is a standard training regime for autoregressive time series models. For the details of model training and final ARIMA hyperparameter settings, see § C-C.

B. Findings and Impact

1) Maintenance Sequence Forecasting:
Figure 5 compares the performance of our LSTM model with the frequency-matched model in predicting the next item in a maintenance sequence on the Detroit dataset. We also present the performance of the same model on external datasets. Our model achieves an average test perplexity score of 15.7, demonstrating that even this relatively simple, computationally lightweight model with a small dataset is able to achieve strong predictive performance, far better than the frequency-matched model's perplexity (see Figure 5). For comparison, we note that the architecture used here has also achieved a perplexity of 23.7 on the Penn Treebank dataset and 24.3 on the Google Billion Words dataset [17]. While our model's low perplexity score should not be directly compared to model performance on other corpora, because of the relatively low number of candidate items in the sequence (81 unique systems in the entire vehicles dataset, compared to many thousands in text corpora), the reference indicates that our model assigns probability scores with performance on par with state-of-the-art language models.
2) Maintenance Costs Forecasting:
Figure 6 shows the results of the cost forecasting models, along with the ground truth costs. The models show good agreement with the actual observations. For the department-level model (top of Figure 6), the RMSE in predicting average per-vehicle cost ranges from $38 to $49, increasing only gradually as the prediction distance increases from 1 to 6 months, suggesting that the model is capable of making both short-term and medium-term predictions. For the vehicle-specific model (bottom of Figure 6), we show that the model is able to forecast costs for Ford Crown Victorias and Dodge Chargers. The Charger prediction is particularly challenging given the small sample and the rapid fluctuation due to new Charger acquisitions during the period of analysis.
3) Impact:
Our analysis indicates that it is possible to accurately predict both future maintenance jobs and average future expenses, both of which are critical for planning purposes. Specifically, we show that future vehicle maintenance sequences can be predicted with high accuracy even in a modestly-sized fleet (164 training observations). The predictions of the LSTM can be used, for example, to support automated maintenance scheduling, availability or cost forecasting based on maintenance predictions, dynamic allocation of technicians and budget, anomaly detection, and many other applications which can ensure effective fleet-wide maintenance.

Moreover, our vehicle- and department-level cost models demonstrate that relatively accurate per-vehicle cost predictions (e.g., within 20-25% at the department level for predictions one and six months into the future) can be obtained using a simple model and only 24 months of prior data, a historical window which any municipality should have available. These models can support budgeting and cost projection for data-driven planning, as well as comparative analysis of the current and projected future per-vehicle costs of different vehicle models.

Fig. 6: Top: One-month (left; RMSE = $38.6) and six-month (right; RMSE = $49.3) cost forecasts for the police department. Bottom: One-month cost forecasts for police vehicles by model, Ford Crown Victorias (left; RMSE = $49) and Dodge Chargers (right; RMSE = $158). 68% confidence intervals shown. Ground-truth costs shown in black.

Cost projections are important for informing future purchasing, maintenance, usage, and vehicle disposal decisions. They can also contribute to optimal fleet composition prediction, which can allow Detroit to optimize the vehicles deployed for achieving service delivery and cost goals. Such tasks can be particularly impactful as the city recovers from bankruptcy.

Our analysis shows that even simple models (such as ARIMA) have significant predictive power for vehicle fleet analysis tasks.
Future directions include utilizing the output of the LSTM model in order to potentially further improve the accuracy of ARIMA.

V. RELATED WORK
Our analysis is based on tensor decomposition and is related to studies on municipal vehicle fleets and municipal forecasting.
Tensor Analysis and Applications.
Tensor representations and various decompositions have found wide applications in a variety of domains, including psychometrics [18], epidemiology [19], modeling online discourse over time [3], [4], web search [20], and anomaly detection [11]. For a more detailed overview of tensor decompositions, see [1].
Municipal Vehicle Fleets Research.
While predictive analytics, data science, and their application to urban planning (also known as urban informatics) have dramatically expanded in recent years, these techniques have seen only limited application to one of the largest and most substantial assets managed by many governments: their vehicles. Published research on the topic is surprisingly limited. Some state and local governments conduct, but rarely publish, fleet lifecycle reports and maintenance analyses [21] and fleet management studies [22], [23], mostly focused on cost reduction.

Research on predictive maintenance has utilized on-board vehicle data for maintenance prediction [24] and for evaluating winter maintenance [25]. There have been some applications of deep learning to vehicle data, e.g., identifying faulty components and vehicle damage from photos [26], but no prior work on mining or modeling fleet maintenance records. Other vehicle-related issues in urban areas have received significant research attention, including accident prediction [27] and traffic flow prediction and optimization [28]-[30]. The authors are not aware of any prior research applying tensor decomposition or the other techniques used in the current work to municipal vehicle data.
Municipal Forecasting.
Prior work has explored forecasting tasks in other areas of municipal government, including predictions of water usage [31] and solid waste generation [32]. Prior work has also examined the use of decision support systems utilizing ARIMA and other time series models [33], but budgetary forecasting is still widely considered an open problem in municipal government, largely due to the complexity of the interests and constraints involved [34].

VI. CONCLUSIONS AND DISCUSSION
In this analysis, we describe the results of a data collaboration with Detroit's Operations and Infrastructure Group. This work applies methods to uncover maintenance-related patterns relevant to three key research questions. Our key contribution is to extract multidimensional maintenance patterns across the entire fleet using PARAFAC and the PRISM algorithm, which identifies characteristic subsequences for each PARAFAC factor (RQ1). We emphasize that the output of the PARAFAC algorithm is hardly interpretable on its own. To alleviate this shortcoming, we propose the PRISM algorithm, which extracts interpretable results from PARAFAC factors. We then move on to predictive tasks, one low-level and one high-level. We build an accurate maintenance forecasting model which predicts the next maintenance job using fewer than 200 vehicles for training (RQ2). We conduct maintenance cost forecasting at the department level as well as the individual-vehicle level (RQ3). We show that even simple, standard, highly interpretable predictive models achieve good performance and provide actionable insights to our partners in the City.

To the best of our knowledge, this work provides the first data-driven baseline for future studies applying data mining to municipal vehicle data. We set a precedent in this domain and publicly release our code to enable other cities and organizations to replicate or extend this analysis on their own fleet data.
Limitations.
As with all empirical studies, our analysis has some limitations. We highlight areas where our analysis was limited by data issues, and where future practitioners and analysts ought to direct data collection efforts. Future data collection efforts should focus on: (i) improving the accuracy and granularity of existing data, such as vehicle mileage and fuel consumption; (ii) collecting additional data, including vehicle drivers, time, location, and "engine hours" (the total time a vehicle is in use). Available metrics such as age and mileage are imperfect measures of usage for many vehicles, such as police vehicles, which may simply idle for long periods of time during police shifts in cold weather.
Challenges.
This collaboration demonstrates a small sample of the insights that can be gained from detailed multivariate analysis of municipal data, but it also illustrates several of the challenges of working with such data. Many aspects of the data (its observational nature; overlapping or difficult-to-decipher descriptions; error and incompleteness which are likely systematic and non-random) underscore the challenges of working with real-world municipal data often generated as "data exhaust" and not with the express aim of providing insights or accurate measurements. Additionally, the distance between our analytical team and the users generating the data (vehicle drivers, technicians, and clerical staff) highlights how challenging it can be to understand data context. Despite the challenges, even basic insights garnered from a similar analysis can yield significant improvements over the status quo for budget-strained municipalities with limited data analysis resources, such as Detroit, and the methods presented here have the potential to apply to a much wider variety of applied data science problems regarding municipal or vehicle fleet data. This work will serve as a model for future municipal-academic research partnerships.

ACKNOWLEDGEMENTS
This work is partially supported by the National Science Foundation, grant IIS 1845491, and Army Young Investigator Award No. W911NF1810397. The authors recognize the support of the Michigan Institute for Data Science (MIDAS). We would like to thank the General Services Department of the City of Detroit for bringing this project to our attention and making the data available for use.

For example, technicians subjectively choose between several job codes, i.e., "Adjust brakes" vs. "Repair brakes" vs. "Overhaul brakes"; many older vehicles and jobs are believed to be missing from this data.

REFERENCES

[1] T. Kolda and B. Bader, "Tensor decompositions and applications," SIAM Rev., vol. 51, no. 3, pp. 455-500, 2009.
[2] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems, 2001, pp. 556-562.
[3] B. Bader, M. Berry, and M. Browne, "Discussion tracking in Enron email using PARAFAC," in Survey of Text Mining II. Springer London, 2008, pp. 147-163.
[4] E. Acar, S. Çamtepe, and B. Yener, "Collective sampling and analysis of high order tensors for chatroom communications," in Intell. and Sec. Informatics, May 2006, pp. 213-224.
[5] J. Kinnebrew, K. Loretz, and G. Biswas, "A contextualized, differential sequence mining method to derive students' learning behavior patterns," JEDM, vol. 5, no. 1, pp. 190-219, 2013.
[6] R. L. Wasserstein and N. A. Lazar, "The ASA's statement on p-values: Context, process, and purpose," Am. Stat., vol. 70, no. 2, pp. 129-133, Apr. 2016.
[7] B. Efron and T. Hastie, Computer Age Statistical Inference. Cambridge University Press, 2016.
[8] J. Wang, J. Han, and C. Li, "Frequent closed sequence mining without candidate maintenance," IEEE TKDE, vol. 19, no. 8, pp. 1042-1056, 2007.
[9] A. Gelman, J. Hill, and M. Yajima, "Why we (usually) don't have to worry about multiple comparisons," J. Res. Educ. Eff., vol. 5, no. 2, pp. 189-211, 2012.
[10] J. Kruschke, "Bayesian assessment of null values via parameter estimation and model comparison," Perspect. Psychol. Sci., vol. 6, no. 3, pp. 299-312, May 2011.
[11] D. Koutra, E. E. Papalexakis, and C. Faloutsos, "TensorSplat: Spotting latent anomalies in time," in PCI, 2012, pp. 144-149.
[12] B. Bader and T. Kolda, "Efficient MATLAB computations with sparse and factored tensors," SIAM J. Sci. Comput., vol. 30, no. 1, pp. 205-231, Dec. 2007.
[13] B. W. Bader and T. G. Kolda, "MATLAB tensor toolbox version 2.0," 2006.
[14] W. Zaremba, I. Sutskever, and O. Vinyals, "Recurrent neural network regularization," Sep. 2014.
[15] S. Hochreiter and J. Schmidhuber, "LSTM can solve hard long time lag problems," in NIPS 9, 1997, pp. 473-479.
[16] S. Makridakis, E. Spiliotis, and V. Assimakopoulos, "Statistical and machine learning forecasting methods: Concerns and ways forward," PLoS One, vol. 13, no. 3, 2018.
[17] O. Kuchaiev and B. Ginsburg, "Factorization tricks for LSTM networks," 2017.
[18] J. Douglas Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition," Psychometrika, vol. 35, no. 3, pp. 283-319, 1970.
[19] Y. Sakurai, Y. Matsubara, and C. Faloutsos, "Mining and forecasting of big time-series data," in SIGMOD '15, 2015, pp. 919-922.
[20] J. Sun, H. Zeng, H. Liu, Y. Lu, and Z. Chen, "CubeSVD: A novel approach to personalized web search," in WWW. ACM, 2005, pp. 382-390.
[21] D. D. Gransberg and E. P. O'Connor, "Major equipment life-cycle cost analysis," Minnesota Department of Transportation, 2015.
[22] E. B. Osborne, "From measurement to management: A performance-based approach to improving municipal fleet operations in Burlington, North Carolina," Master's thesis, The University of North Carolina at Chapel Hill, Apr. 2012.
[23] P. T. Lauria and D. T. Lauria, State Department of Transportation Fleet Replacement Management Practices. Trans. Res. Board, 2014.
[24] R. Prytz, "Machine learning methods for vehicle predictive maintenance using off-board and on-board data," Ph.D. dissertation, Halmstad University, 2014.
[25] C. Lee, W. Loh, X. Qin, and M. Sproul, "Development of new performance measure for winter maintenance by using vehicle speed data," TRR, vol. 2055, pp. 89-98, 2008.
[26] K. Singh and M. Arat, "Deep learning in the automotive industry: Recent advances and application examples," 2019.
[27] N. Levine, K. E. Kim, and L. H. Nitz, "Spatial analysis of Honolulu motor vehicle crashes: I. Spatial patterns," Accid. Anal. Prev., vol. 27, no. 5, pp. 663-674, Oct. 1995.
[28] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, "Optimized and meta-optimized neural networks for short-term traffic flow prediction: A genetic approach," Transp. Res. Part C: Emerg. Technol., vol. 13, no. 3, pp. 211-234, Jun. 2005.
[29] W. Zheng, D. Lee, and Q. Shi, "Short-term freeway traffic flow prediction: Bayesian combined neural network approach," J. Transp. Eng., vol. 132, no. 2, pp. 114-121, Feb. 2006.
[30] Y. Lv, Y. Duan, W. Kang, Z. Li, and F. Y. Wang, "Traffic flow prediction with big data: A deep learning approach," IEEE Trans. ITS, vol. 16, no. 2, pp. 865-873, 2015.
[31] S. Campisi-Pinto, J. Adamowski, and G. Oron, "Forecasting urban water demand via wavelet-denoising and neural network models. Case study: City of Syracuse, Italy," Water Resour. Manage., vol. 26, no. 12, pp. 3539-3558, 2012.
[32] N. Johnson, O. Ianiuk, D. Cazap, L. Liu, D. Starobin, G. Dobler, and M. Ghandehari, "Patterns of waste generation: A gradient boosting model for short-term waste prediction in New York City," Waste Manag., vol. 62, pp. 3-11, 2017.
[33] H. Rego, A. B. Mendes, and H. Guerra, "A decision support system for municipal budget plan decisions," in New Contributions in Information Systems and Technologies, ser. Advances in Intelligent Systems and Computing. Springer, Cham, 2015, pp. 129-139.
[34] J. P. Forrester, "Multi-year forecasting and municipal budgeting," Public Budget. Finance, 1991.
APPENDIX A
ONLINE SUPPLEMENTARY RESULTS

The full set of results for the PARAFAC analysis applied to our dataset, consisting of all R = 25 three-way plots for both the absolute-time and the vehicle-lifetime analysis, are available in the git repository published with this work.
• Absolute-Time Analysis: https://github.com/jpgard/driving-with-data-detroit/tree/master/img/3_way_plots/month_year_log
• Vehicle-Lifetime Analysis: https://github.com/jpgard/driving-with-data-detroit/blob/master/img/3_way_plots/vehicle_year_log/README.md

APPENDIX B
ALGORITHMS
A. Bayesian Gaussian Mixture Model (BGMM)
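The role of this model in PRISM is to separate near-zero from non-zero loadings. As a rough, dependency-free illustration of that clustering step (a plain two-Gaussian EM mixture in 1-D, standing in for the scikit-learn BGMM actually used; all names hypothetical):

```python
import math

def in_group_labels(loadings, iters=50):
    """Label each 1-D loading as in-group (1) or out-group (0) using a
    two-component Gaussian mixture fit by EM; the in-group is the
    component with the higher mean, mirroring the BGMM's role here."""
    mu = [min(loadings), max(loadings)]  # initialize means at the extremes
    var, pi = [1.0, 1.0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in loadings:
            d = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = d[0] + d[1]
            resp.append([d[0] / s, d[1] / s])
        # M-step: update mixture weights, means, and (floored) variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(loadings)
            mu[k] = sum(r[k] * x for r, x in zip(resp, loadings)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, loadings)) / nk, 1e-3)
    high = 0 if mu[0] > mu[1] else 1
    return [int(r[high] > 0.5) for r in resp]
```

Unlike the Bayesian mixture, this sketch has no weight concentration prior, but on cleanly separated loadings (the common case noted below) it produces the same in-group/out-group split.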
For estimating the in-group for each component of each factor using the loading vectors a_r, b_r, c_r, we use a two-component Bayesian Gaussian Mixture Model (BGMM). For each PARAFAC factor r, the BGMM is fit directly to the single-valued vectors a_r, b_r, c_r. The BGMM is used to assign binary labels to each observation, labeling it as either in-group or out-group for a given factor r, where the in-group is the cluster with the higher posterior mean. Validation of the model by detailed inspection demonstrated that the BGMM achieved the intended result of largely forming clusters of near-zero and non-zero observations.

We use a standard finite mixture model from scikit-learn with two components, a Dirichlet process prior, and a standard weight concentration prior γ, but we note that the model was largely insensitive to the value of γ used due to the relatively clean separation of most vectors into zero and non-zero values.

B. Bayesian Difference in Proportions Test (BDPT)
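The test defined in this subsection admits a simple conjugate shortcut: assuming a uniform Beta(1, 1) prior, each group's posterior is Beta(1 + y, 1 + n − y), so the ROPE probability can be estimated by direct posterior sampling rather than MCMC. A sketch assuming a symmetric ROPE of half-width `rope` (names and defaults illustrative):

```python
import random

def prob_outside_rope(y_in, n_in, y_out, n_out, rope=0.05, draws=20000, seed=0):
    """Estimate P(theta_in - theta_out outside [-rope, rope]) by sampling
    the conjugate Beta posteriors of the two groups' event rates."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        t_in = rng.betavariate(1 + y_in, 1 + n_in - y_in)
        t_out = rng.betavariate(1 + y_out, 1 + n_out - y_out)
        hits += abs(t_in - t_out) > rope
    return hits / draws
```

A value near 1 indicates a practically significant difference in occurrence rates between the in-group and out-group.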
This section describes the Bayesian Difference in Proportions Test (BDPT) in detail. The aim of BDPT is to determine whether there is a true and practically significant difference in the frequency of occurrence of an event between two disjoint populations. The BDPT is implemented with the following hierarchical Bayesian model:

θ_i ~ Beta(1, 1)    (1)
y_i ~ Binomial(n_i, θ_i)    (2)

where i denotes the two groups of interest (InGroup or OutGroup), n_i indicates the number of observations in each group, and θ_i indicates the Beta variable drawn in (1). This model is used to estimate both the difference in the probability of occurrence between the two groups, θ_InGroup − θ_OutGroup, and also the probability that this difference is larger than a prespecified Region of Practical Equivalence, or ROPE [10], which is equivalent to estimating

P(θ_InGroup − θ_OutGroup ∉ ROPE).    (3)

We implement this test using the Python package pymc3, using two chains of 2000 MCMC samples each with a burn-in period to perform posterior inference. This relatively small sampling was determined to be acceptable given the simple model, which achieved good MCMC convergence.

APPENDIX C
MODEL IMPLEMENTATION AND HYPERPARAMETERS
A. PARAFAC
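The goodness-of-fit metric reported in this appendix, 1 − sqrt(||X||² + ||P||² − 2⟨X, P⟩)/||X||, equals 1 − ||X − P||_F/||X||_F. A small sketch for dense matrices (assuming Frobenius norms, the Tensor Toolbox convention; the function name is illustrative):

```python
import math

def fit_metric(X, P):
    """Goodness-of-fit between data X and reconstruction P:
    1 - sqrt(||X||^2 + ||P||^2 - 2<X, P>) / ||X||, i.e.
    1 - ||X - P||_F / ||X||_F. A value of 1 is a perfect reconstruction."""
    xx = sum(x * x for row in X for x in row)
    pp = sum(p * p for row in P for p in row)
    xp = sum(x * p for rx, rp in zip(X, P) for x, p in zip(rx, rp))
    return 1.0 - math.sqrt(max(xx + pp - 2.0 * xp, 0.0)) / math.sqrt(xx)
```

The `max(..., 0.0)` guards against a tiny negative argument from floating-point roundoff when P is nearly identical to X.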
We use the PARAFAC implementation in the MATLAB Tensor Toolbox. Specifically, we utilize the cp_nmu() function to compute the PARAFAC decomposition, which implements the NMF algorithm of [2]. This algorithm uses a multiplicative update to minimize the reconstruction error between a data matrix X and its reconstruction P by minimizing the square of the Euclidean distance between X and P, solving the problem

min_P ||X − P||^2 = min_P Σ_ij (X_ij − P_ij)^2    (4)

where P is a nonnegative factorization of the matrix X. For our experiments, we use a fixed convergence tolerance and a maximum of 500 iterations; however, with R = 25 factors, the tolerance is reached in far fewer than the maximum number of allowed iterations.

Figure 7 shows convergence and fit diagnostics for the PARAFAC model. The upper panel shows a goodness-of-fit metric,

1 − sqrt(σ_max(X)^2 + σ_max(P)^2 − 2⟨X, P⟩) / ||X||    (5)

where σ_max(·) indicates the largest singular value of a matrix, and X and P indicate the data matrix and the PARAFAC reconstruction, respectively, as computed by [13]. Note that the maximum possible value of this metric is 1, indicating a perfect reconstruction, although what qualifies as an acceptable value of this metric is application-dependent.

The lower panel of Figure 7 shows the change in (5) over iterations. While the PARAFAC algorithm is only guaranteed to converge to a local minimum [2] and global optimality cannot be guaranteed, our results indicate smooth and stable convergence.

B. LSTM Sequence Prediction Model
Our LSTM model is a 2-layer LSTM which considers up to 20 previous items in the sequence, if they exist, when predicting the next job. This model uses a 200-dimensional dense representation of the input features, which allows it to learn about relationships between repairs to different systems. The model uses the following hyperparameters:
• Gradient descent optimizer; initial learning rate = 1.0.
• Learning rate decay by a factor of 0.5 after completion of the first 4 epochs.
• Context window size = 20
• Hidden unit size = 200
• Batch size = 20
• Max gradient norm = 5

Fig. 7: Top: PARAFAC goodness-of-fit metric (5) over training iterations. Bottom: Convergence measured by change in (5) over iterations.

The model is implemented in TensorFlow 1.x. Training on our dataset completes in less than 10 minutes on a standard laptop CPU.

C. ARIMA Cost Forecasting Model