[PDF] Data-driven optimization of processes with degrading equipment

Abstract

In chemical and manufacturing processes, unit failures due to equipment degradation can lead to process downtime and significant costs. In this context, finding an optimal maintenance strategy to ensure good unit health while avoiding excessive expensive maintenance activities is highly relevant. We propose a practical approach for the integrated optimization of production and maintenance capable of incorporating uncertain sensor data regarding equipment degradation. To this end, we integrate data-driven stochastic degradation models from Condition-based Maintenance into a process level mixed-integer optimization problem using Robust Optimization. We reduce computational expense by utilizing both analytical and data-based approximations and optimize the Robust optimization parameters using Bayesian Optimization. We apply our framework to five instances of the State-Task-Network and demonstrate that it can efficiently compromise between equipment availability and cost of maintenance.

Full PDF

DData-driven optimization of processes with degrading equipment

Johannes Wiebe , Inês Cecílio , and Ruth Misener Department of Computing, Imperial College London, London, UK Schlumberger Cambridge Research, Cambridge, UK

Abstract

In chemical and manufacturing processes, unit failures due to equipmentdegradation can lead to process downtime and signiﬁcant costs. In this con-text, ﬁnding an optimal maintenance strategy to ensure good unit health whileavoiding excessive expensive maintenance activities is highly relevant. We pro-pose a practical approach for the integrated optimization of production andmaintenance capable of incorporating uncertain sensor data regarding equip-ment degradation. To this end, we integrate data-driven stochastic degradationmodels from Condition-based Maintenance into a process level mixed-integeroptimization problem using Robust Optimization. We reduce computationalexpense by utilizing both analytical and data-based approximations and opti-mize the Robust Optimization parameters using Bayesian Optimization. Weapply our framework to ﬁve instances of the State-Task-Network and demon-strate that it can eﬃciently compromise between equipment availability andcost of maintenance.

Most technical processes contain equipment which degrades over time due to its us-age. Degradation may lead to serious equipment failures, unless preventive main-tenance actions are scheduled regularly to restore equipment conditions. Whilefrequent preventive maintenance can keep equipment availability high, it also incurssigniﬁcant cost. At the same time, unexpected equipment failures can lead to lossof production and high corrective maintenance costs. Finding the optimal bal-ance between preventive and corrective maintenance is diﬃcult, because degradation1 a r X i v : . [ m a t h . O C ] J a n ends to be at least partially random and the health state of equipment can often onlybe estimated from data subject to uncertainty. To make things worse, schedulingmaintenance activities is not independent from production planning and scheduling.A unit undergoing maintenance might, for example, be unavailable for production.Furthermore, the state of equipment health tends to depend not only on the selectedmaintenance strategy, but also on the process operating strategy. Operating a pro-cess with a high throughput might enable higher production volumes and more sales,but could also cause more equipment degradation and therefore a higher maintenancecost. These interactions between process conditions, maintenance strategy, and theequipment’s uncertain state of health make ﬁnding optimal and compatible mainte-nance and operating strategies a very challenging data-driven optimization problemunder uncertainty.One way of reducing the equipment maintenance cost is to determine mainte-nance schedules based on information regarding the equipment’s state of health col-lected through condition monitoring . This is the Condition-based maintenance (CBM) paradigm . Due to the increased availability of cheap sensors and therebylarge quantities of system health data, CBM is becoming more attractive . Whilemuch attention has been paid to data collection & processing and prognostic model-ing, the objective is usually to minimize the cost of maintaining a single unit . Thismeans that interaction between maintenance strategies for each piece of equipmentand the operating strategy of the entire process has largely been neglected.However, these interactions have been considered by multiple authors in the con-text of integrated maintenance scheduling and process optimization . Earlyapproaches in this ﬁeld which explicitly model degradation assume constant, knownreliability or decay curves . Dedopoulos and Shah , combine short-term stochas-tic scheduling with long-term maintenance scheduling in a two-step procedure, whileVassiliadis and Pistikopoulos determine optimal availability thresholds at whichmaintenance should be performed. Georgiadis et al. optimize the cleaning and en-ergy management of heat exchanger networks subject to fouling, which is assumed tofollow a known proﬁle. Liu et al. consider scheduling of maintenance and biophar-maceutical batch production with a deterministic performance decay. Xenos et al. optimize maintenance and production scheduling of a compressor network. Thepower consumed by the compressors is assumed to increase linearly with operatingtime (since maintenance was performed) due to fouling. Zulkaﬂi and Kopanos , develop an optimization framework for simultaneous operational planning and main-tenance scheduling of production and utility systems. They consider extra energycosts caused by performance degradation. The degradation is assumed to dependon operating time and the production rate. Aguirre and Papageorgiou consider2ntegrated planning, scheduling and maintenance under schedule-dependent, deter-ministic performance decay. Rajagopalan et al. analyze turnaround reschedulingand apply stochastic programming to manage unplanned outages. Biondi et al. extend the State Task Network (STN), originally proposed by Kondili et al. , toaccount for degrading equipment and diﬀerent operating modes. They assume thateach unit, after maintenance, has a given maximum residual lifetime and that eachtask performed on a unit in a certain operating mode reduces this residual lifetimeby a given amount. Noticeably, none of these authors make use of the wealth ofknowledge regarding degradation modeling and inference from data available fromthe CBM literature. Furthermore, degradation is assumed to be deterministic whichmay not be the case in practice.Recent works have started to incorporate degradation models from CBM intoprocess level Mixed-Integer Linear Programming (MILP) problems . Yildirimet al. , formulate an optimization model for generator maintenance and productionscheduling. The cost of maintenance is calculated beforehand using a data-drivendegradation model: c t = c prev (cid:16) − p ft (cid:17) + c corr p ft (cid:82) t p fτ dτ , where c t is the predicted cost of performing maintenance at time t given the costof preventive ( c prev ) and corrective ( c corr ) maintenance. The failure probability p ft = P ( unit fails before t ) is calculated from the degradation model. The authorslater applied the same approach to maintenance and operation of wind farms andextended it to include opportunistic maintenance . While this approach starts toincorporate information from more sophisticated degradation models into processlevel optimization problems, degradation is still considered to be deterministic in theMILP optimization. Başçiftci et al. extend this to consider sudden failures by usingstochastic programming and generating scenarios from the underlying degradationmodel. To the best of our knowledge Başçiftci et al.’s approach is the only workcombining stochastic optimization with information from degradation models.Unfortunately, the aforementioned approach cannot capture eﬀects of the selectedoperating strategy on degradation. Since maintenance cost is calculated based onthe degradation model before the optimization problem is solved, the degradation isassumed to be independent of the operating strategy. In practice this will often notbe the case.This paper argues for a tighter integration between the sophisticated degrada-tion models used in CBM and process level maintenance scheduling and processoptimization. To this end we make multiple contributions:3 We show how Lévy type models, a class of stochastic processes commonly usedin Degradation Modeling, can be incorporated into an integrated maintenanceand process MILP model. Lévy type models include the Wiener and Gammaprocesses – two very popular models in CBM. By making the Lévy modelsparameters depend on a set of operating modes, equipment degradation toodepends on the operating strategy.• We show how uncertainty and randomness in the equipment’s degradationcharacteristics can be incorporated using adjustable robust optimization. Weuse results from the CBM literature to eﬃciently determine the robustness ofthe obtained solution.• We prove that, in certain cases, feasible solutions to the adjustable robustoptimization problem can be found by solving a deterministic approximationwith worst case values for the uncertain parameters.• Realizing that process planning and scheduling can be computationally expen-sive yet highly repetitive, we develop a computationally eﬃcient, data-drivenway of a-priori estimating equipment failure probabilities. To this end, we gen-erate data using a short-term scheduling model repeatedly. Using this data,we propose two methods based on Logistic Regression capable of cheaply gen-erating a large number of long-term schedules which can be used to estimatefailure probabilities.• We propose Bayesian optimization for eﬃciently optimizing the uncertainty set.The uncertainty set size depends on a small number of parameters, but solvingthe robust MILP integrated maintenance and process optimization problemcan be computationally expensive. Bayesian optimization is ideal for this kindof low dimensional problem with expensive function evaluations.As a challenging case study, we apply the proposed method to an extension of thestate-task-network (STN) . This model combines both planning and scheduling ofproduction and maintenance with operating mode dependent equipment degradation.We test our method on a number of STN instances .4 Combining degradation modeling and robust op-timization

Following Vassiliadis and Pistikopoulos , we assume an integrated production andmaintenance scheduling problem of the formmin x , m cost ( x , m ) (1)s.t. process model ( x , m ) (1a)maintenance model ( x , m ) , (1b)where x are the process variables (continuous and discrete) and m are the mainte-nance related variables. The process model includes, e.g., material balances, energybalances, unit constraints, and the maintenance model includes, e.g., maintenancecrew constraints or constraints regarding diﬀerent types of maintenance. Note thatcost minimization could easily be replaced by proﬁt maximization.A health model added to Problem 1 accounts for equipment degradation:min x , m , h cost ( x , m , h ) (2)s.t. process model ( x , m , h ) (2a)maintenance model ( x , m , h ) (2b)health model ( x , m , h ) , (2c)where h are health related variables and the health model includes all equipmenthealth or degradation related constraints. Our ﬁrst contribution is developing ageneric health model based on the assumption that the equipments’ state of healthcan be described by Lévy type processes, a class of stochastic processes commonlyused for modeling degradation in CBM. The premise in Degradation Modeling is that a degradation signal s meas ( t ) describesthe state of degradation of a unit over time. Signal s meas ( t ) can either be mea-sured directly or obtained indirectly from measurements. Two common assumptionsadopted in this paper are that: (SMAX) The unit fails and requires corrective maintenance when s meas ( t ) crossesa threshold s max , 5 AGAN) s meas ( t ) is reset back to initial value s after preventive maintenance.The unit is as-good-as-new (AGAN) .The degradation signal s meas ( t ) is often modeled by stochastic processes . Oneclass of stochastic processes are Lévy type processes: Deﬁnition 1. Lévy type process . A stochastic process S ( t ) = { S t : t ∈ T } ,where S t is a random variable, with1. independent increments: S t − S t , . . . , S t n − S t n − are independent for any < t < t < . . . < t n < ∞ ,2. stationary increments: S t − S s and S t − s − S have the same distribution forany s < t ,3. continuity in probability: lim h → P ( | S t + h − S t | > (cid:15) ) = 0 for any (cid:15) > , t ≥ .Lévy type processes include both the Wiener and Gamma processes, which arethe most commonly used stochastic processes in the Degradation Modeling litera-ture . Due to their independence and stationarity, Lévy type process incrementscan be described by S t − S t − ∆ t = D (∆ t ) , D (∆ t ) ∼ D ( θ , ∆ t ) , ∀ t, (3)where D (∆ t ) is a random variable that follows a given distribution D ( θ , ∆ t ) withparameters θ . A diﬃculty, however, arises when D is also dependent on some of theoperational variables x : D (∆ t ) ∼ D ( θ ( x ) , ∆ t ) . (4)This dependence has been addressed by assuming that the operational variables x are piecewise constant, i.e., the process can only operate in a number of discreteoperating modes k ∈ K . Under this assumption Eqns. 3 and 4 simplify to S t − S t − ∆ t = (cid:88) k ∈K x k,t · D k (∆ t ) , D k (∆ t ) ∼ D ( θ k , ∆ t ) , (5)where x k,t is if the process operates in mode k at time t and otherwise. Notethat this approach is very similar to regime-switching Lévy models used extensivelyin ﬁnance . Biondi et al. use a similar approach in their STN extension.Much of the Degradation Modeling literature focuses on estimating θ and using,e.g., Bayesian approaches to update it regularly based on new available data .A major advantage of Bayesian approaches is that θ can be estimated based on apopulation of units ﬁrst and then individually adjusted to a particular unit .6 .2 Constructing a health model We summarize the assumptions on which health model 2c hereafter is based: Foreach process unit j , a degradation signal s measj ( t ) can be obtained from measurementswhich is modeled well by a Lévy process S j ( t ) , i.e., increments follow Eqn. 5. Theunit fails when S j ( t ) reaches a maximum threshold s maxj (SMAX) ( T fail = inf { t ∈ T | S j,t > s maxj } ) and S j ( t ) resets to an initial value s j after maintenance (AGAN).Based on these assumptions and assuming a discrete time formulation, the followinghealth model replaces Eqn. 2c: S j,t ≤ s maxj ∀ t, j ∈ JS j,t = (cid:40) S j,t − + (cid:80) k ∈K x j,k,t · D j,k , if m j,t = 0 s j , otherwise ∀ t, j ∈ J, (6)where J is the set of process units and m j,t is if a maintenance action starts onunit j at time t and otherwise. To address the random nature of degradation,the random variables D j,k and S j,t can be approximated by an uncertain parameter ˜ d j,k and a deterministic variable s j,t respectively. Assuming that ˜ d j,k is boundedby a compact uncertainty set U , Problem 2 can be robustiﬁed by requiring that allconstraints hold for any ˜ d j,k ∈ U : s j,t ≤ s maxj ∀ t, j ∈ Js j,t = (cid:40) s j,t − + (cid:80) k ∈K x j,k,t · ˜ d j,k , if m j,t = 0 s j otherwise ∀ ˜ d j,k ∈ U , t, j ∈ J. (7)This model explicitly considers preventive maintenance. Corrective maintenancebecomes necessary only when realizations of ˜ d j,k lie outside the uncertainty set U and constraint s j,t ≤ s maxj is violated.Notice that it is generally not possible to choose s j,t such that the equality con-straint in Problem 7 holds for all values of ¯ d j,k in U , except for the trivial solution x j,k,t = 0 , ∀ j, k, t . This is because the degradation signal s j,t is an analytical variable,not a decision variable. Interpreting the degradation signal instead as a second stagevariable s j,t (cid:16) ˜ d j,k (cid:17) turns Problem 7 into an adjustable robust optimization problemand a linear decision rule can be used to expresses s j,t as a function of ˜ d j,k : s j,t (cid:16) ˜ d j,k (cid:17) = [ s j,t ] + (cid:88) k [ s j,t ] k ˜ d j,k , (8)where [ s j,t ] and [ s j,t ] k are coeﬃcients which become variables in the adjustable robustproblem. Technically, ˜ d j,k should also be indexed by t as every time period constitutes7n independent realization of D j,k . Time-indexed uncertain parameters have beenpreviously explored , but they can lead to a large increase in variables, especiallyfor discrete time formulations. We therefore make the simplifying assumption thatuncertainty is only revealed once after all variables except s j,t have been selected.The health model 7 can be reformulated to remove the conditonal equality con-straint, resulting in the ﬁnal formulation: min x , m cost ( x = [ x j,k,t , . . . ] (cid:62) , m = [ m j,t , . . . ] (cid:62) , h = [ s j,t (cid:0) ¯ d j,k (cid:1) ] (cid:62) ) s.t process model ( x , m , h ) maintenance model ( x , m , h ) m j,t s j ≤ s j,t (cid:0) ¯ d j,k (cid:1) ≤ s maxj + m j,t · ( s j − s j,max ) ∀ t, j ∈ J, ˜ d ∈ U s j,t (cid:0) ¯ d j,k (cid:1) ≥ s j,t − ∆ t + (cid:88) k x j,k,t ˜ d j,k + m j,t · ( s j − s maxj ) ∀ t, j ∈ J, ˜ d ∈ U s j,t (cid:0) ¯ d j,k (cid:1) ≤ s j,t − ∆ t + (cid:88) k x j,k,t ˜ d j,k ∀ t, j ∈ J, ˜ d ∈ U . (9)By replacing s j,t (cid:16) ˜ d j,k (cid:17) with Eqn. 8 in each constraint and using standard robustoptimization reformulation techniques, the health model can be transformed into adeterministic robust counterpart (see Appendix A).Consider a deterministic version of Problem 9 in which cost, process model, andmaintenance model are not functions of s j,t (cid:16) ˜ d j,k (cid:17) and ˜ d j,k has been replaced by d maxj,k = max U ˜ d j,k : min x , m cost ( x , m ) s.t process model ( x , m ) maintenance model ( x , m ) m j,t s j ≤ s j,t , ∀ t, j ∈ Js j,t ≤ s maxj + m j,t ( s j − s maxj ) , ∀ t, j ∈ Js j,t ≥ s j,t − + (cid:88) k x j,k,t d maxj,k + m t j, t ( s j − s maxj ) , ∀ t, j ∈ Js j,t ≤ s j,t − + (cid:88) k x j,k,t d maxj,k , ∀ t, j ∈ J, (10)where s j,t is not a second stage variable anymore since there are no more semi-inﬁniteconstraints. Under certain circumstances, feasible solutions to robust Problem 9 canbe found by solving deterministic Problem 10:8 heorem 1. Given that cost, process model, and maintenance model are not func-tions of ˜ d j,k and that s j ≤ s initj = s j,t = t ≤ s maxj and ˜ d j,k ≥ , ∀ ˜ d j,k ∈ U , thena feasible solution ( x = [ x k,t , . . . ] , m = [ m t , . . . ] , h = [ s t ]) to Problem 10 forms afeasible solution ( x = [ x k,t , . . . ] , m = [ m t , . . . ] , h = [[ s t ] , [ s t ] k ]) to Problem 9 with [ s t ] = (cid:40) s init t < t m, s t ≥ t m, (11a) [ s t ] k = t (cid:88) t (cid:48) = t m,t x k,t , (11b) where s init = s ( t = 0) , t m, is the ﬁrst point in time at which maintenance is per-formed, and t m,t is the most recent point in time at which maintenance was performed.Proof. See Appendix BTheorem 1 only guarantees solution feasibility, not optimality. How well Prob-lem 10 approximates Problem 9 also depends on the selected uncertainty set.

A major decision in robust optimization is the uncertainty set choice. This paperuses a simple box uncertainty set U = { ˜ d j,k | ¯ d j,k (1 − (cid:15) j,k ) ≤ ˜ d j,k ≤ ¯ d j,k (1 + (cid:15) j,k ) } , where ¯ d j,k is the nominal value of ˜ d j,k and (cid:15) j,k is a parameter determining the uncer-tainty set size. Note that this choice assumes that the random increments D j,k areindependent, as a box uncertainty set cannot capture correlation between uncertainparameters. This assumption could be relaxed with a more complicated uncertaintyset, e.g., a polyhedral set. Since Degradation Modeling assumes that the distributionof D j,k is known, (cid:15) j,k can be determined using the inverse cumulative distributionfunction F − : (cid:15) j,k = 1 − F − ( α ) / ¯ d j,k , where α = P ( D j,k ≤ ¯ d j,k (1 − (cid:15) j,k )) . If the distribution of D j,k is unknown, data-drivennon-parametric methods such as Kernel Density Estimation can be used to estimateit .By using F − , the uncertainty set size depends on a single parameter α ∈ [0 , . .For α = 0 , the uncertainty set includes all possible realizations of D j,k and for α = 0 . ¯ d j,k . While box uncertaintysets are often more conservative than most of the many other available uncertaintyset types , the solution robustness/conservatism in this formulation can be variedby adjusting α . Assume x kj = [ k , k , . . . , k T ] , where k t = k ⇐⇒ x j,k,t = 1 , is the sequence ofoperating modes given by a solution to Problem 9. Its robustness can be measuredby the probability p j that unit j does not fail in the time horizon Tp j = P ( S j,t ≤ S j,max , ∀ t < T | x kj ) , or equivalently its probability of failure p fj = 1 − p j = P ( ∃ t such that S j,t > S j,max | x kj ) . Assuming the parameters θ j,k of the distributions D j,k are estimated from data, p fj can be calculated through Monte-Carlo simulation by randomly generating N realiza-tions s nj = [ s nj, , s nj, ∆ t , . . . , s nj,T ] of S j ( t, x kj ) with D j,k ( θ j,k , ∆ t ) distributed incrementsand checking how many violate s nj,t ≤ s maxj : p fj = (cid:80) Nn =1 ( ∃ t < T such that s nj,t > s maxj ) N , (12)where is the indicator function. This is illustrated in Fig. 1 for N = 3 .In the special case where D j,k is normal distributed D j,k ∼ N ( µ j,k ∆ t, σ j,k ∆ t ) , i.e.,the Wiener process model is used, p fj can be eﬃciently calculated using analyticalresults for the crossing probability of a Brownian motion on a piecewise linear bound-ary . Instead of sampling from D j,k at regular time intervals ∆ t , this approachonly randomly samples at operating mode transitions ( k t (cid:54) = k t +1 ). It requires farless Monte-Carlo samples and is therefore faster than the general method outlinedabove. A detailed description of this approach is given in the Appendix D. Evaluating the probability of failure p fj , e.g.,using Eqn. 12, requires the exact se-quence of operating modes x kj and maintenance actions to be known over the evalu-ation horizon T . Since maintenance tends to be infrequent, T has to be suﬃciently10 D j, : D j, : k ts j ts maxj ∆ t d j, sample ∆ td j, p fj = Figure 1: Example for calculating the failure probability p fj using Eqn. 12 and Monte-Carlo simulation. The operating mode schedule is x k = [1 , , Maint. ] . . . . . n j, P ( N j , = n j , ) . . . . n j, P ( N j , = n j , ) Estimate from data: n j, ,l = 2 Sample n j, ,l = 3 Sample 2 1 1 2 1Random order: x kj,l = [1 , , , , M, Insert maintenance:MCalculate p fj ( x kj,l ) Repeat N times, ¯ p fj = max p fj ( x kj,l ) : Figure 2: Frequency approach: Estimating p fj from historical data using Algorithm 1.11ong to obtain meaningful failure probabilities. Solving Problem 9 over a long timehorizon may be computationally challenging. Instead, it may be possible to use ex-isting data of past schedules to estimate p fj . If no historical data is available, it canbe generated by solving Problem 9 over a shorter horizon. This section outlines twomethods by which an upper estimate of p fj can be obtained from data. Assuming time discretization, a conceptually easy way to obtain an upper bound ¯ p fj on p fj is to generate the set X of all possible permutations of operating modesequences x kj = [ k j, , k j, , . . . , k j,T ] and ﬁnd the maximum probability of failure ¯ p fj = max x ∈X p fj ( x kj ) . For any realistic problem X will be very large, but there are two ways to reduceits size: First, the operating mode sequences can be generated without consideringmaintenance. Maintenance actions can then be inserted consecutively at the latestpoint in time t m,l which satisﬁes max ˜ d j,k ∈U t m,l (cid:88) t (cid:48) = t m,l − (cid:88) k x j,k,t (cid:48) ˜ d j,k < s maxj − s j , (13)where t m,l − is the previous maintenance activity and t m, = 0 . ¯ p fj remains anupper bound, because maintenance at a later point in time always causes a largerprobability of failure. Secondly, it may be possible to estimate the frequency ofoccurrence n j,k = (cid:80) t x j,k,t of each operating mode k from data. If these frequenciesare modeled as random variables N j,k , a smaller X can be obtained by only generatingsequences which obey frequencies drawn from the distributions of N j,k . This suggeststhe following algorithm for obtaining an estimate of ¯ p fj which is also visualized inFig. 2:If N is large enough and the estimated distribution of N j,k is accurate, ¯ p fj shouldbe a good upper bound on p fj . The second approach for estimating ¯ p fj is inspired by the use of Markov chains inregime-switching models in ﬁnance and to some extent also in the CBM literaturefor modeling diﬀerent enviromental or operating regimes of a process . The12 lgorithm 1 Frequency approach [illustrated in Fig. 2] procedure estimate ¯ p fj η n j,k = P ( N j,k = n j,k ) ← estimate from historical data ∀ n k l ← while l ≤ N do n j,k,l ← draw random sample from P ( N j,k = n j,k ) for each k x kj,l ← arrange n j,l = (cid:80) k n j,k,l op. modes in random order [ k , k , . . . , k n j,l ] x kj,l ← insert maintenance at last possible points in time (cid:46) Eqn. 13 p fj,l ← p fj ( x kj,l ) (cid:46) Eqn. 12 end while ¯ p fj ← max l ≤ N p fj,l end procedure key idea is to treat the occurrence of operating modes k over time as a Markovchain. Modeling the sequence of operating modes x kj on a unit by a memorylessMarkov chain X kj ( t ) = { X kj,t : t ≤ T } , the probability π k,k ∗ of transitioning from oneoperating mode k to another k ∗ is given by π k,k ∗ = P ( X kj,t = k ∗ | X kj,t − = k ) . The transition probabilities π k,k ∗ can be estimated from data.From this Markov chain random sequences of operating modes x kj,l can be gener-ated. Maintenance can again be inserted at the latest possible point in time accordingto Eqn. 13. x kj,l may not be a feasible solution to Problem 9, but it can be used toestimate ¯ p fj . The approach is summarized in Algorithm 2: The optimal sequence of operating modes x k, ∗ j depends not only on the structure ofthe process and the size of the uncertainty set U ( α ) , but also on parameters ψ ( t ) such as product demands or environmental variables. The distributions of N k and π k,k ∗ are therefore not necessarily stationary: η n j,k ( ψ ( t )) = P ( N j,k = n j,k | ψ ) (14a) π k,k ∗ ( ψ ( t )) = P ( X kj,t = k ∗ | X kj,t − = k, ψ ) , (14b)where η n j,k ( ψ ) is the probability that operating mode k occurs n j,k times in timeperiod ∆ t . 13 lgorithm 2 Markov chain approach procedure estimate ¯ p fj π k,k ∗ = P ( X kj,t = k ∗ | X kj,t − = k ) ← estimate from historical data ∀ ( k, k ∗ ) l ← while l ≤ N do x kj,l ← draw random operating mode sequence from Markov chain π k,k ∗ x kj,l ← insert maintenance at last possible points in time (cid:46) Eqn. 13 p fj,l ← p fj ( x kj,l ) (cid:46) Eqn. 12 end while ¯ p fj ← max l ≤ N p fj,l end procedure Demand δ ,t D e m a nd δ , t Frequency n j,k Figure 3: Frequency n j,k of operating mode k occuring on unit j for a schedulingproblem with two product demands ψ = [ δ ,t , δ ,t ] (cid:62) . Points are training data gen-erated by solving the scheduling model and shaded areas are predictions by logisticregression. 14ovariate dependency of Markov chain transition probabilities has previouslybeen modeled by using logistic regression . We model both η n j,k and π k,k ∗ usingmultinomial logistic regression in order to capture the inﬂuence of product demands: η n j,k ( ψ ( t )) = exp( β (cid:62) n j,k ψ ) (cid:80) n (cid:48) k exp( β (cid:62) n (cid:48) j,k ψ ) (15a) π k,k ∗ ( ψ ( t )) = exp( β (cid:62) k,k ∗ ψ ) (cid:80) k + ∈ K exp( β (cid:62) k,k + ψ ) . (15b)We use Scikit-learn for estimating parameters β based on data. Fig. 3 showsan example for a process with two product demands ψ = [ δ ,t , δ ,t ] (cid:62) . The shadedareas are the frequencies n j,k predicted by logistic regression for a particular k and j (the n j,k with the largest η n j,k ( ψ ) ) while the points are training data. We uselogistic regression in this work because of its simplicity and interpretability, but itcould be replaced by any classiﬁcation method capable of probability estimation,e.g., Artiﬁcial Neural Networks, Support Vector Machines, k-Nearest Neighbours,Decision Trees, etc . An important, non-trivial decision when using robust optimization is the size of theuncertainty set — or in this work the choice of parameter α . It governs a trade-oﬀbetween the robustness of the solution and its cost. A common approach is to usea-priori guarantees to determine an uncertainty set size that is guaranteed to have aprobability of constraint violation below a predeﬁned level . A-priori guarantees are,however, not guaranteed to be tight and uncertainty sets based on them can be overlyconservative. As demonstrated by Li and Li , determining the optimal uncertaintyset size can instead be seen as its own optimization problem. They minimize theuncertainty set size with the constraint that the solution remains feasible with apre-deﬁned probability. We propose a diﬀerent formulation that does not require thedecision maker to choose a probability of constraint satisfaction but is based purelyon cost instead: min α c ∗ ( α ) + (cid:88) j p fj ( α ) · c fj , (16)where c ∗ ( α ) is the minimal overall cost of the process as determined by solvingProblem 9 for a given value of α , p fj is the corresponding probability of failure15valuated using Eqn. 12, and c fj is the cost incurred in case of an unplanned failure ofunit j , i.e., the cost of corrective maintenance. Eﬀectively, Problem 16 minimizes thetrade-oﬀ between preventive and corrective maintenance. Note that this formulationassumes that each unit fails no more than once in the evaluated horizon T . This isreasonable under the assumption that the cost of failure c fj is high and therefore P fj tends to be low.Problem 16 is a one-dimensional optimization problem, but determining c ∗ ( α ) and p fj ( α ) can be computationally expensive because it requires solving a potentiallylarge MILP problem and Monte-Carlo simulation. It can therefore be viewed as ablack box optimization problem with expensive function evaluations. We proposeBayesian optimization, which is known to work well on expensive low dimensionalobjective functions, as an eﬀective solution strategy . Bayesian optimization hasthe further advantage that it can handle noise well. Both c ∗ and p fj can be noisybecause it may not be possible to solve Problem 9 to optimality in a reasonable timeframe. Further noise is introduced by the Monte-Carlo simulation used to evaluate p fj . Feed A Heating Hot A Reac. 140% Int. AB60%Prod. 140%Int. BC 60%Reac. 2Feed B50% Feed C50% Reac. 320%80% Impure E Separation10% Prod. 290%

Figure 4: STN instance proposed by Kondili et al. . Tasks are performed on fourunits: Heater, Reactor 1, Reactor 2, and Still.16he model by Biondi et al. , an extension of the State-Task-Network (STN) ,forms the basis of our case study. The classic STN is a scheduling problem in whicha set of tasks I has to be assigned to a set of units J . Biondi et al. extend theSTN by allowing each task i to be performed in a number of diﬀerent operatingmodes k ∈ K i . They add constraints reducing the residual lifetime r j,t of eachunit j every time a task is performed, and restore r j,t by performing maintenance.Because the scheduling problem can only be solved for a short time horizon T S butmaintenance occurs infrequently, they add a planning horizon T P to the problem.For the planning horizon, instead of an exact schedule, only the number of times n i,j,k,t a task i is performed on unit j in operating mode k in each planning period t is calculated. Eqns. 17 to 20d give the modiﬁed version used as a case study in thiswork:Objective function:cost = (cid:88) j ∈ J c maintj (cid:32) s finj /s maxj + (cid:88) t ∈ T m j,t (cid:33) + c storages  q fins + (cid:88) t ∈ T p q s,t  + U (cid:32)(cid:88) s ∈ S φ ds + (cid:88) t ∈ T S φ qs,t (cid:33) (17)Constraints scheduling horizon: (cid:88) k ∈ K j (cid:88) i ∈ I j t (cid:88) t (cid:48) = t − p i,j,k +∆ t S w i,j,k,t (cid:48) + t (cid:88) t (cid:48) = t − τ j +∆ t s m j,t (cid:48) ≤ ∀ J, t ∈ T S (18a) v mini,j w i,j,k,t ≤ b i,j,k,t ≤ v maxi,j w i,j,k,t ∀ J, i ∈ I j , k ∈ K j , t ∈ T S (18b) q s,t = q s,t − + (cid:88) i ∈ ¯ I s ¯ ρ i,s (cid:88) j ∈ J i (cid:88) k ∈ K j b i,j,k,t − p i,j,k − (cid:88) i ∈ I s ρ i,s (cid:88) j ∈ J i (cid:88) k ∈ K j b i,j,k,t ∀ s, t ∈ T S (18c) ≤ q s,t − φ qs,t ≤ c s ∀ s, t ∈ T S (18d) m j,t s j ≤ s j,t ≤ s maxj + m j,t · ( s j − s maxj ) ∀ t, j ∈ J, D ∈ U (18e) s j,t ≥ s j,t − ∆ t S + (cid:88) i (cid:88) k w i,j,k,t ˜ d j,k + m j,t · ( s j − s maxj ) ∀ t, j ∈ J, D ∈ U (18f)17 j,t ≤ s j,t − ∆ t S + (cid:88) i (cid:88) k w i,j,k,t ˜ d j,k ∀ t, j ∈ J, D ∈ U , (18g)Constraints planning horizon: (cid:88) i ∈ I j (cid:88) k ∈ K j p i,j,k n i,j,k,t + τ j m j,t ≤ ∆ t P ∀ J, t ∈ T P \{ ¯ t P } (19a) v mini,j (cid:88) k ∈ K j n i,j,k,t ≤ a i,j,t ≤ v maxi,j (cid:88) k ∈ K j n i,j,k,t ∀ J, i ∈ I j , k ∈ K j , t ∈ T P (19b) q s,t = q s,t − + (cid:88) i ∈ ¯ I s ¯ ρ i,s (cid:88) j ∈ J i a i,j,t − (cid:88) i ∈ I s ρ i,s (cid:88) j ∈ J i a i,j,t − δ s,t ∀ s, t ∈ T P \{ ¯ t P } (19c) ≤ q s,t ≤ c s ∀ s, t ∈ T P (19d) n i,j,k,t ≤ U · ω j,k,t ∀ J, i ∈ I j , k ∈ K j , t ∈ T P (19e) (cid:88) k ∈ K j ω j,k,t = 1 ∀ J, t ∈ T P (19f) s j,t ≤ s maxj ∀ t, j ∈ J (19g) s tj ≥ s j,t − ∆ t P + (cid:88) k n j,k,t ˜ d j,k,t + m j,t · ( s j − s maxj ) ∀ t, j ∈ J (19h) s j,t ≤ s j,t − ∆ t P + (cid:88) k n j,k,t ˜ d j,k,t ∀ t, j ∈ J (19i)Constraints interface between scheduling and planning: (cid:88) k ∈ K j (cid:88) i ∈ I j ¯ t S (cid:88) t (cid:48) =¯ t S +2∆ t S − p i,j,k w i,j,k,t (cid:48) [ p i,j,k − (¯ t S − t (cid:48) + ∆ t S )]+ ¯ t S (cid:88) t (cid:48) =¯ t S +2∆ t S − τ j m j,t (cid:48) [ τ j − (¯ t S − t (cid:48) + ∆ t S )]+ (cid:88) i ∈ I j (cid:88) k ∈ K j p i,j,k n i,j,k, ¯ t P + τ j m j, ¯ t P ≤ ∆ t P , ∀ j ∈ J (20a) q fins = q s, ¯ t S + (cid:88) i ∈ ¯ I s ¯ ρ i,s (cid:88) j ∈ J i (cid:88) k ∈ K j b i,j,k, ¯ t S +1 − p i,j,k − δ s, ¯ t S + φ ds ∀ s (20b)18 ≤ q fins ≤ c s ∀ s (20c) q s, ¯ t P = q fins + (cid:88) i ∈ ¯ I S ¯ ρ i,s (cid:88) j ∈ J i (cid:88) k ∈ K j ¯ t S (cid:88) t (cid:48) =¯ t s +2 − p i,j,k b i,j,k,t (cid:48) + (cid:88) i ∈ ¯ I S ¯ ρ i,s (cid:88) j,J i a i,j, ¯ t P − (cid:88) i,I s ρ i,s (cid:88) j ∈ J i a i,j, ¯ t P − δ s, ¯ t P ∀ s (20d)The decision variables are m j,t , q s,t , w i,j,k,t , n i,j,k,t , b i,j,k,t , a i,j,t , s j,t , φ qs,t , φ ds and ω j,k,t .The product demand δ s,t has to be satisﬁed at the end of each planning periodand at the end of any time interval in the scheduling horizon. In practice, this modelwould be solved regularly in a rolling horizon fashion using recent demand estimatesand degradation signal measurements s j, .In comparison to Biondi et al. , the residual lifetime constraints have been re-placed with the degradation signal based health model developed above (Eqn. 9).For the planning horizon, s j,t cannot be reset exactly to s j , because the exact timeat which maintenance is performed is unknown. Instead, it is merely enforced that s j,t ≤ s maxj .In addition, the objective function is slightly diﬀerent. The term c maintj ( s finj /s maxj ) can be interpreted as a ﬁnal cost of maintenance dependent on the ﬁnal degradationsignal s finj (state of health) of unit j . Similar to Biondi et al.’s penalty terms itavoids unnecessary degradation and ensures maintenance happens towards the endof a units residual lifetime.Since the exact sequence of operating modes and maintenance actions is unknownin the planning horizon T P , the probability of failure p fj can only be evaluated overthe scheduling horizon T S . In order to still evaluate p fj over a longer time period twopossibilities exist: the schedule can be extended in length by solving the model re-peatedly in a rolling horizon fashion or the Markov chain-based estimation approachin Algorithm 2 can be used. We compare both approaches to show that the proposedMarkov chain estimate is indeed accurate. Note that, in order to facilitate a rollinghorizon based solution approach, slack variables have been introduced in Eqns. 18dand 20b. This is necessary because the rolling horizon framework does not guaranteefeasibility in subsequent time periods. Production targets from the planning modelmay, for example, not be achievable in the scheduling model. The slack variables φ ds and φ qs,t are penalized in the objective function.We assume that the frequencies of operating mode occurrence n j,k and the tran-sition probabilities π k,k ∗ depend on the product demands in each planning period19 φ t = [ d s ,t , . . . , d s n ,t ] ). We sample a range of demands using Latin Hypercube Sam-pling and solve just the scheduling horizon to generate data for estimating n j,k ( ψ t ) and π k,k ∗ ( ψ t ) .Notice that the model above fulﬁlls the assumptions in Theorem 1 and solutionscan be obtained by solving the deterministic approximation 10. The framework outlined above was evaluated on ﬁve instances of the STN (see Ta-ble 1 and Appendix C). The model was implemented in Pyomo and solved usingInstance Toy P1 P2 P4 P6 Units 2 4 5 3 6Tasks 3 5 3 4 8Op. modes 2 3 3 2 2Products 2 2 1 2 4Discrete vars 518 2492 1930 1869 1993Continuous vars 1033 3630 2371 2777 4084Constraints 1860 7332 5705 5699 7994Avg. MIP gap [%] 0.0 3.0 5.8 10.9 1.02Table 1: Evaluated STN instances (Details: see Appendix C)CPLEX 12.7.1.0. All source code is publicly available under the MIT Licence .Unless mentioned otherwise, we considered an evaluation horizon of planning pe-riods. The failure probability p fj for each unit j was evaluated for a range of valuesof the uncertainty set parameter α using both the frequency and Markov chain esti-mates as well as rolling horizon. The termination criteria for each CPLEX run werea maximum time limit between − minutes (depending on the size of the instance)and a MIP gap of (except for the toy instance which was solved to optimality).For each instance a low, average, and high demand scenario was considered withthe high scenario being close to maximum process capacity. For Instance P1 boththe robust and deterministic Problems 9 and 10 were solved a number of times toevaluate the quality of the deterministic approximation. For all other instances onlythe deterministic approach was used. All calculations were carried on an i7-6700CPU with × . GHz and 16GB RAM.Fig. 5 shows three maintenance schedules for the original STN instance (P1)by Kondili et al. with Biondi et al.’s demand scenario (average scenario) with20 eterministic, equivalent to α = 0 . HeaterReactor 1Reactor 2Still robust, α = 0 . HeaterReactor 1Reactor 2Still robust, α = 0 . HeaterReactor 1Reactor 2Still

Figure 5: Comparison of maintenance schedules between deterministic solution ( α =0 . ) and two robust solutions with diﬀerent values of α (Instance P1 , averagedemand scenario by Biondi et al. ). The number of required maintenance actionsincreases with increasing uncertainty set size (decreasing α ).diﬀerent values for α . The number of maintenance actions increases with increasinguncertainty set size (decreasing α ). Essentially, hedging against more uncertainty andensuring solution robustness for a larger set of possible realizations requires earliermaintenance. In the average demand scenario, maintenance actions increase by when hedging against some of the uncertainty ( α = 0 . ) and by when hedgingagainst almost all uncertainty ( α = 0 . ). However, this trend also depends onproduct demand: for the low demand scenario, only an increase of is necessaryfor α = 0 . , while an increase of is necessary in the high demand scenario. Ahigher demand increases unit utilization and therefore also the absolute number ofmaintenance actions required in a given time period.Fig. 6 shows the failure probability p fj for Reactor 1 as a function of both totalcost (cost of storage and cost of maintenance) and the uncertainty set parameter α . As expected, p fj increases for smaller uncertainty sets (large α ’s) and a low p fj comes at a signiﬁcant cost – the price of robustness. Notice that, while calculating p fj using Monte-Carlo simulation only introduces modest noise, the calculated costis very noisy due to the non-optimality of the solutions.The results in Fig. 6 were obtained by solving the deterministic Problem 10.While Theorem 1 guarantees that these solutions are also feasible in the robustProblem 9, it does not prove that they are also optimal. Fig. 7 shows both total cost21 α p f j [ % ] Cost

Figure 6: Probability of Reactor 1 failing p fj vs uncertainty set parameter α and cost(Instance P1 , average demand scenario). Each point is a solution to Problem 10obtained by CPLEX. The probability of failure p fj was estimated using the Algorithm2 Markov chain approach. C o s t A v e r ag e M I P ga p [ % ] Figure 7: Robust vs. deterministic approach (Instance P1 ). Each model was solved times with α = 0 . and a s time limit per rolling horizon iteration using out-of-the-box CPLEX. The robust model has 22514 variables and 14664 constraintswhile the deterministic model has 6122 variables and 7332 constraints.22nd average optimality gap for a number of rolling horizon solutions to the deter-ministic and robust version of Instance P1 . The uncertainty set size was α = 0 . and a maximum time limit of s was used for each CPLEX run. Within this timelimit, out-of-the-box CPLEX achieves an average optimality gap of . on therobust problem compared with . for the deterministic approximation with maxi-mum values for ˜ d j,k . Similarly, the deterministic approximation achieves signiﬁcantlylower objective values. While many approaches could improve solution quality of therobust problem, e.g., solver parameter tuning or leveraging Satisﬁability ModuloTheory , Constraint Programming , or Approximation Algorithms , it is likelythat the deterministic approximation will remain favorable as it has signiﬁcantlyfewer variables and constraints ( vs. and vs. respectively). As-suming a box uncertainty set, we view it as a reasonable approximation for instanceswhich cannot be solved to optimality in a reasonable amount of time. In the case ofa more complex uncertainty set, replacing ˜ d j,k with its maximum value may lead toconservative solutions. General uncertainty sets require solving the robust problem.Note that the large range of solution values in Fig. 7 is not only due to thediﬀering MIP gaps, but also the rolling horizon approach which does not guaranteeoptimality.Logistic regressions for the Frequency and Monte-Carlo approaches were trainedbased on scheduling horizon only solutions. For the Frequency approach, thetraining points and their predicted values for Reaction in mode Normal of the toyinstance are shown in Fig. 8. Logistic regression predicts n i,j,k reasonably well butthe rigid, linear classiﬁer cannot capture some of the details. This is, however, nota major problem as the entire predicted probability distribution η n i,j,k of operatingmode occurence frequencies n i,j,k is used in estimating ¯ p fj . Near the predicted bound-aries, η n i,j,k of adjacent frequencies will be non-zero and they will be sampled in asigniﬁcant number of operating mode sequences in Algorithm 1.Fig. 9 shows the probability of failure p fj for each unit in the toy instance asa function of the uncertainty set parameter α for three diﬀerent demand scenarios(average, high, and low). Since the rolling horizon framework does not guaranteeoptimality, solving the problem repeatedly for the same value of α can lead to diﬀerentsolutions and failure probabilities. The problem was therefore solved times foreach value of α . It can be seen that the probability of failure generally increaseswith demand. The ﬁgure furthermore shows the two bounds obtained using theFrequency and Markov chain based approaches, i.e. Algorithm 1 versus 2. For thereactor both approaches provide good upper bounds. For the heater, the frequencybased approach performs very well, while the Markov chain approach underestimates p fj for the high demand scenario and overestimates it for the average and low demand23 Demand δ ,t D e m a nd δ , t Frequency n i,j,k Figure 8: Toy instance: Frequency n i,j,k of Reaction 1 occuring in normal mode onunit Reactor within one scheduling horizon. Points are training data generated bysolving the scheduling model repeatedly and shaded areas are predictions by logisticregression. Heater Reactor0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.50255075100 α p f j [ % ] Scenario avg high low

Bound freq MC

Figure 9: Toy instance: Frequency vs. Markov chain approach. Points are rollinghorizon solutions. Colored lines are bounds from the Frequency and Markov chainapproach. The dotted black lines show a-priori bound B4 by Li et al. .24cenarios. Finally notice that the apriori bound given by the dotted lines greatlyoverestimates p fj . Fig. 10 shows similar trends for Instance P1 (Kondili). Bothapproaches provide reasonable bounds for all units and scenarios except the averagedemand scenario on the Heater, for which the frequency approach underestimates p fj . Notice that p fj is nearly zero for both the Heater and Reactor 2 at low demandirrespective of α . This is because for this scenario no maintenance occurs on eitherunit and s j,t does not get close to s maxj . Reactor 2 StillHeater Reactor 10.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.502550751000255075100 α p f j [ % ] Scenario avg high low

Bound freq MC

Figure 10: Instance P1 : Frequency vs. Markov chain approach. Points are rollinghorizon solutions. Colored lines are bounds from the Frequency and Markov chainapproach. The dotted black lines show a-priori bound B4 by Li et al. .The performance of the Frequency and Markov chain based probability estimates25as assessed using three metrics:rms all = 1 N · | A | (cid:88) n ∈{ ..N } ,α ∈ A (cid:18)(cid:104) p fj (cid:105) n,α − ¯ p fj (cid:19) , (21a) p out = 1 N · | A | (cid:88) n ∈{ ..N } ,α ∈ A (cid:18)(cid:104) p fj (cid:105) n,α > ¯ p fj (cid:19) , and (21b)rms out = 1 p out · N · | A | (cid:88) n ∈{ ..N } ,α ∈ A (cid:18)(cid:104) p fj (cid:105) n,α > ¯ p fj (cid:19) (cid:18)(cid:104) p fj (cid:105) n,α − ¯ p fj (cid:19) , (21c)where rms all is the root-mean-squared deviation between the estimate and all rollinghorizon solutions, p out is the percentage of rolling horizon solutions with a larger p fj than the estimated bound, and rms out is the root-mean-squared deviation of allunderestimated points. While rms all evaluates the estimate ¯ p fj ’s quality as a predictorof p fj , p out and rms out assess its quality as an upper bound.instance bound rms all p out rms out toy freq 8.00 17.54 0.90toy mc 10.41 9.62 2.86P1 freq 12.61 18.08 5.80P1 mc 17.25 10.13 1.81P2 freq 7.40 48.19 2.24P2 mc 13.68 40.56 1.10P4 freq 10.09 13.77 4.10P4 mc 11.40 11.91 2.67P6 freq 16.35 29.40 4.48P6 mc 20.16 21.27 3.23all freq 10.89 25.40 3.50all mc 14.58 18.70 2.34Table 2: Average performance metrics for probability estimates - all instancesTable 2 shows values for all three instances averaged over demand scenarios andunits for all tested STN instances. It can be seen that the frequency approach isgenerally a better estimator for p fj than the Markov chain approach (smaller valuesof rms all ) but also has a larger rate of misclassiﬁcation p out . While rms all can belarge due to noise in the rolling horizon solutions and p out values of up to showthat ¯ p fj is not a perfect upper bound, rms out is generally small with average values26f . and . for the Frequency and Markov chain approach respectively. Thismeans that, when ¯ p fj underestimates p fj , it does not do so by much. Considering thenoise introduced by non-optimal solutions and the rolling horizon framework, theerror introduced by estimating p fj through ¯ p fj is small. Because it is a slightly betterupper bound, the Markov chain approach was used for all subsequent experimentsunless mentioned otherwise. freq MC0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.50255075100 α p f j Scenario avg high low N

100 1000

Figure 11: Eﬀect of number of Monte-Carlo samples N on failure probability p fj ofReactor 1 (Instance P1 ).Both probability estimation approaches are dependent on the number N of oper-ating mode sequences generated. Fig. 11 shows that increasing N from to for Reactor 1 in Instance P1 only has a small eﬀect on p fj , especially for the Markovchain based approach. N = 100 is therefore deemed suﬃcient.Figs. 12 and 13 compare Bayesian optimization (BO) with random search for In-stance P1 . Overall cost (Eqn. 16) was evaluated over a horizon of 24 weeks and bothBayesian optimization and random search were repeated ﬁve times. For the Bayesianoptimization, four points were sampled evenly from the interval α ∈ [0 . , . ini-tially. Fig. 12 shows the obtained objective values as a function of α . There isclearly a trade-oﬀ between high cost of preventive maintenance in conservative so-lutions (small α ) and high cost of corrective maintenance for less conservative butalso less robust solutions (large α ). Bayesian optimization very eﬃciently samples27 α C o s t BO rand.

Run

Figure 12: Bayesian optimization vs. random search. Five runs of BO and randomsearch were conducted. BO eﬃciently explores values near the optimal α . Iteration C o s t BO rand.

Figure 13: Bayesian optimization vs. random search. Points are individual valuesobtained at a given iteration. Lines represent the best previously achieved solutionvalue averaged over ﬁve runs. BO consistently ﬁnds better solutions.28rom the area around the optimal α ≈ . while random search naturally samplesfrom the entire interval. Fig 13 shows the lowest objective value previously obtainedas a function of the number of samples (averaged over ﬁve runs). Bayesian opti-mization consistently ﬁnds lower cost solutions than random search and achieves agood compromise between preventive and corrective maintenance within about iterations. Bayesian Optimization: min α c ∗ ( α ) + P j c fj p fj ( α ) (Eqn. 16)Robust MILP (P. 9)Process & maintenance modelIntegrated MILP (P. 10)Health model (Eqn. 7) Rolling horizonFreq./Markov chain estimate (Alg. 1/2)orDegradation Modeling (Eqn. 5) D j,k Logistic regression (Eqn. 15)Historical data p fj c ∗ α x kj T S , T P η n j,k , π k,k ∗ Figure 14: Flowchart of the proposed method.This work integrates equipment degradation eﬀects in process level optimizationproblems. The demonstrated methodology is summarized in Fig. 14. We combinecommonly used methods from the Degradation Modeling literature, which allowunit health characteristics to be estimated and updated from data, with RobustOptimization. This is highly relevant since, realistically, almost all equipment inchemical and manufacturing processes will be subject to performance degradationand failures.Solving realistic integrated maintenance and production scheduling and planningproblems is a hard task by itself, because models tend to be large and computation-ally expensive. Combining such models with robust optimization increases the needfor solving problems eﬃciently. Furthermore, the scheduling task is highly repetitiveand should become easier as historical data is collected. To reduce computational29xpense, we show conditions where robust optimization problems can be solved bysolving a deterministic approximation and develop data-based methods for estimat-ing failure probabilities.In the context of computationally expensive models, we show that Bayesian Op-timization can be used eﬀectively to optimize the uncertainty set in Robust opti-mization and balance the trade-oﬀ between preventive and corrective maintenance.

This work was funded by the Engineering & Physical Sciences Research Council(EPSRC) Center for Doctoral Training in High Performance Embedded and Dis-tributed Systems (EP/L016796/1), an EPSRC/Schlumberger CASE studentship toJ.W. (EP/R511961/1, voucher 17000145), and an EPSRC Research Fellowship toR.M. (EP/P016871/1).

A Formulating the robust counterpart

The reformulation of the semi-inﬁnite constraints m j,t s j ≤ s j,t ∀ t, j ∈ J, ˜ d j,k ∈ D (22a) s j,t ≤ s maxj + m j,t · ( s j − s maxj ) ∀ t, j ∈ J, ˜ d j,k ∈ U (22b) s j,t ≥ s j,t − ∆ t + (cid:88) k x j,k,t ˜ d j,k + m j,t · ( s j − s maxj ) ∀ t, j ∈ J, ˜ d j,k ∈ U (22c) s j,t ≤ s j,t − ∆ t + (cid:88) k x j,k,t ˜ d j,k ∀ t, j ∈ J, ˜ d j,k ∈ U (22d)into a deterministic robust counterpart in this work is based on the approach byLappas and Gounaris . s j,t is replaced by the aﬃne decision rule s j,t = [ s j,t ] + (cid:88) k [ s j,t ] k ˜ d j,k in all constraints, where [ s j,t ] and [ s j,t ] k are coeﬃcients which become new variablesin the reformulated constraints. The ﬁrst constraint (Eqn. 22a) can be reformulate30n the following way: m j,t s j ≤ [ s j,t ] + (cid:88) k [ s j,t ] k ˜ d j,k ⇒ − (cid:88) k [ s j,t ] k ˜ d j,k ≤ [ s j,t ] − m j,t s j ⇒ Θ ∗ ≤ [ s j,t ] − m j,t s j , where Θ ∗ = max ˜ d j,k − (cid:88) k [ s j,t ] k ˜ d j,k s.t − ˜ d j,k ≤ − ¯ d j,k (1 − (cid:15) ) ∀ k ˜ d j,k ≤ ¯ d j,k (1 + (cid:15) ) ∀ k. The dual of this is Θ ∗ = min u ,j,t ,l ,j,t (cid:88) k ¯ d j,k (cid:2) (1 + (cid:15) ) u ,j,tk − (1 − (cid:15) ) l ,j,tk (cid:3) s.t u ,j,tk − l ,j,tk ≥ − [ s j,t ] k ∀ k, with dual variables u ,j,tk and l ,j,tk . Dropping the minimization leads to the ﬁnalreformulation: (cid:88) k ¯ d j,k (cid:2) (1 + (cid:15) ) u ,j,tk − (1 − (cid:15) ) l ,j,tk (cid:3) ≤ [ s j,t ] − m j,t s j ∀ t, j ∈ Ju ,j,tk − l ,j,tk ≥ − [ s j,t ] k ∀ j, t, k Similar analysis for the second inequality (Eqn. 22b) leads to reformulation (cid:88) k ¯ d j,k (cid:2) (1 + (cid:15) ) u ,j,tk − (1 − (cid:15) ) l ,j,tk (cid:3) ≤ s maxj − [ s j,t ] + m j,t · ( s j − s maxj ) ∀ t, j ∈ Ju ,j,tk − l ,j,tk ≥ [ s j,t ] k ∀ j, t, k, the third inequality (Eqn. 22c) yields (cid:88) k ¯ d j,k (cid:2) (1 + (cid:15) ) u ,j,tk − (1 − (cid:15) ) l ,j,tk (cid:3) ≤ [ s j,t ] − [ s j,t − ∆ t ] + m j,t s maxj ∀ t, j ∈ Ju ,j,tk − l ,j,tk ≥ [ s j,t − ∆ t ] k − [ s j,t ] k + x j,k,t ∀ j, t, k, (cid:88) k ¯ d j,k (cid:2) (1 + (cid:15) ) u ,j,tk − (1 − (cid:15) ) l ,j,tk (cid:3) ≤ − [ s j,t ] + [ s j,t − ∆ t ] ∀ t, j ∈ Ju ,j,tk − l ,j,tk ≥ − [ s j,t − ∆ t ] k + [ s j,t ] k − x j,k,t ∀ j, t, k. B Equivalence to deterministic optimization withmaximal parameters

Note that for convenience and readability the index j has been dropped in all equa-tions in this appendix.Consider the case where the cost, process model, and maintenance model inProblem 9 are not functions of the uncertain parameters ˜ d k : min x , m cost ( x , m ) (23a)s.t process model ( x , m ) (23b)maintenance model ( x , m ) (23c) m t s ≤ [ s t ] + (cid:88) [ s t ] k ˜ d k , ∀ t, ˜ d k ∈ U (23d) [ s t ] + (cid:88) [ s t ] k ˜ d k ≤ s max + m t ( s − s max ) , ∀ t, ˜ d k ∈ U (23e) [ s t ] + (cid:88) [ s t ] k ˜ d k ≥ [ s t − ] + (cid:88) [ s t − ] k ˜ d k + (cid:88) k x k,t ˜ d k + m t ( s − s max ) , ∀ t, ˜ d k ∈ U (23f) [ s t ] + (cid:88) [ s t ] k ˜ d k [ s t − ] + (cid:88) [ s t − ] k ˜ d k + (cid:88) k x k,t − ˜ d k , ∀ t, ˜ d k ∈ U (23g)Furthermore consider a deterministic version of Problem 23 min x , m cost ( x , m ) (24a)s.t process model ( x , m ) (24b)maintenance model ( x , m ) (24c) m t s ≤ s t , ∀ t (24d) s t ≤ s max + m t ( s − s max ) , ∀ t (24e)32 t ≥ s t − + (cid:88) k x k,t d maxk + m t ( s − s max ) , ∀ t (24f) s t ≤ s t − + (cid:88) k x k,t d maxk , ∀ t (24g)in which ˜ d k has been replaced by d maxk = max ˜ d k ∈U ˜ d k . Theorem 2.

Given that cost, process model, and maintenance model are not func-tions of ˜ d j,k and that s j ≤ s initj = s j,t = t ≤ s maxj and ˜ d j,k ≥ , ∀ ˜ d j,k ∈ U , thena feasible solution ( x = [ x k,t , . . . ] , m = [ m t , . . . ] , h = [ s t ]) to Problem 24 forms afeasible solution ( x = [ x k,t , . . . ] , m = [ m t , . . . ] , h = [[ s t ] , [ s t ] k ]) to Problem 23 with [ s t ] = (cid:40) s init t < t m, s t ≥ t m, (25a) [ s t ] k = t (cid:88) t (cid:48) = t m,t x k,t , (25b) where s init = s ( t = 0) , t m, is the ﬁrst point in time at which maintenance is per-formed, and t m,t is the most recent point in time at which maintenance was performed.Proof. First we show that the Inequality 23d m t s ≤ [ s t ] + (cid:88) [ s t ] k ˜ d k (23d)holds for any ˜ d k ≥ given ( x = [ x k,t , . . . ] , m = [ m t , . . . ] , h = [[ s t ] , [ s t ] k ]) : Fromthe assumption s ≤ s init ≤ s max and Eqn. 25a it follows that s ≤ [ s t ] ≤ s max .Furthermore, it directly follows from Eqn. 25b that [ s t ] k ≥ . Eqn. 23d is thereforeguaranteed to hold for any ˜ d k ∈ U , ˜ d k ≥ .Next, we show that Inequalities 23f and 23g, [ s t ] + (cid:88) [ s t ] k ˜ d k ≥ [ s t − ] + (cid:88) [ s t − ] k ˜ d k + (cid:88) k x k,t ˜ d k + m t ( s − s max ) (23f)and [ s t ] + (cid:88) [ s t ] k ˜ d k ≤ [ s t − ] + (cid:88) [ s t − ] k ˜ d k + (cid:88) k x k,t ˜ d k (23g)33espectively, hold for any ˜ d k ∈ U as long as Inequality 23e is satisﬁed. We ﬁrstassume that m t = 0 . In this case [ s t ] k = [ s t − ] k + x k,t (from Eqn. 25b), [ s t ] = [ s t − ] (from Eqn. 25a) and Eqns. 23f and 23g simplify to [ s t ] ≥ [ s t − ] and [ s t ] ≤ [ s t − ] , which is true for any ˜ d k . Next we assume that m t = 1 . In this case [ s t ] = s (fromEqn. 25a), [ s t ] k = 0 (from Eqn. 25b), and x k,t = 0 (assuming that a unit can not beoperated while maintenance is performed). Substituting this into Eqns. 23f and 23gand rearranging yields [ s t − ] + (cid:88) [ s t − ] k ˜ d k ≤ s max (26)and s ≤ [ s t − ] + (cid:88) [ s t − ] k ˜ d k (27)respectively. Eqn. 26 is guaranteed to be satisﬁed as long as Inequality 23e holds for t = t − and Eqn. 27 holds for any ˜ d k ∈ U , ˜ d k ≥ since [ s t − ] ≥ S and [ s t − ] k ≥ .Finally we show that the Inequality 23e [ s t ] + (cid:88) [ s t ] k ˜ d k ≤ s max + m t ( s − s max ) (23e)holds for any ˜ d k ∈ U . To this end we notice that arg max ˜ d k ∈U [ s t ] + (cid:88) k [ s t ] k ˜ d k = d maxk since [ s t ] k ≥ . Therefore, if Eqn. 23e holds for ˜ d k = d maxk , it holds for any ˜ d k ∈ U .Since ( x = [ x k,t , . . . ] , m = [ m t , . . . ] , h = [ s t ]) is a solution to Problem 24, s t ≤ s max + m t ( s − s max ) . (28)Lastly, noticing that the deﬁnition of [ s t ] and [ s t ] k (Eqns. 25a and 25b) ensure that s t can always be decomposed as s t = [ s t ] + (cid:88) k [ s t ] k d maxk if s t satisﬁes Problem 24, it follows that s t = [ s t ] + (cid:88) k [ s t ] k d maxk ≤ s max + m t ( s − s h max ) and Eqn. 23e holds for all ˜ d k ∈ U . 34 Instances of the STN

States Feed A Feed B Feed C Hot A Int. BC Int. AB Impure E Prod. 1 Prod. 2Capacity [kg] ∞ ∞ ∞

100 200 150 100 ∞ ∞

Initial [kg] ∞ ∞ ∞ v minj [kg] 40 32 20 80 v maxj [kg] 100 80 50 200 s maxj

80 150 160 100 s initj

30 50 120 40 τ j [hr] 15 21 24 15 c maintj

300 900 2000 1200 c fj p i,j,k ¯ d i,j,k /σ i,j,k p ¯ d/σ p ¯ d/σ p ¯ d/σ Heating Slow 9 1/0.27Normal 6 2/0.54Fast 3 3/0.81Reaction 1 Slow 27 4/1.08 30 4/1.08Normal 15 5/1.35 18 5/1.35Fast 8 8/2.43 12 10/2.7Reaction 2 Slow 36 1/0.27 33 2/0.54Normal 21 3/0.81 18 4/1.08Fast 15 5/1.35 12 4/1.08Reaction 3 Slow 30 3/0.81 24 2/0.54Normal 18 7/1.89 21 5/1.35Fast 6 8/2.16 12 9/2.43Separation Slow 15 2/0.54Normal 9 5/1.35Fast 6 6/1.62Period ScenarioLow Average HighProduct 1 Product 2 Product 1 Product 2 Product 1 Product 21 76 136 150 200 190 2942 116 162 88 150 231 3233 101 115 125 197 198 3074 91 141 67 296 217 3355 60 147 166 191 181 2936 60 103 203 193 244 3287 54 148 90 214 243 3268 110 113 224 294 182 2969 92 105 174 247 246 29610 99 175 126 313 189 34811 51 177 66 226 199 31912 117 164 119 121 222 34613 108 124 234 197 246 33114 64 107 64 242 239 33615 62 154 103 220 180 30216 62 135 77 342 216 30617 71 109 132 355 218 30218 86 139 186 320 213 34919 80 102 174 335 233 28420 70 172 239 298 191 34721 92 120 124 252 233 30322 59 153 194 222 181 30323 70 124 91 324 189 30724 75 141 228 337 188 299

Table 3: Instance P1 eed T1 S2 T2 S3 T3 S4 States S1 S2 S3 S4Capacity [kg] ∞ ∞ ∞ ∞

Initial [kg] ∞ v minj [kg] 0 0 0 0 0 v maxj [kg] 100 150 200 150 150 s maxj

120 100 150 90 80 s initj

10 70 45 60 30 t j [hr] 21 15 18 9 13 c maintj

600 600 500 400 400Task Mode UnitU1 U2 U3 U4 U5 p i,j,k ¯ d i,j,k /σ i,j,k p ¯ d/σ p ¯ d/σ p ¯ d/σ p ¯ d/σ T1 Slow 33 3/0.66 30 3/0.66T1 Normal 25 5/1.35 21 6/1.62T1 Fast 15 7/2.17 12 9/2.79T2 Slow 24 4/0.88T2 Normal 18 6/1.62T2 Fast 12 10/3.1T3 Slow 21 2/0.44 18 2/0.44T3 Normal 15 4/1.08 12 4/1.08T3 Fast 9 7/2.17 6 6/1.86Period Scenario Period ScenarioLow Average High Low Average High1 725 1167 2002 13 446 947 18472 587 1110 2141 14 201 1426 21783 397 1087 1668 15 305 1090 21594 558 906 1977 16 447 1040 20155 411 1188 1692 17 378 917 16626 678 1191 1805 18 566 1190 17827 252 1436 2007 19 409 953 16468 415 1020 2174 20 797 1109 21359 539 1110 1713 21 605 1298 211310 414 1266 2162 22 413 1275 176011 214 1042 2155 23 550 1364 195812 612 1169 2018 24 362 1158 1779

Table 4: Instance P2

361 T180% I3I120%T230%F2 I270% T460%40% P2T3 P1

States F1 F2 I1 I2 I3 P1 P2Capacity [kg] ∞ ∞

200 100 500 1000 1000Initial [kg] ∞ ∞ v minj [kg] 40 25 40 v maxj [kg] 80 50 80 s maxj

70 120 70 s initj

10 10 10 τ j [hr] 21 21 21 c maintj

600 600 600Task Mode UnitR1 R2 R3 p i,j,k ¯ d i,j,k /σ i,j,k p ¯ d/σ p ¯ d/σ T1 Slow 24 3/0.66 24 3/0.66T1 Normal 15 5/1.35 15 5/1.35T1 Fast 9 8/2.16 9 8/2.16T2 Slow 36 3/0.66 36 3/0.66T2 Normal 24 5/1.35 24 5/1.35T2 Fast 15 8/2.16 15 8/2.16T3 Slow 12 3/0.66T3 Normal 9 5/1.35T3 Fast 6 8/2.16T4 Slow 24 3/0.66T4 Normal 15 5/1.35T4 Fast 9 8/2.16Period Scenario Period ScenarioLow Average High Low Average HighP1 P2 P1 P2 P1 P2 P1 P2 P1 P2 P1 P21 190 102 271 231 311 305 13 197 192 213 280 314 3762 172 156 230 282 381 347 14 120 100 257 270 370 3923 102 103 274 226 381 321 15 180 115 286 223 335 3704 130 172 289 281 310 310 16 178 186 201 281 386 3585 130 104 270 212 317 371 17 135 138 288 289 398 3096 174 192 205 205 339 328 18 140 115 284 253 304 3967 167 194 260 248 379 348 19 128 104 200 273 334 3518 185 175 259 282 317 392 20 155 179 275 210 326 3009 179 180 211 233 300 387 21 158 120 298 253 315 39710 131 120 261 292 346 326 22 160 186 252 284 301 39611 104 196 271 212 364 364 23 125 105 223 220 396 37812 104 188 202 289 309 346 24 187 134 205 285 316 393

Table 5: Instance P4 States F1 F2 F3 I4 I5 I6 I7 I8 I9 P1 P2 P3 P4Capacity [kg] ∞ ∞ ∞ ∞ ∞ ∞ ∞

Initial [kg] ∞ ∞ ∞ v minj [kg] 0 0 0 0 0 0 v maxj [kg] 1000 2500 3500 1500 1000 4000 s maxj

100 100 100 100 100 100 s initj

77 80 90 17 40 33 τ j [hr] 21 21 21 21 21 21 c maintj p i,j,k ¯ d i,j,k /σ i,j,k p ¯ d/σ p ¯ d/σ p ¯ d/σ p ¯ d/σ p ¯ d/σ T1 Slow 18 3/0.66T1 Normal 12 5/1.35T2 Slow 24 3/0.66T2 Normal 15 5/1.35T3 Slow 33 5/0.66T3 Normal 21 8/1.35T4 Slow 42 5/0.66T4 Normal 21 11/1.35T5 Slow 45 7/0.66T5 Normal 30 10/1.35T6 Slow 18 3/0.66T6 Normal 12 5/1.35T7 Slow 33 6/0.66T7 Normal 21 9/1.35T8 Slow 45 6/0.66T8 Normal 30 10/1.35

Table 6: Instance P6 eriod ScenarioLow Average HighP1 P2 P3 P4 P1 P2 P3 P4 P1 P2 P3 P41 715 501 801 933 1277 1356 1739 1349 1605 1844 1898 16312 593 878 888 739 1533 1361 1384 1374 1687 1882 1650 17323 743 995 817 563 1727 1400 1323 1351 1510 1805 1893 16994 620 963 636 698 1702 1701 1260 1605 1557 1717 1693 19085 991 612 535 686 1521 1424 1600 1315 1929 1769 1657 19186 919 819 914 626 1451 1412 1667 1406 1539 1818 1676 15217 648 799 920 549 1447 1471 1732 1620 1999 1663 1632 16038 609 728 969 925 1746 1444 1694 1512 1673 1987 1924 17429 741 575 604 814 1691 1456 1384 1305 1826 1559 1982 188210 968 682 853 532 1436 1297 1564 1356 1987 1737 1598 191011 624 789 816 728 1357 1730 1441 1638 1696 1660 1761 177812 840 929 700 733 1384 1283 1461 1423 1779 1722 1558 165513 516 790 705 743 1387 1594 1696 1533 1596 1528 1857 174514 556 643 974 890 1722 1493 1528 1533 1782 1829 1994 151215 940 715 797 638 1470 1377 1635 1303 1949 1738 1933 178216 894 896 693 853 1563 1467 1526 1565 1837 1593 1938 185217 792 994 509 647 1561 1593 1614 1531 1694 1869 1879 159318 725 718 856 789 1739 1681 1373 1255 1902 1663 1814 195319 820 886 971 531 1576 1706 1635 1628 1875 1988 1648 151220 950 788 580 859 1627 1358 1469 1694 1642 1519 1999 159521 641 773 681 877 1257 1433 1581 1420 1890 1942 1854 182622 793 963 950 634 1250 1641 1644 1404 1795 1628 1658 196123 504 741 531 671 1472 1617 1311 1519 1967 1768 1877 173924 830 529 819 594 1390 1645 1632 1614 1844 1945 1620 1563 Table 7: Instance P6 continued391 HeatingI1 Reaction 270% P2P130%Reaction 140%F2 60% States F1 F2 I1 P1 P2Capacity [kg] ∞ ∞ ∞ ∞ ∞

Initial [kg] ∞ ∞ v minj [kg] 40 30 v maxj [kg] 100 140 s maxj

80 120 s initj

43 30 τ j [hr] 2 3 c maintj

300 300Task Mode UnitHeater Reactor p i,j,k ¯ d i,j,k /σ i,j,k p ¯ d/σ Heating Slow 9 5.5/1.49Heating Normal 6 11/2.97Reaction 1 Slow 6 7/1.89Reaction 1 Normal 4 9/2.43Reaction 2 Slow 10 5/1.35Reaction 2 Normal 6 13/3.51Period Scenario Period ScenarioLow Average High Low Average HighP1 P2 P1 P2 P1 P2 P1 P2 P1 P2 P1 P21 52 61 121 113 184 201 13 49 45 141 128 180 2252 60 32 137 149 228 207 14 29 44 115 120 210 1923 42 59 149 128 204 218 15 69 35 148 106 220 1924 51 40 146 127 192 200 16 53 51 125 148 224 2255 24 42 105 103 206 225 17 38 49 102 136 229 2186 34 52 111 100 194 212 18 47 77 112 125 204 2017 32 55 105 143 223 216 19 54 53 141 134 197 2268 30 62 141 140 182 218 20 26 77 118 131 223 2159 65 30 144 128 185 197 21 61 68 116 128 199 18310 32 69 147 122 218 180 22 31 71 132 101 212 19611 59 37 119 119 182 197 23 53 42 107 121 181 22212 24 44 133 139 218 195 24 39 47 113 143 220 211

Table 8: Toy instance40xcept for the toy instance, all instances are taken from the benchmark collec-tion by Lappas and Gounaris . All parameters were converted to a discrete timeformulation and degradation parameters were added. For Instance P1 parametersfrom Biondi et al. were used. The time steps and horizons used in each instanceare given in Table 9.Instance Toy P1 P2 P4 P6 T S

30 168 168 168 168 ∆ t S T P

720 4032 4032 4032 4032 ∆ t P

30 168 168 168 168Table 9: Time horizons of STN instancesThe degradation of all units is assumed to follow a Wiener process. The distri-bution of increments is S j,t + p i,j,k − S j,t = D i,j,k , D i,j,k ∼ N (cid:0) ¯ d i,j,k , σ i,j,k (cid:1) , where ¯ d i,j,k is the nominal amount of degradation when task i is performed on unit j in mode k . When no task is being processed the degradation signal is assumed tovary with zero mean and a small variance: S j,t +∆ t − S j,t = D , ∆ t , D , ∆ t ∼ N (cid:0) , . ∆ t (cid:1) . (29)A detailed list of all parameter values used in each case study can be found inTables 3 through 8. D Crossing probabilities of a Brownian motion fora piecewise linear boundary

When the Wiener process is used as a degradation model, the failure probability p fj can be calculated eﬃciently based on analytical result . The probability of aWiener process with piecewise constant parameters θ j,k = [ µ j,k , σ j,k ] crossing a ﬁxedthreshold s maxj is equivalent to the probability of a standard Brownian motion W ( t ) (a Wiener process with N (0 , distributed increments) crossing a piecewise linearboundary. The probability of W ( t ) crossing a linear boundary at + b is known to beinverse gaussian distributed P ( W ( t ) ≥ at + b, t ≤ T ) = 1 − Φ( aT + b √ T ) + exp − ab Φ( aT − b √ T ) , (30)41here Φ( · ) is the standard normal distribution function .Based on this, the probability of failure p u,vj between two consecutive maintenancetimes t m,u and t m,v can be calculated as p u,vj = 1 − E h ( y )= 1 − n (cid:89) l =1 ( y l > (cid:18) − exp (cid:20) − y l − y l t l − t l − (cid:21)(cid:19) , where t l , l ∈ { , . . . , n } are the n points in time between t m,u and t m,v at which theoperating mode k changes .Here y is a vector representing the values of W ( t ) at each t l . It is deﬁned as y = c + M D / u , where M is a lower triangular matrix of ones, D / = diag ( (cid:112) t − t m,u , √ t − t , . . . , (cid:112) t m,v − t n ) , u is a random vector with u l ∼ N (0 , σ j,k ( t l )) , and c is the piecwise linear boundary c = ( s maxj − s initj ) · [1 , . . . , (cid:62) − M diag (0 , µ j,k ( t ) , . . . , µ j,k ( t n )) ∆ t with ∆ t = [0 , t − t , . . . , t n − t n − ] (cid:62) . The overall probability of failure over the evaluation horizon T can then be cal-culated as p fj = 1 − n +1 (cid:89) u =1 (cid:16) − p u − ,lj (cid:17) , (31)where t m, = 0 , t m,n +1 = T , and t m,u , u ∈ { , . . . , n } are the n points in time at whichmaintenance is carried out on unit j in the evaluation horizon. E Nomenclature h health variables m maintenance variables x process variables Indices task j unit k operating mode s state t time Sets I set of tasks I j set of tasks i available on unit jI s set of tasks i consuming state s ¯ I s set of tasks i producing state sJ set of process units J i set of units j on which task i can be performed K i set of operating modes allowed for task iK j set of operating modes k available on unit jT evaluation horizon T P planning horizon T S scheduling horizon U uncertainty set X set of operating mode sequences x kj Discrete Variables m j,t if maintenance is performed on unit j at time tn i,j,k,t number of times task i is performed on unit j in mode k in timeperiod tw i,j,k,t if task i starts on unit j in mode k at time t , otherwise x j,k,t if unit j is operated in mode k at time t x kj sequence of operating modes [ k , k , . . . , k T ] ω j,k,t if unit j operates in mode k in period t , otherwise Continuous Variables a i,j,k,t amount of material processed by task i in unit j in mode k in timeperiod tb i,j,k,t amount of material committed to task i on unit j in mode k at time tc ∗ minimal cost determined by solving Problem 9 c fj cost of unit j failing D random variable modeling increment of S ( t ) N j,k random variable modeling n j,k j,k number of times mode k occurs on unit j in a given ∆ tp fj probability of failure ¯ p fj estimated upper bound on p fj q fins quantity of state s stored at end of planning horizon q s,t quantity of state s stored at time tS ( t ) stochastic process modeling s meas ( t ) s nj,t realization of S j ( t ) at time t s finj value of degradation signal at end of planning horizon s meas ( t ) measured degradation signal S t random variable modeling s meas ( t ) at time tX kj ( t ) memoryless Markov chain modeling x kj X kj,t state of Markov chain at time tφ ds slack variable for unfulﬁlled demand of state sφ qs,t slack variable for storage capacity violation of state s at time t ψ process/environmental parameters Parameters c maintj cost of maintenance for unit jc storages per unit cost of storage for state sc s storage capacity of state s D distribution of D ¯ d j,k nominal value of ˜ d j,k ˜ d j,k uncertain parameter modeling increment of S ( t ) d maxj,k maximum of ˜ d j,k in U N number of samples in Monte-Carlo simulation p i,j,k processing time of task i on unit j in mode kr j,t residual lifetime of unit j at time t [ s j,t ] , [ s j,t ] k parameters for aﬃne decision rule s reset value degradation signal s init initial value of degradation signal s max failure threshold degradation signal ¯ t P ﬁrst time period in planning horizon ¯ t S last time period in scheduling horizon ∆ t P length of planning period ∆ t S length of scheduling period U large number v maxi,j maximum batch size for task i on unit jv mini,j minimum batch size for task i on unit j size parameter of U δ s,t demand for s at time t(cid:15) j,k size parameter of U η j,k probability of k occuring n j,k times θ parameter vector of D µ j,k mean of D j,k π k,k ∗ transition probability Markov chain ¯ ρ i,s fraction of state s of material produced by task iρ i,s fraction of state s of material consumed by task iσ j,k standard deviation of D j,k τ j duration of maintenance on unit j References [1] Andrew K.S. Jardine, Daming Lin, and Dragan Banjevic. A review on ma-chinery diagnostics and prognostics implementing condition-based maintenance.

Mechanical Systems and Signal Processing , 20(7):1483–1510, 2006.[2] Diana Barraza-Barraza, Jorge Limón-Robles, and Mario G Beruvides. Oppor-tunities and challenges in Condition-Based Maintenance research.

IIE AnnualConference and Expo 2014 , pages 3035–3043, 2014.[3] Alexandros Bousdekis, Babis Magoutas, Dimitris Apostolou, and GregorisMentzas. Review, analysis and synthesis of prognostic-based decision supportmethods for condition based maintenance.

Journal of Intelligent Manufacturing ,pages 1–14, 2015.[4] Suzan Alaswad and Yisha Xiang. A review on condition-based maintenanceoptimization models for stochastically deteriorating system.

Reliability Engi-neering & System Safety , 157:54–63, 2017.[5] William Q. Meeker and Yili Hong. Reliability Meets Big Data: Opportunitiesand Challenges.

Quality Engineering , 26(1):102–116, 2014.[6] Ilias T. Dedopoulos and Nilay Shah. Optimal Short-Term Scheduling of Mainte-nance and Production for Multipurpose Plants.

Industrial & Engineering Chem-istry Research , 34(1):192–201, 1995.[7] I.T. Dedopoulos and N. Shah. Preventive maintenance policy optimization formultipurpose plant equipment.

Computers & Chemical Engineering , 19:693–698,1995. 458] Constantinos Georgiou Vassiliadis.

Integration of Maintenance Optimization inProcess Design and Operation under Uncertainty . PhD thesis, Imperial Collegeof Science, Technology and Medicine, 1999.[9] C.G. Vassiliadis and E.N. Pistikopoulos. Maintenance scheduling and processoptimization under uncertainty.

Computers and Chemical Engineering , 25(2-3):217–236, 2001.[10] J. Casas-Liza, J.M. Pinto, and L.G. Papageorgiou. Mixed Integer Optimizationfor Cyclic Scheduling of Multiproduct Plants Under Exponential PerformanceDecay.

Chemical Engineering Research and Design , 83(10):1208–1217, 2005.[11] Michael C. Georgiadis, Lazaros G. Papageorgiou, and Sandro Macchietto. Op-timal Cleaning Policies in Heat Exchanger Networks under Rapid Fouling.

In-dustrial & Engineering Chemistry Research , 39(2):441–454, 2000.[12] Songsong Liu, Ahmed Yahia, and Lazaros G. Papageorgiou. Optimal Produc-tion and Maintenance Planning of Biopharmaceutical Manufacturing under Per-formance Decay.

Industrial & Engineering Chemistry Research , 53(44):17075–17091, 2014.[13] Dionysios P. Xenos, Georgios M. Kopanos, Matteo Cicciotti, and Nina F. Thorn-hill. Operational optimization of networks of compressors considering condition-based maintenance.

Computers and Chemical Engineering , 84:117–131, 2016.[14] Nur I. Zulkaﬂi and Georgios M. Kopanos. Planning of production and utility sys-tems under unit performance degradation and alternative resource-constrainedcleaning policies.

Applied Energy , 183:577–602, 2016.[15] Nur I. Zulkaﬂi and Georgios M. Kopanos. Integrated condition-based planningof production and utility systems under uncertainty.

Journal of Cleaner Pro-duction , 167:776–805, 2017.[16] Adrian M. Aguirre and Lazaros G. Papageorgiou. Medium-term optimization-based approach for the integration of production planning, scheduling and main-tenance.

Computers and Chemical Engineering , 0:1–21, 2018.[17] Sreekanth Rajagopalan, Nikolaos V. Sahinidis, Satyajith Amaran, Anshul Agar-wal, Scott J. Bury, Bikram Sharda, and John M. Wassick. Risk analysis ofturnaround reschedule planning in integrated chemical sites.

Computers &Chemical Engineering , 107:381–394, 2017.4618] Matteo Biondi, Guido Sand, and Iiro Harjunkoski. Optimization of multipur-pose process plant operations: A multi-time-scale maintenance and productionscheduling approach.

Computers and Chemical Engineering , 99:325–339, 2017.[19] E. Kondili, C.C. Pantelides, and R.W.H. Sargent. A general algorithm for short-term scheduling of batch operations - I. MILP formulation.

Computers andChemical Engineering , 17(2):211–227, 1993.[20] Murat Yildirim, Xu Andy Sun, and Nagi Z Gebraeel. Sensor-Driven Condition-Based Generator Maintenance Scheduling - Part I: Maintenance Problem.

IEEETransactions on Power Systems , 31(6):4253–4262, 2016.[21] Murat Yildirim, Xu Andy Sun, and Nagi Z Gebraeel. Sensor-Driven Condition-Based Generator Maintenance Scheduling - Part II: Incorporating Operations.

IEEE Transactions on Power Systems , 31(6):4263–4271, 2016.[22] Murat Yildirim, Nagi Z. Gebraeel, and Xu Andy Sun. Integrated PredictiveAnalytics and Optimization for Opportunistic Maintenance and Operations inWind Farms.

IEEE Transactions on Power Systems , 32(6):4319–4328, 2017.[23] Beste Başçiftci, Shabbir Ahmed, Nagi Z. Gebraeel, and Murat Yildirim. Stochas-tic Optimization of Maintenance and Operations Schedules under UnexpectedFailures.

IEEE Transactions on Power Systems , 8950(c):1–1, 2018.[24] Adriaen Verheyleweghen and Johannes Jäschke. Framework for Combined Di-agnostics, Prognostics and Optimal Operation of a Subsea Gas CompressionSystem.

IFAC-PapersOnLine , 50(1):15916–15921, 2017.[25] Nikolaos H Lappas and Chrysanthos E Gounaris. Multi-stage adjustable robustoptimization for process scheduling under uncertainty.

AIChE Journal , 62(5):1646–1667, 2016.[26] Iftekhar A. Karimi and Conor M. McDonald. Planning and Scheduling of Par-allel Semicontinuous Processes. 2. Short-Term Scheduling.

Industrial & Engi-neering Chemistry Research , 36(7):2701–2714, 1997.[27] Christos T. Maravelias and Ignacio E. Grossmann. New general continuous-time state - Task network formulation for short-term scheduling of multipurposebatch plants.

Industrial & Engineering Chemistry Research , 42(13):3056–3074,2003. 4728] M. G. Ierapetritou and C. A. Floudas. Eﬀective continuous-time formulationfor short-term scheduling. 1. Multipurpose batch processes.

Industrial & Engi-neering Chemistry Research , 37(11):4341–4359, 1998.[29] Peng Wang and David Coit. Reliability and Degradation Modeling with Randomor Uncertain Failure Threshold. In , pages 392–397. IEEE, 2007.[30] Laurent Doyen and Olivier Gaudoin. Classes of imperfect repair models basedon reduction of failure intensity or virtual age.

Reliability Engineering & SystemSafety , 84(1):45–56, 2004.[31] David Applebaum. Lévy processes-from probability to ﬁnance and quantumgroups.

Notices of the American Mathematical Society , 51(11):1336–1347, 2004.[32] Zhi-Sheng Ye and Min Xie. Stochastic modelling and analysis of degradation forhighly reliable products.

Applied Stochastic Models in Business and Industry ,31(1):16–32, 2015.[33] Xiao-Sheng Si, Wenbin Wang, Chang-Hua Hu, and Dong-Hua Zhou. Remain-ing useful life estimation - A review on the statistical data driven approaches.

European Journal of Operational Research , 213(1):1–14, 2011.[34] Khanh T.P. Nguyen, Mitra Fouladirad, and Antoine Grall. Model selection fordegradation modeling and prognosis with health monitoring data.

ReliabilityEngineering & System Safety , 169(August 2017):105–116, 2018.[35] Haitao Liao and Zhigang Tian. A framework for predicting the remaining usefullife of a single unit under time-varying operating conditions.

IIE Transactions ,45(9):964–980, 2013.[36] Qi Li, Zhanbao Gao, Diyin Tang, and Baoan Li. Remaining useful life esti-mation for deteriorating systems with time-varying operational conditions andcondition-speciﬁc failure zones.

Chinese Journal of Aeronautics , 29(3):662–674,2016.[37] Julien Chevallier and Stéphane Goutte. On the estimation of regime-switchingLévy models.

Studies in Nonlinear Dynamics and Econometrics , 21(1):3–29,2017. 4838] Nagi Z. Gebraeel, Mark A. Lawley, Rong Li, and Jennifer K. Ryan. Residual-lifedistributions from component degradation signals: A Bayesian approach.

IIETransactions , 37(6):543–557, 2005.[39] Linkan Bian and Nagi Gebraeel. Computing and updating the ﬁrst-passagetime distribution for randomly evolving degradation signals.

IIE Transactions ,44(11):974–987, 2012.[40] Nagi Gebraeel and Jing Pan. Prognostic degradation models for computingand updating residual life distributions in a time-varying environment.

IEEETransactions on Reliability , 57(4):539–550, 2008.[41] Chao Ning and Fengqi You. A data-driven multistage adaptive robust optimiza-tion framework for planning and scheduling under uncertainty.

AIChE Journal ,63(10):4343–4369, 2017.[42] Yannis A. Guzman, Logan R. Matthews, and Christodoulos A. Floudas. New apriori and a posteriori probabilistic bounds for robust counterpart optimization:I. Unknown probability distributions.

Computers & Chemical Engineering , 84:568–598, 2016.[43] Zukui Li, Ran Ding, and Christodoulos Floudas. A Comparative Theoretical andComputational Study on Robust Counterpart Optimization: I. Robust LinearOptimization and Robust Mixed Integer Linear Optimization.

Industrial &Engineering Chemistry Research , 50(18):10567–10603, 2011.[44] Klaus; Pötzelberger and Liqun Wang. Boundary Crossing Probability for Brow-nian Motion and General Boundaries.

Journal of Applied Probability , 34(1):54–65, 1997.[45] Linkan Bian and Nagi Gebraeel. A stochastic methodology for prognostics undertime-varying environmental future proﬁles. In

Proceedings of the 2011 Confer-ence on Intelligent Data Understanding, CIDU 2011 , 2011.[46] Lothar Breuer. Occupation times for Markov-modulated Brownian motion.

Journal of Applied Probability , 49(2):549–565, 2012.[47] Souleyman Ozekici. Optimal maintenance policies in random environments.

European Journal of Operational Research , 82(2):283–294, 1995.4948] Lewis Paton, Matthias C M Troﬀaes, Nigel Boatman, Mohamud Hussein, andAndy Hart. Multinomial Logistic Regression on Markov Chains for Crop Ro-tation Modelling.

Information Processing and Management of Uncertainty inKnowledge-Based Systems , pages 476–485, 2014.[49] Narayan Chanra Sinha, M. Ataharul Islam, and Kazi Saleh Ahamed. LogisticRegression Models for Higher Order Transition Probabilities of Markov Chainfor Analyzing the Occurrences of Daily Rainfall Data.

Journal of Modern AppliedStatistical Methods , 10(1):337–348, 2011.[50] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel,Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, RonWeiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Courna-peau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine Learning in Python.

Journal of Machine Learning Research , 12:2825–2830, 2012.[51] Cristian Bravo, Gaston L’Huillier, Jose Luis Lobato, and Richard Weber. Prob-ability Estimation for Multiclass Problems Combining SVMs and Neural Net-works.

Neural Network World , 20(4):475–489, 2010.[52] Stephan Dreiseitl and Lucila Ohno-Machado. Logistic regression and artiﬁ-cial neural network classiﬁcation models: A methodology review.

Journal ofBiomedical Informatics , 35(5-6):352–359, 2002.[53] Zhuangzhi Li and Zukui Li. Chance constrained planning and scheduling underuncertainty using robust optimization approximation.

IFAC-PapersOnLine , 28(8):1156–1161, 2015.[54] Zhuangzhi Li and Zukui Li. Optimal robust optimization approximation forchance constrained optimization problem.

Computers and Chemical Engineer-ing , 74:89–99, 2015.[55] Donald R Jones, Matthias Schonlau, and William J Welch. Eﬃcient Global Op-timization of Expensive Black-Box Functions.

Journal of Global Optimization ,13:455–492, 1998.[56] William E. Hart, Carl D. Laird, Jean-Paul Watson, David L. Woodruﬀ,Gabriel A. Hackebeil, Bethany L. Nicholson, and John D. Siirola.

Pyomo –Optimization Modeling in Python , volume 67. 2017.5057] William E. Hart, Jean Paul Watson, and David L. Woodruﬀ. Pyomo: Modelingand solving mathematical programs in Python.

Mathematical ProgrammingComputation , 3(3):219–260, 2011.[58] Johannes Wiebe. STN with degradation. DOI: 10.5281/zenodo.1313718, 2018.[59] Miten Mistry, Andrea Callia D’Iddio, Michael Huth, and Ruth Misener. Satisﬁa-bility modulo theories for process systems engineering.

Computers and ChemicalEngineering , 113:98–114, 2018.[60] Andre A. Ciré, Elvin Çoban, and John N. Hooker. Logic-based Benders de-composition for planning and scheduling: A computational analysis.

KnowledgeEngineering Review , 31(5):440–451, 2016.[61] D. Letsios and R. Misener. Exact Lexicographic Scheduling and ApproximateRescheduling.

ArXiv e-prints , arXiv:1805.03437, 2018.[62] Zukui Li, Qiuhua Tang, and Christodoulos A. Floudas. A Comparative The-oretical and Computational Study on Robust Counterpart Optimization: II.Probabilistic Guarantees on Constraint Satisfaction.

Industrial & EngineeringChemistry Research , 51(19):6769–6788, 2012.[63] Linkan Bian and Nagi Gebraeel. Stochastic methodology for prognostics un-der continuously varying environmental proﬁles.

Statistical Analysis and DataMining , 6(3):260–270, 2013.[64] David Siegmund. Boundary Crossing Probabilities and Statistical Applications.