# Improving non-deterministic uncertainty modelling in Industry 4.0 scheduling

Ashwin Misra

The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA [email protected]

Ankit Mittal

Maruti Suzuki India, Gurugram, India [email protected]

Vihaan Misra

Netaji Subhas University of Technology, New Delhi, India [email protected]

Deepanshu Pandey

ZS Associates, Gurgaon, India [email protected]

Abstract

The latest industrial revolution has helped industries achieve very high rates of productivity and efficiency. It has introduced data aggregation and cyber-physical systems to optimize planning and scheduling. However, uncertainty in the environment and the imprecise nature of human operators are not accurately accounted for in the decision-making process. This leads to delays in consignments and imprecise budget estimations. This widespread practice in industrial models is flawed and requires rectification. Other articles have approached this problem through stochastic or fuzzy set methods. This paper presents a comprehensive method to logically and realistically quantify non-deterministic uncertainty through probabilistic uncertainty modelling. The method is applicable to virtually all industrial data sets, as the model is self-adjusting and uses epsilon-contamination to cater to limited or incomplete data sets. The results are numerically validated on an industrial data set from Flanders, Belgium. The data-driven results achieved through this robust scheduling method illustrate the improvement in performance.

Preprint. Under review. arXiv [stat.OT].

Despite the onset of Industry 4.0 across manufacturing plants, many areas of this relatively new framework need improvement to enhance its reliability and accuracy. Production planning combines and coordinates all manufacturing activities; it broadly consists of three components: planning, controlling and dispatching. Planning refers to the pre-manufacturing task of setting the objectives and targets with respect to the available resources and constraints. Control refers to the continuous evaluation of performance against the standards set by planning. Hence, the aim of production control is to make sure that the consignment is produced at the optimum quality, time and quantity with cost-effective methods. In production planning, scheduling is the part of planning concerned with the schedule of an activity: its start time and finish time. This paper aims to develop a robust scheduling strategy to improve the scheduling aspect of production planning.

The main reason the scheduling of a consignment is inconsistent or deviates from standard behaviour is a set of non-deterministic factors such as human operator error, machine faults and delays in supply. Current planning algorithms do not capture these intrinsic non-deterministic uncertainties, which results in poor performance during execution. This leads to losses in capital, time and resources. The method discussed in this paper is motivated by this problem: these uncertainties are quantified more accurately to develop an improved robust scheduling strategy.

This paper provides a general outline for quantifying operational non-deterministic uncertainty more realistically, which differs from the general practices currently employed in industry. The general practices assume a deterministic system, which leads to faulty predictions [1]. The paper introduces the uncertainty models used (the p-box and epsilon-contamination models), briefly describes the industrial data set and explains how to model such data. A numerical example is given for a specific case to make the method easier to follow.

In recent years, much academic research has focused on production planning and scheduling. The authors of [2] attribute operational uncertainty to software that is fixated on ideal schedules. They suggest human intervention, by anyone from the production planner to the task scheduler, depending on the frequency and severity of scheduling errors. These respondents are responsible for rescheduling and identifying the optimal course of action. This method relies on human intervention to counter planning faults due to non-deterministic uncertainty. However, it is only applicable to small production plants and does not factor in human operator errors.

The authors of [3] review methods for scheduling under uncertainty, including robust, stochastic and fuzzy scheduling. They propose a baseline schedule which is planned before the manufacturing operation. Reactive scheduling re-optimizes this schedule dynamically according to the uncertainties. Hence it depends on a stochastic assessment of the uncertainty, from which a set of decisions is developed.

Generally, in such systems, the uncertainty is quantified with either probabilistic [4] or fuzzy set models [5]. Typically, stochastic methods combine different probability distribution models to represent various phenomena. These models are, however, unrealistic because interval-based uncertainties cannot be modelled with fixed-parameter probability distributions. They fail to take into account variations in the uncertainty itself, such as shifts in the median of an initial probability distribution. Such models are also very complex to build and require field experts to define and generate. As shown by [5], classic probabilistic models fail to capture the full non-deterministic nature of uncertainties in manufacturing processes. After analysis of a large industrial data set, it was concluded that probability box models are much more accurate for manufacturing plants. Epsilon contamination is a method to capture uncertainty through ε-contamination classes. It is a Bayesian method used for data sets with corrupted, incomplete or insufficient data. It is used in robust statistics, and the methodology discussed here uses it as one of its components.

A p-box, or probability box, is the area enclosed between an upper and a lower cumulative distribution function (CDF) and characterizes an imprecise distribution. The most significant benefit of a p-box is that it captures non-deterministic uncertainty within a bounded region far better than probability distribution functions (PDFs) alone. For a more extensive data set, the data are divided into subsets, a PDF is computed for each subset, and the PDFs are integrated to construct a bounded set of CDFs. This set of CDFs represents the non-deterministic likelihood of the technique or parameters affected by the unpredictable change.

$$F(x) = P(X \le x) \tag{1}$$

A cumulative distribution function gives the cumulative probability of a random variable $X$ from negative infinity up to a value $x$. By definition, $F$ is a non-decreasing function with range $[0, 1]$ over the domain $[-\infty, \infty]$.

$$f(x) = \Pr[X = x] \tag{2}$$

Figure 1: Sample p-box

The PDF is the derivative of the CDF, if it exists. A generalised probability box, or generalised p-box, is a pair $(\underline{F}, \overline{F})$ of distribution functions from $\Omega$ to $[0, 1]$ satisfying $\underline{F} \le \overline{F}$. If $\Omega$ is a closed interval on $\mathbb{R}$, then $(\underline{F}, \overline{F})$ is called a p-box. When both $\underline{F}$ and $\overline{F}$ take only a finite number of different values, the generalised p-box is said to be discrete. $\underline{F}_X(x)$ and $\overline{F}_X(x)$ are respectively the lower and upper cumulative probability bounds, and $F_X(x)$ is non-decreasing in $x$:

$$\underline{F}_X(x) = \underline{P}(X \le x), \qquad \overline{F}_X(x) = \overline{P}(X \le x)$$

Because they respond to unexplained, unpredictable disturbances to specific techniques or parameters, p-boxes are a valuable method for capturing non-deterministic uncertainties. Consider a particular task performed by two workers, with the time taken to complete the job recorded for a year. Using the data, the respective PDFs and CDFs are plotted. From the resulting CDFs, one can determine which worker performs the task better. But this result does not account for non-deterministic uncertainty parameters such as the worker's mood or distractions. These uncertainties may affect the worker's timing, so that one CDF shows higher variation than the other. The p-box is the most suitable tool to capture these non-deterministic uncertainties. To plot the p-box, the data set is divided into subsets, and the upper and lower bounding CDFs are plotted. The p-box captures the uncertainty in the probability distribution of the task completion time. This enables more robust task assignments for any future event, accommodating non-deterministic delays.
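The subset-based construction described above can be illustrated with a minimal sketch. It assumes empirical CDFs and takes their pointwise minimum and maximum as the lower and upper bounds; the function names, the sample times and the evaluation grid are all hypothetical and not taken from the paper's data.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function x -> fraction of samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def pbox_from_subsets(subsets, grid):
    """Pointwise lower/upper envelope of the subsets' empirical CDFs on grid."""
    cdfs = [empirical_cdf(s) for s in subsets]
    lower = [min(c(x) for c in cdfs) for x in grid]
    upper = [max(c(x) for c in cdfs) for x in grid]
    return lower, upper


# Made-up completion times (minutes) for one task, split into two subsets.
subsets = [[10, 12, 13, 15], [11, 14, 16, 18]]
grid = [10, 12, 14, 16, 18]
lower, upper = pbox_from_subsets(subsets, grid)
print(lower)
print(upper)
```

The gap between `upper` and `lower` at each grid point is the imprecision that a single CDF fitted to the pooled data would hide.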

The epsilon-contamination model is

$$P_\epsilon(f) = (1 - \epsilon)\,P(f) + \epsilon\,Q(f) \tag{3}$$

In this model [7], $\epsilon$ denotes the probability that the data set is contaminated by a distribution $Q$. Let $P$ be the distribution of the sequence, i.e. a function of the measures of central tendency. This model, introduced by Huber, provides a robust statistical framework by establishing priors. It is a convex combination of two distinct uncertainty models: (1) the vacuous model and (2) the probabilistic model. Epsilon is a parameter that expresses the trust placed in the model.

Both of these models rely on the principle of imprecise belief [6]. The intervals of the p-box and epsilon-contamination models quantify uncertainty accurately and should be incorporated in scheduling algorithms. This belief can be attributed to an operator's experience and expertise in a particular task, i.e. a particular factor is assigned to represent the belief in the process. In this paper, the specific distributions used are cumulative distribution functions.

This section explains how to model a particular industrial process with the uncertainty models and how to interpret the model to achieve more accurate scheduling and a robust autonomous planning framework. The modelling technique is divided into two parts: one for data with limited information and one for complete data. In this numerical case, a critical point of n = 25 samples is taken. For the incomplete data set, the epsilon-contamination model is applied with a belief measure. P-boxes are applied to the complete data sets, as they have restricted bounds due to the complete information.

The database obtained from the industry needs to be pre-processed before entering this tool if it is unstructured. As the main objective of this research is to optimize scheduling, the attributes taken from the database in our example are operation time, operator ID, task sequence and operator skill.
Any comparable attributes should be identified and selected to suit the industry in question.

For each task undertaken in the industry, various operational parameters are considered: every sequence, operator and season corresponding to a particular task. For every season, the cumulative distribution function of every operator executing a task is plotted. The error times are calculated for every operator across the different operations and seasons.

$$\varepsilon = \text{ObservedTime} - \text{PredictedTime} \tag{4}$$

This error is equal to the actual time recorded minus the time predicted by the IoT prediction software. Cumulative distribution functions and probability density functions are plotted for each. After analyzing the graphs of the cumulative distribution functions for every operator per operation sequence, the curves fall into two types: first, those for which the data points are sufficient to plot a probability box for the operator; second, those with too few data points to form a probability box. In the second case, the epsilon-contamination model proves to be a useful analytical tool.

In the first case, with sufficient data points, the uncertainty is quantified with p-box models as shown in Figure 2. The numerical example of SeqID 786 shows the CDFs of various operators executing the same operation. The maximum and minimum bounds of this figure give the upper and lower bounds (previsions) of the p-box. The figure denotes the uncertainty of a specific operator using upper and lower previsions. The area between the upper and lower previsions denotes the uncertainty or error in the task execution. Similar graphs are plotted for all operators to study their error patterns across various seasons, tasks and operation parameters.
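The error calculation and the split into the two cases can be sketched as follows. This is only an illustration: it takes the error as observed minus predicted time, as the sentence accompanying Equation (4) states, and it assumes that the n = 25 critical point is applied to the total number of observations in a sequence, with 25 or more counted as sufficient. The record layout, the function names and the data are hypothetical.

```python
from collections import defaultdict

MIN_SAMPLES = 25  # the critical point n = 25 quoted in the text


def errors_by_sequence(records):
    """Group error times by sequence and operator.

    Each record is (sequence_id, operator_id, predicted, observed);
    this tuple layout is an assumption made for the sketch.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for seq_id, op_id, predicted, observed in records:
        grouped[seq_id][op_id].append(observed - predicted)
    return grouped


def choose_model(operator_errors, min_samples=MIN_SAMPLES):
    """Pick the uncertainty model from the number of observations (assumed rule)."""
    n = sum(len(errors) for errors in operator_errors.values())
    return "p-box" if n >= min_samples else "epsilon-contamination"


# Made-up records: 30 observations for one sequence, 9 for another.
records = [(786, f"op{i % 3}", 60.0, 60.0 + i % 5) for i in range(30)]
records += [(787, f"op{i % 3}", 45.0, 44.0 + i) for i in range(9)]

grouped = errors_by_sequence(records)
choices = {seq: choose_model(ops) for seq, ops in grouped.items()}
print(choices)
```

Keeping the per-operator error lists separate inside each sequence means the same grouping can feed either model once the choice is made.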
The upper bound of this curve denotes the upper prevision and the lower bound denotes the lower prevision.

Figure 2: p-box of sample sequenceID 786

The second case covers operators with fewer data points, such as those who work on a temporary basis or have limited responsibilities. Epsilon contamination is a very new research topic which focuses on a small region and generalizes to the whole range. The trust factor is taken as 0.8. In the given example, seqID 787 is taken, which had only three operators working on this sequence and only 9 observations. To capture the non-determinism in this data, epsilon contamination samples the distribution to form upper and lower previsions according to a trust factor $\epsilon$.

The upper and lower previsions are calculated using Equation (3), and their difference denotes the degree of uncertainty in the data. In this formula, the trust factor is denoted by epsilon, which is assigned by an industry expert or supervisor based on the probability of the task being completed in time. In the contaminated model, the lower and upper previsions are calculated from the average distribution of the operators in a sequence with an individual distribution as the contaminant.

Figure 3: Epsilon Contamination model, P=Upper Prevision, C=Lower Prevision, L=Mean prevision curve

The second step in the process, after quantifying the uncertainty, is to improve the prediction model of the Industry 4.0 framework. The task completion time prediction currently in use is not accurate and differs greatly from the real time recorded.
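Before turning to the prediction model, the limited-data case can be made concrete with a small sketch of one possible reading of the contaminated model: the average of the operators' empirical CDFs is mixed with each operator's own CDF according to Equation (3), and the pointwise minimum and maximum of the mixtures are taken as lower and upper previsions. The text does not state how the trust factor of 0.8 maps onto the contamination weight, so the sketch assumes weight = 1 - trust. The function names and the nine observations are invented for illustration.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function x -> fraction of samples that are <= x."""
    ordered = sorted(samples)
    return lambda x: bisect_right(ordered, x) / len(ordered)


def contaminated_previsions(operator_samples, grid, trust=0.8):
    """Lower, mean and upper CDF curves under epsilon contamination.

    Assumption: the contamination weight eps equals 1 - trust, and each
    operator's own empirical CDF plays the role of the contaminant Q.
    """
    eps = 1.0 - trust
    cdfs = [empirical_cdf(s) for s in operator_samples.values()]
    lower, mean, upper = [], [], []
    for x in grid:
        values = [c(x) for c in cdfs]
        avg = sum(values) / len(values)
        mixed = [(1.0 - eps) * avg + eps * v for v in values]
        lower.append(min(mixed))
        mean.append(avg)
        upper.append(max(mixed))
    return lower, mean, upper


# Made-up error times: three operators, nine observations in total.
operator_samples = {
    "op1": [-2.0, 0.5, 1.0],
    "op2": [0.0, 1.5, 3.0],
    "op3": [-1.0, 2.0, 4.0],
}
grid = [-2.0, 0.0, 2.0, 4.0]
lower, mean, upper = contaminated_previsions(operator_samples, grid)
width = [u - l for l, u in zip(lower, upper)]  # degree of uncertainty
print(width)
```

Under this reading the width at any point can never exceed the contamination weight, so a higher trust factor directly narrows the band around the mean curve.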
The prediction model can be improved using dynamic machine learning:

$$\varepsilon = f(\text{estimated production time}, \text{training data}) \tag{5}$$

The predicted time according to the new model is

$$\text{New estimated time} = f + \varepsilon \tag{6}$$

For instance, in [3] a rational quadratic Gaussian process regression model was chosen after many models were analyzed on this data; the model was trained stochastically on past estimated and real production times to estimate the assembly time with decreased error. The training data cover one year of operation, excluding operators who have very few data points. The model is as follows:

$$\varepsilon = \mathcal{N}(f(x, b), a) \tag{7}$$

The model has to be fitted to each data set. The numerical example employs it with parameters b = 7 . x with a = 4 . x. The results section compares the recomputed p-box, which shows that this learning model improved the prediction times and reduced the error times by an appreciable margin.

The final step is consolidating all these features into an easy-to-use GUI. This will increase the usability of the research results and lead to wider adoption across different industries. An application can be developed in MATLAB or Python to incorporate all the modelling techniques suggested in this paper and link them with a regression learner to automate the whole process. The best operator suggestion is made by calculating the degree of uncertainty, i.e. the difference between the upper and lower previsions. The predicted completion time of the operation is also shown using the newly developed model.

5 Results
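The degree of uncertainty used for comparing p-boxes and suggesting operators, i.e. the area between the upper and lower previsions, can be sketched as below. The trapezoidal rule and the normalisation by grid width are assumptions made for this illustration, since the text does not state how the normalised area is computed; the function names and the two example p-boxes are hypothetical.

```python
def pbox_area(grid, lower, upper):
    """Trapezoidal area between upper and lower CDF bounds over grid."""
    area = 0.0
    for i in range(1, len(grid)):
        gap_left = upper[i - 1] - lower[i - 1]
        gap_right = upper[i] - lower[i]
        area += 0.5 * (gap_left + gap_right) * (grid[i] - grid[i - 1])
    return area


def normalised_area(grid, lower, upper):
    """Area divided by the grid width (assumed normalisation)."""
    return pbox_area(grid, lower, upper) / (grid[-1] - grid[0])


def rank_operators(grid, pboxes):
    """Operator ids sorted by ascending p-box area (least uncertain first)."""
    return sorted(pboxes, key=lambda op: pbox_area(grid, *pboxes[op]))


# Made-up p-boxes (lower, upper) for two operators on a shared grid.
grid = [0, 1, 2, 3, 4]
pboxes = {
    "A": ([0.0, 0.25, 0.5, 0.75, 1.0], [0.25, 0.5, 0.75, 1.0, 1.0]),
    "B": ([0.0, 0.5, 0.75, 1.0, 1.0], [0.0, 0.5, 1.0, 1.0, 1.0]),
}
ranking = rank_operators(grid, pboxes)
print(ranking, normalised_area(grid, *pboxes["A"]))
```

A smaller area means a tighter band of error behaviour, so the first entry of `ranking` is the operator this measure would suggest.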

Figure 4: P-box comparison, before and after training

The results achieved through the adoption of this process by the facility are substantial. The industry has reported more accurate prediction times, which have helped in estimating the delivery time for different operations. After the machine learning process, the p-boxes are computed again, and they show that the error times are reduced. For example, a reduction is observed from Original = . units to TrainedModel = . for the same operation, in terms of normalised area. As the area of the p-box describes the variance in the error values, this shows that the predicted times are now closer to the actual times observed. This research has also helped to rank the employees on their performance and create a new assignment schedule for different operations and operators according to different seasons.

To conclude, this article provides a new perspective on data optimization in Industry 4.0. It bridges the gap between pure mathematics and application and proposes a novel applied mathematics approach to quantify non-deterministic uncertainty. General practices assume determinism, which is an inherent flaw in the procedure, and this paper proposes a dynamic approach to solve this problem.

References

[1] Hendrik, V.B., & Paul, V. (2017). Design of holonic manufacturing systems. Journal of Machine Engineering, 17(3).

[2] Snoo, C., Wezel, W., Wortmann, H., & Gaalman, G. (2011). Coordination activities of human planners during rescheduling: Case analysis and event handling procedure. International Journal of Production Research, 49, 2101–2122. doi:10.1080/00207541003639626

[3] Herroelen, W., & Leus, R. (2005). Project scheduling under uncertainty: Survey and research potentials. Eur. J. Oper. Res., 165, 289–306.

[4] Knyazeva, M., Bozhenyuk, A., & Rozenberg, I. (2015). Resource-constrained project scheduling approach under fuzzy conditions. Procedia Computer Science, 77, 56–64. doi:10.1016/j.procs.2015.12.359

[5] Shariatmadar, K., Misra, A., Debrouwere, F., & Versteyhe, M. (2019). Optimal modelling of process variations in industry 4.0 facility under advanced p-box uncertainty. IEEE Student Conference on Research and Development (SCOReD), Bandar Seri Iskandar, Malaysia, 2019, pp. 180–185.

[6] Walley, P. (1991). Statistical reasoning with imprecise probabilities. Taylor & Francis.

[7] Chen, M., Gao, C., & Ren, Z. (2015). A General Decision Theory for Huber's ε-Contamination Model. Electronic Journal of Statistics, 10. doi:10.1214/16-EJS1216