Counterfactual Contextual Multi-Armed Bandit: a Real-World Application to Diagnose Apple Diseases

Abstract

Post-harvest diseases of apple are one of the major issues in the economic sector of apple production, causing severe economic losses to producers. Thus, we developed DSSApple, a picture-based decision support system able to help users in the diagnosis of apple diseases. Specifically, this paper addresses the problem of sequentially optimizing for the best diagnosis, leveraging past interactions with the system and their contextual information (i.e. the evidence provided by the users). The problem of learning an online model while optimizing for its outcome is commonly addressed in the literature through a stochastic active learning paradigm, i.e. the Contextual Multi-Armed Bandit (CMAB). This methodology interactively updates the decision model considering the success of each past interaction with respect to the context provided in each round. However, this information is very often partial and inadequate to handle such complex decision making problems. On the other hand, human decisions implicitly include unobserved factors (referred to in the literature as unobserved confounders) that significantly contribute to the human's final decision. In this paper, we take advantage of the information embedded in the observed human decisions to marginalize confounding factors and improve the capability of the CMAB model to identify the correct diagnosis. Specifically, we propose a Counterfactual Contextual Multi-Armed Bandit, a model based on the causal concept of counterfactual. The proposed model is validated with offline experiments based on data collected through a large user study on the application. The results show that our model is able to outperform both traditional CMAB algorithms and observed user decisions in real-world tasks of predicting the correct apple disease.


Counterfactual Contextual Multi-Armed Bandit: a Real-World Application to Diagnose Apple Diseases

GABRIELE SOTTOCORNOLA∗, Free University of Bozen-Bolzano, Italy

FABIO STELLA, University of Milano-Bicocca, Italy

MARKUS ZANKER, Free University of Bozen-Bolzano, Italy

CCS Concepts: • Computing methodologies → Online learning settings; • Applied computing → Health informatics; • Information systems → Information systems applications.

Additional Key Words and Phrases: contextual multi-armed bandit, counterfactual causal inference, real-world application, intelligent system in agriculture

ACM Reference Format:

Gabriele Sottocornola, Fabio Stella, and Markus Zanker. 2021. Counterfactual Contextual Multi-Armed Bandit: a Real-World Application to Diagnose Apple Diseases. 1, 1 (February 2021), 10 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

∗ Corresponding author.

Authors' addresses: Gabriele Sottocornola, Free University of Bozen-Bolzano, Bolzano, Italy; Fabio Stella, University of Milano-Bicocca, Milano, Italy; Markus Zanker, Free University of Bozen-Bolzano, Bolzano, Italy.

© 2021 Association for Computing Machinery. Manuscript submitted to ACM.

The domesticated apple (Malus x domestica) is the third most produced fruit in the world (behind bananas and watermelons), with more than 86 million metric tons produced in 2018 [19]. In 2019, annual worldwide sales of apples were valued at around 7 billion USD [24]. Apple trees are indeed the most common temperate fruit tree species, since their fruits can be stored for prolonged periods of time under controlled atmosphere conditions. However, physiological disorders and pathogenic microorganisms can deteriorate the quality and quantity of the production during storage, and lead to considerable economic losses [22]. For example, in Northern Europe, storage losses due to pathogenic microorganisms were estimated to reach up to 10% in integrated production and up to 30% in organic production [14]. Therefore, an effective Decision Support System (DSS), which we refer to as DSSApple, able to diagnose in a timely manner the diseases and disorders occurring on harvested apples, is crucially important for this agricultural sector. For instance, both the right strategy for immediate damage containment and the plant protection scheme to recommend for the following year depend on the exact pathogen species. In order to reliably determine the nature of the disease, several macroscopic symptoms, such as appearance, color, texture and consistency of the rot, need to be considered. Hence, the system should provide a practical interface to elicit user knowledge on the symptoms she might notice on a diseased apple, and a reasoning system to adapt the diagnosis to the user feedback. Nevertheless, the capability of the system to distinguish different candidate diseases relies on subtle differences in the symptoms' appearance that a non-expert user (or even an expert one) may fail to identify. Thus, another important feature of DSSApple is to correct (imprecise) user feedback in order to increase the accuracy of the suggested diagnosis.

In this work, we therefore investigate a component of such a DSS, namely, a method to interactively predict the best diagnosis by leveraging past user interactions with a sequence of infected apples. The proposed model is based on the framework of Contextual Multi-Armed Bandit (CMAB) and on counterfactual causal inference. The former makes it possible to process the observed sequential interactions of users with the diagnostic tool; the latter makes it possible to extract the bias in human decision making (the so-called unobserved confounders) and improve the online diagnostic model. The main contribution of this paper is to empirically assess the performance of the causal approach in a CMAB scenario, for the task of apple disease classification. Specifically, we base our work on the theoretical results presented so far in the literature [2, 4, 11], in order to implement a novel Counterfactual CMAB algorithm, called Counterfactual Thompson Sampling (CF-TS), tailored to our problem. We demonstrate the effectiveness of the proposed algorithm by testing it on real-world data related to the diagnosis of post-harvest apple diseases. The offline evaluation was derived from a large user study, in which we collected user interactions with DSSApple. The participants of the study were asked to diagnose a set of images of diseased apples, prepared by domain experts in the laboratory, by using the DSSApple application. The simulation derived from these data showed that CF-TS significantly outperforms other decision models, i.e. observational and CMAB models, in the task of sequentially diagnosing the correct apple disease.

This work represents a further step in the development of an effective and practical DSS that aims at helping expert and non-expert users in the diagnosis of post-harvest diseases of apple. In particular, the fundamental insight gained in this research is that an active learning classifier based on a counterfactual CMAB algorithm is able to outperform both the observed user decision model and a state-of-the-art CMAB policy. The improved diagnostic tool was shown to reach an accuracy greater than 40% in a real-world setting with 5 possible diseases, just by leveraging users' interactions with the application showing pictures of macroscopic symptoms. This result could be easily converted into an immediate reduction of the economic losses in apple production due to undetected post-harvest infections.

Since their early years, Multi-Armed Bandit (MAB) [21] algorithms have been widely applied and have proved effective in the task of optimizing sequential decisions. They achieve an optimal exploration-exploitation trade-off by minimizing the cumulative regret over a finite time horizon [3]. Along with theoretical analyses of regret upper bounds, some works tested the effectiveness of MAB on synthetic and real-world classification tasks [9, 16, 23]. The Thompson Sampling (TS) policy for the Beta-Bernoulli bandit, extended with regularized logistic regression, was evaluated in the applications of online advertising and news recommendation by Chapelle and Li [6]. Moreover, Contextual Multi-Armed Bandit (CMAB) was successfully applied throughout the years. This methodology provides a more realistic and adaptive setup in which each instance is processed with its side information, referred to as context. The learning process for identifying the best arm is a function of the context at each iteration [10]. A well-known example of a real-world application of CMAB is represented by Li et al. [12]. The authors proposed an improved version of the Upper Confidence Bound (UCB) algorithm with disjoint and hybrid linear models. Furthermore, an effective offline evaluation of the bandit policies was performed on the click-through rate of the news shown on the Yahoo! homepage. Agrawal and Goyal [2] illustrated a compact implementation of TS for the Gaussian bandit with contextual information, together with a proof of bounded regret. These works systematically evaluate the CMAB algorithms proposed so far and an additional novel one for document classification [1].

Recently, the so-called "causal revolution" [17] has reached the fields of reinforcement learning and multi-armed bandits. Initially, the complex causal relationships in the application scenarios of online computational advertising were investigated by Bottou et al. [5]. In their seminal work on causal MAB, Bareinboim et al. [4] presented a causal approach to the Beta-Bernoulli TS algorithm. In particular, they illustrate a novel algorithm able to de-bias the bandit learning process from unobserved confounders, which influence both the arm selection and the observed outcome (i.e. the reward). The method proved to outperform its non-causal counterparts in terms of measured regret on synthetic examples. A more extensive description of the theory behind the MAB with confounding factors is illustrated in [8] and in [25]. This work exploits transfer learning in order to combine observational, experimental, and counterfactual data through structural causal models. In their extended work, Lee and Bareinboim [11] enhance the concept of causal multi-armed bandit by fusing it with structural causal graphs. The MAB outcome is influenced by both unobserved confounders and the interventional action of the agent. The POMIS algorithm (partial order of minimal intervention set) has been proposed to find the best arm solution. Again, experiments are conducted based on a simulation of pre-defined tasks. More recently, an application of non-contextual causal MAB on simulated e-mail data was presented by Lu et al. [13]. The authors of [7], instead, proposed a CMAB algorithm balanced with causal inference and tested it on a benchmark of 300 multi-class classification datasets.

For the task of diagnosing post-harvest diseases of apple from their macroscopic symptoms, the employment of fully-automated machine learning techniques for image recognition appears to be insufficient up to now. The intra-disease variance is very high: the same pathogen induces different symptoms on different species, also depending on the progression of the disease (i.e. days after an infection). At the same time, for a non-expert evaluator - and even for experts without a microscopic or microbiological analysis - it is really difficult to understand the subtle differences in symptom appearance just by observing images of external symptoms, particularly at early stages of infection. In Figure 1b we show three instances of external symptoms that clearly highlight the difficulty of the classification task. The two symptoms that look most similar - given also that they appear on the same apple cultivar - are in fact manifestations of two different diseases (Neofabraea and Alternaria). On the other hand, the two examples of Alternaria symptoms appear to be largely different, since they manifest themselves on different cultivars and at different stages of the infection.


Fig. 1. (a) The DSSApple interface. (b) An example of how difficult it can be to identify the correct disease from its macroscopic symptoms. The left-most apple is infected by Neofabraea, while the others are infected by Alternaria.

In order to tackle this challenging problem, we developed a decision support system, named DSSApple, designed to be an easy-to-use web application that helps expert (e.g. storage workers, researchers) and non-expert users to perform in-field diagnosis of apple diseases from macroscopic symptoms (i.e. without any microscopic investigation). Given the discussed peculiarity of the domain, we chose to design the application as a human-in-the-loop interactive DSS. The design of this prototype was mainly inspired by the work of Pertot et al. [18]. The user interacts with the system simply by clicking on pictures (referred to as feedback images), representing the variety of symptoms of different diseases at different stages of infection. We conceptualized an interactive session with the system as a sequence of rounds. At each round of interaction the user provides immediate feedback on a small set of images depicting disease symptoms, based on the perceived similarity with an actually diseased target apple. We showed that this modality increases the system usability and alleviates the cognitive overload of the user [15]. Feedback images have been produced by taking pictures of apples sampled from different storage houses, where the ground truth (i.e. the actual disease) has been determined by domain experts in the laboratory using gene sequencing of spores. At the current stage, DSSApple includes 5 different pathogens, namely Alternaria, Botrytis, Mucor, Neofabraea, and Penicillium, each represented by a pool of 30 feedback images. The limited total number of labelled high-quality images is another crucial factor that discourages the usage of machine learning techniques for image classification. Figure 1a depicts a round of interaction, where users can provide feedback on any number of small-scale feedback images before submitting their choices to the system. At the end of each round, the system automatically reloads alternative images based on the feedback provided by users. Different reloading policies have been tested to better adapt the new feedback images to previous user feedback and, hence, increase the diagnostic accuracy of the system [20]. After a fixed number of rounds, DSSApple stops feedback collection and suggests a set of candidate diseases that are ranked based on the number of coherent user feedback on the symptom images belonging to each disease. Finally, the user can communicate her final choice to the system by selecting one among the suggested diseases as the final diagnosis for the target infected apple.

In this paper, we focus on the component of DSSApple responsible for the diagnosis, i.e. the classification of user feedback into one or more suggested diseases. In particular, we treat each user session with the system as an event, and we aim at sequentially optimizing each event for the best diagnosis. For each event, we exploit all the user feedback collected by the system in that session, as well as the past user interactions, in order to provide a more reliable diagnosis and, ultimately, improve the accuracy of DSSApple. Thus, to achieve this goal we propose a novel counterfactual contextual multi-armed bandit algorithm, adapted to the problem of diagnosing post-harvest diseases of apple.
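The ranking of candidate diseases at the end of a session can be illustrated with a short sketch. It assumes that every feedback image is labelled with exactly one disease and that a session reduces to the list of clicked image identifiers; the function name `rank_diseases`, the mapping `image_to_disease`, the example identifiers and the alphabetical tie-break are hypothetical and are not taken from DSSApple.

```python
from collections import Counter


def rank_diseases(clicked_images, image_to_disease):
    """Rank candidate diseases by the number of clicked feedback images of each."""
    counts = Counter(image_to_disease[img] for img in clicked_images)
    # Descending click count; ties broken alphabetically (an assumption,
    # chosen only to make the ordering deterministic).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical image pool: each identifier maps to its ground-truth disease.
image_to_disease = {
    "img01": "Alternaria",
    "img02": "Alternaria",
    "img03": "Neofabraea",
    "img04": "Botrytis",
}

# Images clicked by the user over the rounds of one session.
session_clicks = ["img01", "img03", "img02"]
ranking = rank_diseases(session_clicks, image_to_disease)
print(ranking)
```

In this toy session the user clicked two Alternaria images and one Neofabraea image, so Alternaria is suggested first; diseases without any click do not appear in the ranking.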

We combine disjoint linear CMAB for classification [12], with specific reference to the Thompson Sampling (TS) policy [2], with the powerful concept of counterfactual [17]. In particular, counterfactual modeling and reasoning allow us to effectively leverage human decisions (or intuitions) when a given arm has to be selected.

Fig. 2. Causal model for the standard decision making scenario (a) and the counterfactual decision making (b).

Consider the causal diagram in Figure 2a depicting the decision making process in a MAB scenario, where node X represents the contextual information, i.e. the information explicitly available to the decision maker. In contrast, node U represents the unobserved confounders, i.e. the hidden information which influences the final decision. Node Y represents the final decision or arm selection (e.g. the diagnosis), while node O represents the outcome of that decision (e.g. whether the diagnosis Y was correct or not). We are aware of the fact that in a more general scenario the decision outcome O is directly influenced by contextual and unobserved factors as well. We decided to stick with this simplified causal model given that it better mimics our application scenario (i.e. in a diagnostic scenario, the correctness of the diagnosis O is conditionally independent of both the observed variable X and the unobserved variable U given the diagnosis Y). A CMAB model trained on this decision problem learns only from the observed contextual information X. This is a strong limitation because it excludes relevant information that would be necessary in order to make an effective decision. On the other hand, humans make decisions based on a mixed interpretation of the observed context and the unobserved confounders (e.g. the diagnosis of a medical doctor is based on both structured knowledge of the observed evidence and the intuition built over past experience) [8]. Nevertheless, confounders can hardly be incorporated directly into the decision model, due to their nature. Therefore, we propose to exploit the human decision as a proxy to implicitly access the unavailable information included in the unobserved confounders. In this situation, the decision making scenario is represented by the causal graph in Figure 2b. The intuition of the user (observed node I) is conditionally dependent on both contextual information and confounding factors.
The final decision (arm selection) is thus conditioned on nodes I and X, including all the available information in the decision process.

The complete procedure of the Counterfactual Thompson Sampling (CF-TS) is illustrated in Algorithm 1. The structure of the algorithm is similar to that of a Gaussian CMAB with linear payoff and $A$ arms, where $A$ corresponds to the number of possible actions (or interventions). In this case, following the Thompson Sampling policy, the exploration-exploitation trade-off is provided by sampling from a multi-variate Gaussian $N(\hat{\mu}, B^{-1})$ with $d$ components. The parameters update (last 3 lines) is obtained through the Bayesian update described in [2]. The novelty of the proposed algorithm lies in its capability of computing the expected reward (and thus the final decision) as a function of both the contextual information and the decision intuitively made by the user.

Algorithm 1: Counterfactual Thompson Sampling (CF-TS) for Contextual Bandit Diagnosis

Input: $B_{i,j} = I_d$, $\hat{\mu}_{i,j} = 0_d$, $f_{i,j} = 0_d$;
foreach $t = 1, 2, \ldots, T$ do
    Receive a context $x(t) \in \mathbb{R}^d$ and an arm intuition $i(t)$;
    foreach $y = 1, 2, \ldots, A$ do
        Sample $\tilde{\mu}_{i(t),y}(t)$ from $N(\hat{\mu}_{i(t),y}, B^{-1}_{i(t),y})$;
        Compute $E[r_{i(t),y}(t)] = x(t)^\top \tilde{\mu}_{i(t),y}(t)$;
    end
    Play arm $a(t) := \arg\max_y E[r_{i(t),y}(t)]$ and observe reward $r(t)$;
    $B_{i(t),a(t)} = B_{i(t),a(t)} + x(t) x(t)^\top$;
    $f_{i(t),a(t)} = f_{i(t),a(t)} + x(t) r(t)$;
    $\hat{\mu}_{i(t),a(t)} = B^{-1}_{i(t),a(t)} f_{i(t),a(t)}$;
end

This concept follows the causal relationships depicted in Figure 2b, where $Y = f(X, I)$. In more detail, at time $t$, when a decision needs to be taken, the algorithm receives the context vector $x(t)$, as well as the arm intuition $i(t)$ provided by the human decision maker. Following the notation in [4], the reward computation $E[r_{i(t),y}(t)] = x(t)^\top \tilde{\mu}_{i(t),y}(t)$ is equivalent to computing $E[r(t)_y \mid I = i(t), X = x(t)]$, the expected reward at time $t$ for selecting arm $y$ given the contextual information $X = x(t)$ and the intuition of the user towards arm $I = i(t)$. In other words, this formula answers the counterfactual question: "What would the expected reward have been had I pulled arm $y$, given that I am about to pull arm $i(t)$?". Finally, the algorithm selects as best arm $a(t)$ the decision/arm $y$ which maximizes the expected reward $E[r(t)]$.

We designed a gamified user study, which we called Bad Apple Challenge, in order to validate the effectiveness of DSSApple [20]. The user, at each session, receives a target infected apple, for which she needs to diagnose the unknown disease. The participant navigates the DSSApple application (as described in Section 3) in order to receive a ranked list of suggested diagnoses. At the end of the session, the user has to select the diagnosis she considers the correct one for the target apple. The user accumulates score points if the diagnosis is correct (i.e. the selected disease corresponds to the actual disease of the target apple). We administered the Bad Apple Challenge in a controlled environment with 163 non-expert participants (i.e. two cohorts of BSc Computer Science students).

We exploit the data collected through the Bad Apple Challenge to evaluate the effectiveness of the CF-TS algorithm, applied to the task of sequentially diagnosing apple diseases. Each challenge session is treated as a classification instance, where the feedback images selected by the user represent the context of the diagnosis. The final selection by the user is used as the intuition towards the diagnosis. The ground truth disease contributes to the computation of the reward. Thus, the sequential disease classification is converted into a multi-armed bandit problem where each candidate disease is mapped to an independent arm. Given that the number of candidate diseases is 5, we deal with a 5-armed bandit problem. A reward of 1 is assigned if the selected arm corresponds to the true disease of the target apple, 0 otherwise. Given the set of feedback images clicked during a challenge, we extract two types of context:

(1) Image-based Context (ImgCtx): The context of the diagnosis is computed as the sum of the pixel-wise representations of every selected feedback image. In order to compress the feature space and reduce the noise, we apply a PCA with 64 dimensions to each unfolded image vector.
(2) Similarity-based Context (SimCtx): The context of the diagnosis is computed as the sum of the similarity vectors of each selected feedback image. The similarity vector is derived from user interactions as the relative frequency of an image being co-clicked with others during the same challenge. Again, the feature space is reduced to the 64 principal components through PCA.

The total number of challenge sessions performed in the user study is 515, which is too few to evaluate the convergence of bandit algorithms. We apply a fair and effective technique to simulate more instances. We replicate each session $r$ times, but, in each replica, every selected feedback image has a dropout probability $d >$ […] $n$ times, each time generating a different enhanced dataset to be used in the comparative offline evaluation. In our experiments, we set $r =$ […], $t = 1000, 2000, 3000$, $d =$ […], $n = 100$. We compare the following selection models:

(1) Counterfactual Thompson Sampling (CF-TS): The proposed method described in Section 4.
(2) Observational (Obs): The observed decision model, i.e. the actual selection of each user at the end of the session.
(3) Thompson Sampling (TS): The standard implementation of a set of disjoint Thompson Sampling classifiers with linear payoff, as described in [2, 12].
(4) Extended Thompson Sampling (ExtTS): The standard TS algorithm with an extended context. Namely, the user intuition is encoded in a one-hot vector and appended to the context vector. This baseline guarantees that the TS algorithm processes the same amount of information as the CF-TS method.
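To make the difference between CF-TS and the ExtTS baseline concrete, the following is a minimal NumPy sketch of Algorithm 1, not the authors' released implementation. It assumes zero-based arm and intuition indices and a unit-scale posterior covariance $B^{-1}$; the names `CounterfactualTS` and `extended_context`, as well as the synthetic data in the usage loop, are hypothetical and serve only as illustration.

```python
import numpy as np


class CounterfactualTS:
    """Disjoint linear Thompson Sampling with one model per (intuition, arm) pair."""

    def __init__(self, n_arms, dim, seed=0):
        self.n_arms = n_arms
        self.rng = np.random.default_rng(seed)
        # Input of Algorithm 1: B[i, y] = I_d, f[i, y] = 0_d, mu_hat[i, y] = 0_d.
        self.B = np.tile(np.eye(dim), (n_arms, n_arms, 1, 1))
        self.f = np.zeros((n_arms, n_arms, dim))
        self.mu_hat = np.zeros((n_arms, n_arms, dim))

    def select(self, x, intuition):
        """Sample one parameter vector per arm, conditioned on the user intuition."""
        expected = np.empty(self.n_arms)
        for y in range(self.n_arms):
            cov = np.linalg.inv(self.B[intuition, y])
            cov = (cov + cov.T) / 2.0  # remove numerical asymmetry
            mu_tilde = self.rng.multivariate_normal(self.mu_hat[intuition, y], cov)
            expected[y] = x @ mu_tilde
        return int(np.argmax(expected))

    def update(self, x, intuition, arm, reward):
        """Bayesian update of the single (intuition, arm) model that was played."""
        self.B[intuition, arm] += np.outer(x, x)
        self.f[intuition, arm] += reward * x
        self.mu_hat[intuition, arm] = np.linalg.solve(
            self.B[intuition, arm], self.f[intuition, arm]
        )


def extended_context(x, intuition, n_arms):
    """ExtTS-style context: append the one-hot encoded intuition to the context."""
    one_hot = np.zeros(n_arms)
    one_hot[intuition] = 1.0
    return np.concatenate([x, one_hot])


# Toy usage on synthetic data (not the Bad Apple Challenge data).
data_rng = np.random.default_rng(1)
agent = CounterfactualTS(n_arms=5, dim=4)
for _ in range(20):
    x = data_rng.normal(size=4)
    intuition = int(data_rng.integers(5))
    true_disease = int(data_rng.integers(5))
    arm = agent.select(x, intuition)
    agent.update(x, intuition, arm, reward=float(arm == true_disease))
```

In this sketch the intuition selects which set of per-arm models is sampled and updated, whereas the ExtTS-style helper only enlarges the feature vector of a single set of per-arm models.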

In the following we summarize the results obtained in the offline evaluation. The results reported in this section are all derived as an average over the 100 replications of each experiment. The graphs in Figure 3 depict the cumulative reward of every method with respect to the two context types, namely ImgCtx (a) and SimCtx (b). The time horizon represented in the graph is the largest one of 3000 observations. The behavior of the methods after a smaller number of iterations (i.e. $t = 1000, 2000$) […] The CF-TS method emerges as a clear winner, achieving the maximum reward among the competitors. Nevertheless, the algorithm needs a warm-up period that depends on the provided context. In the case of ImgCtx, CF-TS needs around 1200 iterations to achieve results comparable to the ones obtained by human selection. In the other case, with a more meaningful context derived from collaborative image similarity, the algorithm is faster in outperforming the observational model, around the 1000-th iteration. It is important to notice that the two TS baselines never get close to the performance achieved by the observational model, irrespective of the number of observations they can process. Another interesting aspect is that the extended model, which exploits a larger contextual information, registers a small gain in the case in which the original context is sub-optimal (namely for ImgCtx); thus, an improved representation of the context can effectively boost the TS model. Nevertheless, the ExtTS model remains mostly in line with standard TS and, despite the augmented observational information, it is not able to capitalize on it and to achieve a performance comparable to the proposed CF-TS model.

The full code of the experimental evaluation of the implemented methodology is available at https://github.com/endlessinertia/causal-contextual-bandits
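The time horizons discussed above exceed the 515 recorded sessions because every session is replicated with dropout on its selected feedback images. A minimal sketch of such a replication step follows, assuming that a session is a tuple of clicked image identifiers, user intuition and true disease, and that each clicked image is independently removed with probability `d`. The function name `replicate_with_dropout`, the tuple layout and the example values of `r` and `d` are placeholders, not the settings used in the experiments.

```python
import random


def replicate_with_dropout(sessions, r, d, seed=0):
    """Return r noisy copies of every session.

    Each clicked image of a replica is dropped independently with probability d
    (an assumed reading of the dropout step); intuition and truth are unchanged.
    """
    rng = random.Random(seed)
    replicas = []
    for clicked, intuition, truth in sessions:
        for _ in range(r):
            kept = [img for img in clicked if rng.random() >= d]
            # A replica may end up with no clicked image; how such cases are
            # handled is not specified here, so they are simply kept.
            replicas.append((kept, intuition, truth))
    return replicas


# Hypothetical sessions: (clicked images, user intuition, true disease).
sessions = [
    (["img01", "img02", "img03"], "Alternaria", "Alternaria"),
    (["img07"], "Mucor", "Botrytis"),
]
augmented = replicate_with_dropout(sessions, r=2, d=0.5)
print(len(augmented))
```

Calling the function with different seeds yields different enhanced datasets from the same recorded sessions, which is what allows results to be averaged over several replications.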

Fig. 3. Cumulative reward after 3000 interactions, considering the image-based context, ImgCtx (a), and the similarity-based context, SimCtx (b), for CMAB.

In Table 1 we summarize the accuracy results of each algorithm in diagnosing the correct disease as a function of the number of observations (i.e. $t = 1000, 2000, 3000$) and of the context type (ImgCtx and SimCtx). The accuracy for an algorithm $m$ at time horizon $t$ is simply computed as $a_m(t) = \frac{1}{t} \sum_{i=1}^{t} r_m(i)$, where $r_m(i)$ is the reward obtained by the algorithm $m$ at the $i$-th iteration. We also include the performance of a ZeroR baseline, as a reference point. The ZeroR classifier always assigns as diagnosis the most frequent true disease in the set.

            t = 1000            t = 2000            t = 3000
            ImgCtx   SimCtx     ImgCtx   SimCtx     ImgCtx   SimCtx
CF-TS
ExtTS
TS
Obs
ZeroR

Table 1. Accuracy at different time horizons $t = 1000, 2000, 3000$, considering ImgCtx and SimCtx for CMAB and baseline selection models. * indicates improvements on other methods in the same column being significant at p-value < 0.01 on a paired-samples t-test.
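The accuracy measure and the ZeroR reference point can be sketched in a few lines. The snippet assumes that the rewards of a method are stored as a 0/1 sequence in iteration order; the function names `accuracy_at` and `zero_r_rewards` and the example data are hypothetical.

```python
from collections import Counter

import numpy as np


def accuracy_at(rewards, t):
    """a_m(t) = (1/t) * sum_{i=1}^{t} r_m(i), the mean of the first t rewards."""
    return float(np.mean(rewards[:t]))


def zero_r_rewards(true_diseases):
    """Rewards of a ZeroR baseline that always predicts the most frequent true disease."""
    majority, _ = Counter(true_diseases).most_common(1)[0]
    return [1.0 if disease == majority else 0.0 for disease in true_diseases]


# Hypothetical reward sequence of one method and hypothetical ground truth.
rewards = [1, 0, 1, 1]
truth = ["Alternaria", "Mucor", "Alternaria", "Botrytis"]

print(accuracy_at(rewards, 2), accuracy_at(rewards, 4))
print(accuracy_at(zero_r_rewards(truth), len(truth)))
```

With these toy inputs, the method reaches an accuracy of 0.5 after two iterations and 0.75 after four, while ZeroR always answers Alternaria and is correct in half of the cases.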

The results of Table 1 clearly highlight the different behaviours of the 4 selection models. The observational model outperforms its counterparts in the first two scenarios, in which the total number of iterations is set to $t =$ […].8% and 2.3% of accuracy, with respect to the same time horizon. In the experiments with $t =$ […] ImgCtx, ExtTS significantly improves the accuracy w.r.t. TS from 32% to 33%, by exploiting the enhanced context. The same does not hold true for SimCtx, where the gain is not significant. This is again due to the fact that SimCtx provides more reliable information than ImgCtx, which benefits more from the context enrichment. Finally, it is interesting to notice how, after 2000 iterations, the two TS baselines reach their upper bound in predictive accuracy (i.e. no improvement is registered in the case of $t = 3000$). CF-TS is capable of constantly improving its performance at the different cutoffs, reaching a maximum of 40% accuracy after 3000 iterations for the SimCtx context (+3% on the observational decision model).

We introduced an interactive decision support system, called DSSApple, for tackling the real-world problem of diagnosing post-harvest diseases of apple. Specifically, in this paper we presented a novel Counterfactual CMAB algorithm that is responsible for improving the diagnostic performance of the model by sequentially leveraging users' interactions with the system. The algorithm, called Counterfactual Thompson Sampling (CF-TS), exploits the human decision (i.e. the intuition) and the contextual information (i.e. the evidence) to compute the optimal counterfactual choice. We evaluated the effectiveness of CF-TS by comparing it with the performance of the observed human decisions and two natural Thompson Sampling baselines. In the offline experiments, simulated from data collected in a large gamified user study, CF-TS was able to outperform all the baselines and achieve significant improvements in diagnosing the correct diseases.

In future work, the application will be tested in a real-world environment by storage workers and quality control managers, in order to assess its diagnostic capability in the field. A crucial future extension of the presented application consists in incorporating a knowledge base that models domain expertise, in order to include richer information (e.g. inner/outer symptoms description, cultivar type, handling conditions, etc.), increase the overall success rate, and better explain the final diagnosis and recommend suitable countermeasures.
