Rational Value of Information Estimation for Measurement Selection
RPDM 2010
David Tolpin, Computer Science Dept., Ben-Gurion University, 84105 Beer-Sheva, Israel
Solomon Eyal Shimony, Computer Science Dept., Ben-Gurion University, 84105 Beer-Sheva, Israel

ABSTRACT
Computing value of information (VOI) is a crucial task in various aspects of decision-making under uncertainty, such as in meta-reasoning for search; in selecting measurements to make, prior to choosing a course of action; and in managing the exploration vs. exploitation tradeoff. Since such applications typically require numerous VOI computations during a single run, it is essential that VOI be computed efficiently. We examine the issue of anytime estimation of VOI, as frequently it suffices to get a crude estimate of the VOI, thus saving considerable computational resources. As a case study, we examine VOI estimation in the measurement selection problem. Empirical evaluation of the proposed scheme in this domain shows that computational resources can indeed be significantly reduced, at little cost in expected rewards achieved in the overall decision problem.

INTRODUCTION
Problems of decision-making under uncertainty frequently contain cases where information can be obtained using some costly actions, called measurement actions. In order to act rationally in the decision-theoretic sense, measurement plans are typically optimized based on some form of value of information (VOI). Computing VOI can itself be computationally intensive. Since frequently an exact VOI is not needed in order to proceed (e.g. it is sufficient to determine that the VOI of a certain measurement is much lower than that of another measurement, at a certain point in time), significant computational resources can be saved by controlling the resources used for estimating the VOI. This paper examines this tradeoff via a case study of measurement selection.

In general, computation of value of information, even under the commonly used simplifying myopic assumption, involves multidimensional integration of a general function [Russell and Wefald, 1991]. For some problems, the integral can be computed efficiently [Russell and Wefald, 1989]; but when the utility function is computationally intensive, or when a non-myopic estimate is used, the time required to compute the value of information can be significant [Heckerman et al., 1993] [Bilgic and Getoor, 2007], and must be taken into account while computing the net value of information. This paper presents and analyzes an extension of the known greedy algorithm that decides when to recompute the VOI of each of the measurements, based on the principles of limited rationality [Russell and Wefald, 1991].

Although it may be possible to use this idea in more general settings, this paper mainly examines on-line most informative measurement selection [Krause and Guestrin, 2007] [Bilgic and Getoor, 2007], an approach which is commonly used to solve problems of optimization under uncertainty [Zheng et al., 2005] [Krause et al., 2008].
Since this approach assumes that the computation time required to select the most informative measurement is negligible compared to the measurement time [Russell and Wefald, 1991], it is important in this setting to ascertain that VOI estimation indeed does not consume excessive computational resources.

THE MEASUREMENT SELECTION PROBLEM
As our case study, we examine the following optimization problem. Given:

• A set of N_s items S = {s_1, s_2, ..., s_{N_s}}.
• A set of N_f item features Z = {z_1, z_2, ..., z_{N_f}}. (Each feature z_i has a domain D(z_i).)
• A joint distribution over the features of the items in S, that is, a joint distribution over the random variables {z_1(s_1), z_2(s_1), ..., z_1(s_2), z_2(s_2), ...}.
• A set of measurement types M = {(c, p)_k | k ∈ 1..N_m}, with a potentially different intrinsic measurement cost c and observation distribution p, conditional on the true feature values, for each measurement type.
• A utility function u(z) : ℝ^{N_f} → ℝ on features. In the simplest case, there is just one real-valued feature, acting as the item's utility value, and u is simply the identity function.
• A measurement budget C.

Find a policy of measurement decisions and a final selection that maximizes the expected net utility of the selection (the expected reward):

$$\max\; R = u(z(s_\alpha)) - \sum_{i=1}^{N_q} c_{k_i} \quad \text{s.t.} \quad \sum_{i=1}^{N_q} c_{k_i} \le C \qquad (1)$$

where Q = {(k_i, s_i) | i ∈ 1..N_q} is the performed measurement sequence and s_α is the selected item. The next measurement is selected on-line, after the outcomes of all preceding measurements are known.

The above selection problem is intractable, and is therefore commonly solved approximately using a greedy heuristic algorithm. The greedy algorithm selects the measurement m_{j_max} with the greatest net value of information V_{j_max}. The net value of information is the difference between the intrinsic value of information and the measurement cost.
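The greedy scheme just described can be sketched as follows. This is a minimal rendering: the callbacks `compute_net_voi`, `perform_measurement`, and `expected_utilities` are illustrative names standing in for the problem-specific belief machinery, not part of the paper.

```python
def greedy_select(costs, compute_net_voi, perform_measurement,
                  expected_utilities, budget):
    """Greedy measurement selection: repeatedly perform the affordable
    measurement with the highest positive net VOI, then return the index
    of the item with the highest expected utility."""
    n = len(costs)
    while True:
        # Net VOI V_j for every affordable measurement; -inf otherwise.
        V = [compute_net_voi(j) if costs[j] <= budget else float("-inf")
             for j in range(n)]
        j_max = max(range(n), key=lambda j: V[j])
        if V[j_max] <= 0:  # no measurement is worth its cost
            break
        perform_measurement(j_max)  # observe and update beliefs
        budget -= costs[j_max]
    utilities = expected_utilities()
    return max(range(len(utilities)), key=lambda i: utilities[i])

# Toy run: a single measurement whose net VOI drops to zero once taken.
state = {"measured": False}
alpha = greedy_select(
    costs=[0.1],
    compute_net_voi=lambda j: 0.0 if state["measured"] else 0.5,
    perform_measurement=lambda j: state.update(measured=True),
    expected_utilities=lambda: [0.2, 0.7],
    budget=1.0)
print(alpha)  # selects item 1, the item with the higher expected utility
```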
$$V_j = \Lambda_j - c_{k_j} \qquad (2)$$

The intrinsic value of information $\Lambda_j$ is the expected difference in the true utility of the finally selected item $s_\alpha$ after and before the measurement:

$$\Lambda_j = \mathbb{E}\big(\mathbb{E}[u(z(s_{\alpha_j}))] - \mathbb{E}[u(z(s_\alpha))]\big) \qquad (3)$$

Exact computation of $\Lambda_j$ is intractable, and various estimates are used, including the myopic estimate [Russell and Wefald, 1991] and semi-myopic schemes [Tolpin and Shimony, 2010].

The pseudocode for the algorithm is presented as Algorithm 1. At each step, the algorithm recomputes the value of information estimate of every measurement. The assumptions behind the greedy algorithm are justified when the cost of selecting the next measurement is negligible compared to the measurement cost. However, optimization problems with hundreds and thousands of items are common [Tolpin and Shimony, 2010]; and even if the value of information of a single measurement can be computed efficiently [Russell and Wefald, 1989], the cost of estimating the value of information of all measurements becomes comparable to, and eventually outgrows, the cost of performing a measurement.

Recomputation of the value of information for every measurement is often unnecessary, especially when using the "blinkered" scheme [Tolpin and Shimony, 2010], a greedy algorithm which attempts to also compute VOI for sequences of measurements of the same type.

Algorithm 1: Greedy measurement selection

 1: budget ← C
 2: Initialize beliefs
 3: loop
 4:   for all items s_i do
 5:     Compute E(U_i)
 6:   for all measurements m_j do
 7:     if c_j ≤ budget then
 8:       Compute V_j
 9:     else
10:       V_j ← 0
11:   j_max ← arg max_j V_j
12:   if V_{j_max} > 0 then
13:     Perform measurement m_{j_max}; Update beliefs; budget ← budget − c_{j_max}
14:   else
15:     break
16: α ← arg max_i E(U_i)
17: return s_α

When there are many different measurements, the value of information of most measurements is unlikely to change abruptly due to the result of just one other measurement. With an appropriate uncertainty model, it can be shown that the VOI of only a few of the measurements must be recomputed after each measurement, thus decreasing the computation time and ensuring that the greedy algorithm exhibits more rational behavior w.r.t. computational resources.

RATIONAL COMPUTATION OF VALUE OF INFORMATION
For the selective VOI recomputation, the belief $\mathrm{BEL}(\Lambda_j)$ about the intrinsic value of information of measurement $m_j$ is modeled by a normal distribution with variance $\varsigma_j^2$:

$$\mathrm{BEL}(\Lambda_j) = \mathcal{N}(\Lambda_j, \varsigma_j^2) \qquad (4)$$

After a measurement is performed, and the beliefs about the item features are updated (line 13 of Algorithm 1), the belief about $\Lambda_j$ becomes less certain. Under the assumption that the influence of each measurement on the value of information of other measurements is independent of the influence of any other measurement, the uncertainty is expressed by adding Gaussian noise with variance $\tau^2$ to the belief:

$$\varsigma_j^2 \leftarrow \varsigma_j^2 + \tau^2 \qquad (5)$$

When $\Lambda_j$ of measurement $m_j$ is computed, $\mathrm{BEL}(\Lambda_j)$ becomes exact ($\varsigma_j \leftarrow 0$). The net benefit $W_k$ of recomputing the VOI of measurement $m_k$ is efficiently computable, and the subset of measurements for which the value of information is computed in line 15 of Algorithm 2 is controlled by the computation cost $c_V$:

$$W_k = \frac{\varsigma_k}{\sqrt{2\pi}} \exp\left(-\frac{(V_\gamma - V_k)^2}{2\varsigma_k^2}\right) - |V_\gamma - V_k|\,\Phi\left(-\frac{|V_\gamma - V_k|}{\varsigma_k}\right) - c_V \qquad (6)$$

Algorithm 2: Rational computation of the value of information

 1: for all measurements m_j do
 2:   if c_j ≤ budget then
 3:     V_j ← Λ_j − c_j; ς_j ← √(ς_j² + τ²)
 4:   else
 5:     V_j ← 0; ς_j ← 0
 6: loop
 7:   for all measurements m_k do
 8:     if c_k ≤ budget then
 9:       Compute W_k
10:     else
11:       W_k ← 0
12:   k_max ← arg max_k W_k
13:   if W_{k_max} ≤ 0 then
14:     break
15:   Compute Λ_{k_max}; V_{k_max} ← Λ_{k_max} − c_{k_max}; ς_{k_max} ← 0
16: j_max ← arg max_j V_j
17: Compute Λ_{j_max}; V_{j_max} ← Λ_{j_max} − c_{j_max}; ς_{j_max} ← 0

$V_\gamma$ is the highest value of information $V_\alpha$ if any but the highest value of information is recomputed, and the next-to-highest value of information $V_\beta$ if the highest value of information is recomputed; $\Phi(x)$ is the Gaussian cumulative probability of $x$ for $\mu = 0$, $\sigma = 1$.

The uncertainty variance $\tau^2$ can be learned as a function of the total cost of performed measurements, either off-line from earlier runs on the same class of problems, or on-line. Learning $\tau(c)$ on-line from earlier VOI recomputations proved to be robust and easy to implement: $\tau$ is initialized to 0 and gradually updated with each recomputation of the value of information.

EMPIRICAL EVALUATION
Experiments in this section compare the performance of the algorithm that recomputes the value of information selectively with the original algorithm, in which the value of information of every measurement is recomputed at every step. Two of the problems evaluated in [Tolpin and Shimony, 2010] are considered: noisy Ackley function maximization and SVM parameter search. For each of the optimization problems, plots of the number of VOI recomputations, the reward, the intrinsic utility, and the total cost of measurements are presented. The results are averaged over multiple (100) runs of each experiment, such that the standard deviation of the reward is ≈ 5% of the mean reward. In the plots, the solid line corresponds to the rationally recomputing algorithm, the dashed line corresponds to the original algorithm, and the dotted line corresponds to the algorithm that selects measurements randomly and performs the same number of measurements as the rationally recomputing algorithm for the given computation cost c_V. Since, as can be derived from (6), the computation time T_r of the rationally recomputing algorithm decreases with the logarithm of the computation cost c_V, T_r = Θ(A − B log c_V), the computation cost axis is scaled logarithmically.

The Ackley function [Ackley, 1987] is a popular optimization benchmark. The two-argument form of the Ackley function is used in the experiment; the function is defined by expression (7):

$$A(x, y) = 20 \exp\left(-0.2\sqrt{\frac{x^2 + y^2}{2}}\right) + \exp\left(\frac{\cos(2\pi x) + \cos(2\pi y)}{2}\right) \qquad (7)$$

In the optimization problem, the utility function is u(z) = tanh(2z), the measurements are normally distributed around the true values with variance σ_m = 0.5, and the measurement cost is 0.01. There are uniform dependencies with σ_w = 0.…

Figure 1: The Ackley function, blinkered scheme (panels: number of VOI recomputations N, reward R, utility U, and cost of measurements S, as functions of the computation cost c_V).
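The Ackley objective in (7) is straightforward to implement; below is a sketch in the maximization form used above, with the conventional Ackley constants.

```python
from math import cos, exp, pi, sqrt

def ackley(x, y):
    """Two-argument Ackley function in the maximization form of Eq. (7):
    an exponential bowl plus a cosine ripple, both peaking at (0, 0)."""
    return (20.0 * exp(-0.2 * sqrt((x * x + y * y) / 2.0))
            + exp((cos(2.0 * pi * x) + cos(2.0 * pi * y)) / 2.0))

# The global maximum is 20 + e at the origin.
print(round(ackley(0.0, 0.0), 4))  # 22.7183
```

The cosine term adds many local maxima around the global one, which is what makes the function a useful benchmark for measurement selection under noise.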
An SVM (Support Vector Machine) classifier based on the radial basis function kernel has two parameters: C and γ. A combination of C and γ with high expected classification accuracy should be chosen, and an efficient algorithm for determining the optimal values is not known. A trial for a combination of parameters determines the estimated accuracy of the classifier through cross-validation. The svmguide2 [wei Hsu et al., 2003] dataset is used for the case study. The utility function is u(z) = tanh(4(z − …)). The log C and log γ axes are scaled for uniformity to the range [1..21], and there are uniform dependencies along both axes with σ_w = 0.4. The measurements are normally distributed with variance σ_m = 0.25 around the true values, and the measurement cost is c_m = 0.…

In all experiments, a significant decrease in the computation time is achieved with only slight degradation of the reward; the performance of the rationally recomputing algorithm decreases slowly with the computation cost, and exceeds the performance of the algorithm that makes random measurements even when VOI for only a small fraction of measurements is recomputed at each step.
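This selectivity comes at almost no overhead because the criterion in Eq. (6) is cheap to evaluate. A minimal sketch follows, under the interpretation that ς_k in (6) denotes the standard deviation of BEL(Λ_k); the function name is illustrative.

```python
from math import erf, exp, pi, sqrt

def recompute_benefit(v_k, v_gamma, sigma_k, c_v):
    """Net benefit W_k of recomputing the VOI of measurement k (Eq. 6):
    the expected amount by which a recomputed estimate would cross the
    competing value V_gamma, minus the computation cost c_V."""
    if sigma_k == 0.0:  # belief already exact; recomputing cannot help
        return -c_v
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal CDF
    d = abs(v_gamma - v_k)
    return (sigma_k / sqrt(2.0 * pi) * exp(-d * d / (2.0 * sigma_k ** 2))
            - d * Phi(-d / sigma_k) - c_v)

# The benefit shrinks rapidly as the current estimate moves away from
# the competing value, so only near-contenders get recomputed.
print(recompute_benefit(0.50, 0.50, 0.1, 0.001)
      > recompute_benefit(0.10, 0.50, 0.1, 0.001))  # True
```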
Figure 2: SVM parameter search, myopic scheme (panels: number of VOI recomputations N, reward R, utility U, and cost of measurements S, as functions of the computation cost c_V).

The exact dependence of the performance of the rationally recomputing algorithm on the intensity of VOI recomputations varies among problems, and depends both on the problem properties and on the VOI estimate used in the algorithm.

CONCLUSION
The paper proposes an improvement to a widely used class of VOI-based optimization algorithms. The improvement decreases the computation time while only slightly affecting the performance. The proposed algorithm rationally reuses computations of VOI, recomputing VOI only for measurements for which a change in VOI is likely to affect the choice of the next measurement.

ACKNOWLEDGEMENTS
The research is partially supported by the IMG4 Consortium under the MAGNET program of the Israeli Ministry of Trade and Industry, by Israel Science Foundation grant 305/09, by the Lynn and William Frankel Center for Computer Sciences, and by the Paul Ivanier Center for Robotics Research and Production Management.
References

[Ackley, 1987] Ackley, D. H. (1987). A connectionist machine for genetic hillclimbing. Kluwer Academic Publishers, Norwell, MA, USA.

[Bilgic and Getoor, 2007] Bilgic, M. and Getoor, L. (2007). VOILA: Efficient feature-value acquisition for classification. In AAAI, pages 1225–1230. AAAI Press.

[Heckerman et al., 1993] Heckerman, D., Horvitz, E., and Middleton, B. (1993). An approximate nonmyopic computation for value of information. IEEE Trans. Pattern Anal. Mach. Intell., 15(3):292–298.

[Krause and Guestrin, 2007] Krause, A. and Guestrin, C. (2007). Near-optimal observation selection using submodular functions. In AAAI, pages 1650–1654.

[Krause et al., 2008] Krause, A., Leskovec, J., Guestrin, C., VanBriesen, J., and Faloutsos, C. (2008). Efficient sensor placement optimization for securing large water distribution networks. Journal of Water Resources Planning and Management, 134(6):516–526.

[Russell and Wefald, 1989] Russell, S. J. and Wefald, E. (1989). On optimal game-tree search using rational meta-reasoning. In IJCAI, pages 334–340.

[Russell and Wefald, 1991] Russell, S. J. and Wefald, E. (1991). Do the right thing: studies in limited rationality. MIT Press, Cambridge, MA, USA.

[Tolpin and Shimony, 2010] Tolpin, D. and Shimony, S. E. (2010). Semi-myopic measurement selection for optimization under uncertainty. Technical Report 10-01, Lynne and William Frankel Center for Computer Science at Ben-Gurion University of the Negev, Israel.

[wei Hsu et al., 2003] Hsu, C.-W., Chang, C.-C., and Lin, C.-J. (2003). A practical guide to support vector classification. Technical report.

[Zheng et al., 2005] Zheng, A. X., Rish, I., and Beygelzimer, A. (2005). Efficient test selection in active diagnosis via entropy approximation.