Optimal Learning

Warren B. Powell and Ilya O. Ryzhov

John Wiley and Sons, 2012

Optimal learning addresses the challenge of how to collect information as efficiently as possible, primarily for settings where collecting information is time-consuming and expensive. Some sample applications include:

Each of these problems requires making observations (measurements) to determine which choice works best. The challenge is that measurements take time and/or cost money, which means we have to collect this information carefully.

Optimal Learning is a rich field that includes contributions from different communities. At the moment, this website focuses on our work on the knowledge gradient, a simple, elegant concept for collecting information. As the website evolves, we will provide a more complete representation of the different frameworks and methods that have evolved for solving this important problem class.

Clicking on the book cover takes you to Amazon. Course instructors may order an examination copy directly from Wiley. Click here. A review of the book by Steve Chick appeared in the November 2012 issue of Informs Journal on Computing. Click here. A short article on optimal learning that appeared in OR/MS Today is available here.

The knowledge gradient

Downloadable software

Applications

Table of contents of Optimal Learning

The Optimal Learning course at Princeton

Additional readings

 

The knowledge gradient

Below we provide an overview of our current research in the knowledge gradient, organized as follows:

 

A brief introduction

Our research has focused on the idea of the knowledge gradient, which measures the marginal value of a measurement in terms of the value of the information gained by the measurement. The measurement may require field experimentation or running a time-consuming simulation (some business simulators take days to run). We model the economic decision we are trying to make, and then identify the information that has the highest impact on the economic problem.

The knowledge gradient does not identify the best choice - it identifies the measurement which will do the most to identify the best choice. If we have five alternatives (as shown to the right) with different levels of uncertainty about each alternative, we want to evaluate the alternative that offers the greatest chance of improving the final solution. So alternative 2 may be much more attractive to evaluate than alternatives 3 and 4.

The power of the knowledge gradient is the ease with which it can be applied to a wide range of settings. We use a Bayesian model that captures expert belief, making it possible to provide meaningful guidance right from the beginning. We have found that most applications exhibit correlated beliefs, which is to say that trying one alternative can teach us something about other alternatives. This makes it possible to provide meaningful guidance to find the best out of 10,000 molecular compounds after just 100 experiments.

We are developing methods to handle problems where the number of potential alternatives might number in the tens of thousands (molecular compounds), hundreds of thousands (feature combinations for a car or computer), or might be infinite (setting continuous parameters to optimize a device). Applying the knowledge gradient to find the best of five or ten alternatives with independent beliefs can be done in a spreadsheet. For larger problems, we need specialized algorithms. For example, if we are trying to find the hot spot (in red) of the surface to the left (below), we have to find the maximum of the knowledge gradient surface shown on the right.

The knowledge gradient for different belief models

To formulate an optimal learning problem, we have to first create a belief model. It is useful to divide these models into three fundamental classes: lookup table models, parametric models, and nonparametric models. Brief discussions of each are given below.

Lookup table belief models

Let an alternative x be a discrete number 1, ..., M where M is not too large (say less than 1000). We may have a belief mu_x about each x. If we have independent beliefs, the knowledge gradient is particularly easy to apply. An easy tutorial is contained in the article

Powell, W. B. "The Knowledge Gradient for Optimal Learning," Encyclopedia for Operations Research and Management Science, 2011 (c) John Wiley and Sons.

If you are interested in the real theory, see

Frazier, P., W. B. Powell and S. Dayanik, “A Knowledge Gradient Policy for Sequential Information Collection,” SIAM J. on Control and Optimization, Vol. 47, No. 5, pp. 2410-2439 (2008).
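For the independent-beliefs case, the knowledge gradient reduces to a short closed-form computation. The sketch below follows the standard formula (the alternative means, variances, and noise variance are made-up illustrative numbers):

```python
import math

def kg_independent(mu, sigma2, lam):
    """Knowledge gradient for each alternative under independent normal
    beliefs: mu[x] is the belief mean, sigma2[x] the belief variance,
    and lam the measurement noise variance."""
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
    kg = []
    for x in range(len(mu)):
        # standard deviation of the change in our belief about x
        sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + lam)
        best_other = max(mu[xp] for xp in range(len(mu)) if xp != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        kg.append(sigma_tilde * (zeta * Phi(zeta) + phi(zeta)))
    return kg

mu = [5.0, 4.0, 4.5, 3.0, 6.0]      # current estimates
sigma2 = [4.0, 9.0, 1.0, 16.0, 2.0] # uncertainty about each estimate
scores = kg_independent(mu, sigma2, lam=4.0)
x_next = max(range(len(scores)), key=scores.__getitem__)
```

Note that the policy does not simply measure the alternative with the highest mean: here the most valuable measurement is the highly uncertain alternative, even though its current estimate is the lowest.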

In most applications, our belief about mu_x may be correlated with our belief about another alternative, x'. We consider this one of the most powerful advantages of the knowledge gradient over other methods, including the classical bandit theory. This idea is described in the tutorial above, but the original paper on this topic is

P. Frazier, W. B. Powell, S. Dayanik, “The Knowledge-Gradient Policy for Correlated Normal Beliefs,” Informs Journal on Computing, Vol. 21, No. 4, pp. 585-598 (2009) (c) Informs (Click here for online supplement)
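The correlated-beliefs policy rests on the standard multivariate normal updating equations: measuring one alternative shifts our beliefs about all correlated alternatives. A sketch of the belief update alone (not the full knowledge gradient computation), with made-up numbers:

```python
import numpy as np

def bayes_update(mu, Sigma, x, y, lam):
    """One Bayesian update of a multivariate normal belief (mu, Sigma)
    after observing alternative x with value y and noise variance lam."""
    e = np.zeros(len(mu)); e[x] = 1.0
    Sx = Sigma @ e                    # column x of the covariance matrix
    denom = lam + Sigma[x, x]
    mu_new = mu + (y - mu[x]) / denom * Sx
    Sigma_new = Sigma - np.outer(Sx, Sx) / denom
    return mu_new, Sigma_new

mu = np.array([1.0, 1.0, 1.0])
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 4.0, 2.0],
                  [0.0, 2.0, 4.0]])
mu1, Sigma1 = bayes_update(mu, Sigma, x=1, y=3.0, lam=1.0)
```

Observing a better-than-expected value for alternative 1 raises our estimates of the correlated alternatives 0 and 2 as well, which is exactly how one experiment can teach us about many alternatives.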

 

Parametric belief models

There are many problems where there may be a huge number of alternatives. For example, imagine we are trying to determine the best ad to put on a website. We would like to predict how many ad clicks an ad will receive based on attributes of the ad (the topic, number of words, graphics, ...). We may pose a regression model (let's assume a linear regression), but we do not know the values of the regression parameters. The goal is to try different ads to learn these parameters as quickly as possible. Once we know the parameters, we can estimate the value of thousands of different ads to determine the ones that are best to put on the website.
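The ad example can be sketched with a recursive Bayesian update of the regression coefficients; a belief about a handful of parameters then induces a belief about every possible ad. The feature encoding below is entirely hypothetical:

```python
import numpy as np

def update_theta(theta, C, phi, y, lam):
    """Recursive Bayesian update of a linear belief y ~ phi . theta:
    theta is the mean and C the covariance of our belief about the
    regression coefficients; lam is the observation noise variance."""
    gamma = lam + phi @ C @ phi       # predictive variance of the observation
    gain = C @ phi / gamma
    theta_new = theta + gain * (y - phi @ theta)
    C_new = C - np.outer(gain, phi @ C)
    return theta_new, C_new

# hypothetical ad features: [intercept, num_words/100, has_graphics]
theta = np.zeros(3)
C = 10.0 * np.eye(3)
theta, C = update_theta(theta, C, np.array([1.0, 0.5, 1.0]), y=12.0, lam=1.0)
theta, C = update_theta(theta, C, np.array([1.0, 2.0, 0.0]), y=3.0, lam=1.0)

# with a belief about theta, we can score any number of candidate ads
ads = np.array([[1.0, 0.3, 1.0], [1.0, 1.5, 0.0], [1.0, 0.8, 1.0]])
predicted_clicks = ads @ theta
```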

The knowledge gradient can be adapted to the problem of making choices to learn a regression model. This work was first done in the context of finding the best molecular compound to cure cancer (see Drug Discovery). The work is described in

D. Negoescu, P. Frazier and W. B. Powell, “The Knowledge Gradient Algorithm for Sequencing Experiments in Drug Discovery”, Informs Journal on Computing, Vol. 23, No. 3, pp. 346-363, 2011. (Click here for online supplement)

 

Nonparametric belief models

We have extended the knowledge gradient to two classes of nonparametric belief models. Our first effort used an approximation method based on estimating a function at different levels of aggregation. If we want an estimate of the function at an arbitrary query point x, we compute a set of weights w^g_x, one for each level of aggregation g, based on the total sum of squared errors (variance plus squared bias).

We can use this belief model to estimate a function that we are trying to maximize. The knowledge gradient has to compute the expected value of an observation, taking into account the possible change in the estimate of the function at each level of aggregation, as well as the possible change in the weights w^g_x which have to be recomputed after each observation. This work is summarized in

Mes, M., P. I. Frazier and W. B. Powell, “Hierarchical Knowledge Gradient for Sequential Sampling,” J. Machine Learning Research, Vol.12, pp. 2931-2974, 2011.
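The weighting idea can be sketched as follows: each aggregation level gets a weight inversely proportional to its estimated total squared error, and the belief at a query point is the weighted sum of the level estimates. The numbers below are made up for illustration:

```python
def aggregation_weights(variances, biases):
    """Weights on each aggregation level g for one query point x,
    inversely proportional to the estimated total squared error
    (variance plus squared bias) of that level's estimate."""
    inv_err = [1.0 / (v + b * b) for v, b in zip(variances, biases)]
    total = sum(inv_err)
    return [w / total for w in inv_err]

def estimate(level_estimates, weights):
    """Belief at x: weighted sum of the estimates at each level."""
    return sum(w * f for w, f in zip(weights, level_estimates))

# three levels, from fine (high variance, no bias) to coarse (low variance, biased)
w = aggregation_weights(variances=[4.0, 1.0, 0.25], biases=[0.0, 0.5, 2.0])
f_hat = estimate([10.0, 9.0, 7.0], w)
```

The middle level, with a moderate variance and a small bias, receives the largest weight; the coarse level is precise but too biased to dominate.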

We also computed the knowledge gradient when we are using kernel regression to estimate a function. This work is based on the paper above (Mes et al.), and is summarized in

E. Barut, W. B. Powell, “Optimal Learning for Sequential Sampling with Non-Parametric Belief Models,” J. Global Optimization (to appear)

We recently derived the knowledge gradient when using a local parametric approximation called DC-RBF (Dirichlet Clouds with Radial Basis Functions):

B. Cheng, A. Jamshidi, W. B. Powell, The Knowledge Gradient using Locally Parametric Approximations, Winter Simulation Conference, 2013.

 

The knowledge gradient for online and offline learning

Offline learning arises when we have a budget for finding the best possible solution, after which we have to use the solution in a production setting. Online learning arises when we are in a production setting, and we have to live with the costs or rewards, but we want to learn as we go. In this setting, we have to make a tradeoff between the costs or rewards we receive now and the value of the information we acquire for future decisions.

It turns out that there is a very simple, elegant relationship between the knowledge gradient for offline learning and the knowledge gradient for online learning. Imagine that we have a finite-horizon online learning problem with a total of N measurements, of which we have already made n. If v^{offline}_x is the offline knowledge gradient for alternative x, then the online knowledge gradient is given by

v^{online}_x = \theta^n_x + (N-n) v^{offline}_x

where \theta^n_x is our current estimate of the value of alternative x after n measurements. This makes it possible to compute the knowledge gradient for problems with correlated beliefs.
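The relationship is simple enough to sketch in a few lines. The current estimates and offline knowledge gradients below are made-up inputs (in practice the offline KG would be computed as in the offline section):

```python
def online_kg(theta, kg_offline, n, N):
    """Online knowledge gradient for each alternative x: the current
    estimate theta[x] plus (N - n) times the offline knowledge gradient,
    valuing both immediate reward and future information."""
    return [theta[x] + (N - n) * kg_offline[x] for x in range(len(theta))]

theta = [5.0, 4.0, 4.5]       # current estimates after n measurements
kg_off = [0.05, 0.40, 0.10]   # offline knowledge gradients (assumed given)
scores = online_kg(theta, kg_off, n=10, N=50)
x_next = max(range(len(scores)), key=scores.__getitem__)
```

With many measurements remaining, the information term dominates and the policy explores; as n approaches N, the scores collapse to the current estimates and the policy simply exploits.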

For more, see

I. Ryzhov, W. B. Powell, P. I. Frazier, “The knowledge gradient algorithm for a general class of online learning problems,” Operations Research, Vol. 60, No. 1, pp. 180-195 (2012). 

 

Learning with continuous alternatives

A common problem arises when we have to tune a set of continuous parameters. This often comes up when we have to find the parameter settings that produce the best results for a model: you may want to minimize costs, minimize delays, or find the best match between a model and historical metrics.

W. Scott, P. Frazier, W. B. Powell, “The Correlated Knowledge Gradient for Simulation Optimization of Continuous Parameters Using Gaussian Process Regression,” SIAM J. on Optimization, Vol. 21, No. 3, pp. 996-1026 (2011).

 

Learning on graphs and linear programs

Imagine that you want to find the shortest path between two points, but you do not know the times on the links. You have a way of collecting information, but it is expensive, and you have a limited amount of time to learn the best path. Which links should you learn about to have the greatest impact on your ability to find the shortest path?

The knowledge gradient can be computed for each link in the network using at most two shortest path calculations (and often one).

For more information, see

I. Ryzhov, W.B. Powell, "Information collection on a graph," Operations Research, Vol 59, No. 1, pp. 188-201, 2011. (c) Informs

For a more theoretical treatment of learning the coefficients of linear programs, see

Ryzhov, I., W. B. Powell, “Information Collection in a Linear Program,” SIAM J. Optimization (to appear)

 

Learning in the presence of a physical state

Ryzhov, I. O., W. B. Powell, “Approximate Dynamic Programming with Correlated Bayesian Beliefs,” Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, September 29 – October 1, 2010, Allerton Retreat Center, Monticello, Illinois, IEEE Press, pp. 1360-1367. This paper introduces the idea of using the knowledge gradient within a dynamic program, which effectively means in the presence of a physical state. This paper uses a discrete, lookup table representation of the belief model.

Ryzhov, I. O. and W. B. Powell, “Bayesian Active Learning With Basis Functions,” SSCI 2011 ADPRL - 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Paris, April, 2011. - This paper uses the knowledge gradient for dynamic programs where the value function is now approximated using a linear model.

 

Learning with a robust objective

A fresh perspective on learning is to introduce a minimax objective. Instead of maximizing the expected value of a measurement, we can adapt the knowledge gradient to maximize the worst outcome. An initial investigation of this idea is

Ilya Ryzhov, Boris Defourny, Warren Powell, “Ranking and Selection Meets Robust Optimization,” Winter Simulation Conference, 2012.

 

Downloadable software

We offer the following modules for download:

The Optimal Learning Calculator

The Optimal Learning Calculator is a spreadsheet-based interface to a series of models and algorithms implemented in Java. Through this interface, you can experiment with different learning problems defined for a discrete (and not too large) set of alternatives.

The calculator makes it possible for you to test your skills in manually finding the optimum of a function, or you can test a wide range of learning policies, including the knowledge gradient, Gittins indices, interval estimation, Boltzmann exploration and others.

Click here to download the spreadsheet and Java library.

We are implementing the knowledge gradient algorithm in a Java-based library that can be accessed by other software packages. We are also developing an Excel-based interface that makes it possible to experiment with learning problems, testing both the knowledge gradient (for different problem classes) and a range of other learning policies (such as Gittins indices, interval estimation, Boltzmann exploration, and a number of others) for both online and offline learning problems. A sample of the interface is shown below.

This screen allows you to manually select measurements and watch your progress, including updates which capture correlated beliefs. A sample screen from such a run is shown below.

In a separate screen, you can run simulations, comparing the knowledge gradient policy against other policies.

 

Applications

This section highlights some applications we have encountered, partly from research, partly from teaching, and partly from our own need for optimal learning algorithms in the context of comparing and tuning algorithms.

 

Drug discovery

The knowledge gradient, using a parametric belief model, was used to sequence experiments while searching for the best compound to cure a form of Ewing's sarcoma. For more on this project, click here.

 

May the best man win

Finding the best team to compete in an event.

Ryzhov, I. O., Awais Tariq, W. B. Powell, “May the Best Man Win: Simulation Optimization for Match-Making in E-Sports,” Proceedings of the Winter Simulation Conference, Phoenix, Arizona, December 11-14.

 

Finding the best energy saving technology

Ryzhov, I., W. B. Powell, “A Monte-Carlo Knowledge Gradient Method for Learning Abatement Potential of Emissions Reduction Technologies,” Winter Simulation Conference, 2009. M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, eds, 2009, pp. 1492-1502.

 

Student projects from ORF 418 - Optimal Learning

The student projects performed in the course taught at Princeton (ORF 418-Optimal Learning) produced a wide range of interesting topics. Below is a partial list:

Learning Optimal Levels for the Reservoir in Yunnan, China

Ethiopian Famines— Learning Solutions for Sustainable Agriculture

Finding Effective Strategies in a Multi-Strategy Hedge Fund

Waffles and Dinges and Knowledge Gradient, Oh My! Evaluating the Knowledge Gradient Algorithm with Linear Beliefs for the Street Cart Vendor Problem

Optimal Tuning of a Particle Swarm Algorithm

The Ultimate Set List – Using the knowledge gradient to find the best band set to maximize DVD sales after a band performance

Competing with Netflix: Recommending the Right Movie

Learning Optimal Tolls for the Lincoln Tunnel: Solving Port Authority Pricing and Optimal Driver Commute

Optimizing the Price of Apps on the iTunes Store

Ordering Products for Sale in a Small Business Setting: Learning Policies for Online Subset Selection in the Context of Complementary and Substitute Goods

Optimizing Polling Strategies for Election Campaigns

Learning Matching Strategies for Dating Sites

To Pick a Champion: Ranking and Selection by Measuring Pairwise Comparisons

The Inverse Protein Folding Problem: An Optimal Learning Approach

Learning When to Punt on 4th and Short

Selecting a Debate Team using Knowledge Gradient for Correlated Beliefs

 

The Optimal Learning course at Princeton University

ORF 418, Optimal Learning, is an undergraduate course taught in the department of Operations Research and Financial Engineering at Princeton University. Classes typically run between 30 and 40 students, all of whom have taken a course in probability and statistics.

While the theory behind optimal learning is fairly deep and could only be taught at the graduate level, the modeling concepts and techniques of optimal learning can easily be taught at the undergraduate level to serious students.

Syllabus (2012) - Princeton enjoys 12-week semesters, so this syllabus may look a bit short to many faculty. However, it is easy to add lectures using material from the book. Note that the later chapters are more advanced.

Problem sets (2012) - This zipped file includes latex files and associated software (spreadsheets and matlab code). Most of my exercises are included in the book, but I continue to revise. I give weekly problem sets and a midterm, after which the students take on a course project.

Course project - Students are encouraged to work in teams of two. The project requires that they pick a problem where the collection of information is time-consuming or expensive. The project has three requirements: initial problem description, a summary of the math model and learning policies, and then the final report. You need to use care to make sure they pick good problems. I use the last three lectures (depending on the size of the class) to allow students to present their projects (without numerical results), so that the rest of the class sees the diversity of problems.


Additional readings

 

Below is a summary of research papers that we have produced while pursuing this work.

Tutorials

Powell, W. B., "Optimal Learning: Optimization in the Information Age," article in OR/MS Today (2012)

This is a short, equation-free article introducing the basic concept of optimal learning, which appeared in the Informs news magazine, OR/MS Today.

(click here to download paper)

Powell, W. B. "The Knowledge Gradient for Optimal Learning," Encyclopedia for Operations Research and Management Science, 2011 (c) John Wiley and Sons.

This is a shorter but more up-to-date tutorial on optimal learning than the tutorial listed next. The presentation focuses more on the knowledge gradient.

(click here to download paper)

Powell, W. B. and P. Frazier, "Optimal Learning," TutORials in Operations Research, Chapter 10, pp. 213-246, Informs (2008). (c) Informs

 

The knowledge gradient with independent beliefs

Frazier, P., W. B. Powell and S. Dayanik, “A Knowledge Gradient Policy for Sequential Information Collection,” SIAM J. on Control and Optimization, Vol. 47, No. 5, pp. 2410-2439 (2008).

 

The knowledge gradient with correlated beliefs (offline learning, discrete alternatives)

P. Frazier, W. B. Powell, S. Dayanik, “The Knowledge-Gradient Policy for Correlated Normal Beliefs,” Informs Journal on Computing, Vol. 21, No. 4, pp. 585-598 (2009) (c) Informs

 

The S-curve effect - Handling the nonconcavity of information

Frazier, P. I., and W. B. Powell, “Paradoxes in Learning: The Marginal Value of Information and the Problem of Too Many Choices,” Decision Analysis, Vol. 7, No. 4, pp. 378-403, 2010.

The value of information can be a concave function in the number of measurements, but for many problems it is not, and instead follows an S-curve. A little bit of information may teach you nothing, and you may have to make an investment in information beyond a certain threshold to actually have an impact. We investigate the economic implications of the S-curve effect, showing that it is possible to have too many choices. We then revisit the knowledge gradient algorithm, which allocates measurements based on the marginal value of information. The knowledge gradient can produce poor learning results in the presence of an S-curve. We propose the KG(*) algorithm, which maximizes the average value of information, and show that it produces good results when there is a significant S-curve effect.

(click here to download paper)
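The KG(*) idea described above can be sketched directly: instead of valuing a single measurement, value a batch of n repeated measurements of one alternative and maximize the average value per measurement. The inputs below are made-up numbers, and the formula for the change in belief after n measurements follows the standard normal updating equations:

```python
import math

def f(z):
    """Standard normal expected-improvement function z*Phi(z) + phi(z)."""
    return z * 0.5 * (1 + math.erf(z / math.sqrt(2))) + \
        math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def kg_star(mu_x, sigma2_x, best_other, lam, n_max=100):
    """KG(*): maximize the *average* value of n measurements of x,
    guarding against the nonconcavity (S-curve) of information."""
    best = 0.0
    for n in range(1, n_max + 1):
        # std dev of the change in belief after n measurements of x
        sigma_tilde = math.sqrt(sigma2_x ** 2 * n / (lam + n * sigma2_x))
        nu_n = sigma_tilde * f(-abs(mu_x - best_other) / sigma_tilde)
        best = max(best, nu_n / n)
    return best

# a precise belief far from the best: one measurement teaches almost nothing,
# but a batch of measurements has positive average value
score = kg_star(mu_x=3.0, sigma2_x=0.5, best_other=5.0, lam=4.0, n_max=50)
```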

 

From offline learning to online learning:

I. Ryzhov, W. B. Powell, P. I. Frazier, “The knowledge gradient algorithm for a general class of online learning problems,” Operations Research, Vol. 60, No. 1, pp. 180-195 (2012). 

The knowledge-gradient policy was originally derived for off-line learning problems such as ranking and selection. Considerable attention has been given to the on-line version of this problem, known popularly as the multiarmed bandit problem, for which Gittins indices are known to be optimal for discounted, infinite-horizon versions of the problem. In this paper, we derive a knowledge gradient policy for on-line problems, and show that it very closely matches the performance of Gittins indices for discounted infinite horizon problems. It actually slightly outperforms the best available approximation of Gittins indices (by Gans and Chick) on problems for which Gittins indices should be optimal. The KG policy is also effective on finite horizon problems. The only policy which is competitive with KG seems to be interval estimation, but this requires careful tuning of a parameter. The KG policy also works on problems where the beliefs about different alternatives are correlated. The KG policy with independent beliefs is extremely easy to compute (we provide closed-form expressions for the case with normal rewards), and requires a simple numerical algorithm for the case with correlated beliefs.

(click here to download paper)

 

The knowledge gradient using a linear belief model

D. Negoescu, P. Frazier and W. B. Powell, “The Knowledge Gradient Algorithm for Sequencing Experiments in Drug Discovery”, Informs Journal on Computing, Vol. 23, No. 3, pp. 346-363, 2011. (c) Informs

This paper describes a method for applying the knowledge gradient to a problem with a very large number of alternatives. Instead of creating a belief about each alternative (known as a "lookup table belief model"), we represent our belief about an alternative using linear regression (known as a "parametric belief model"). The method is motivated by the need to find the best molecular compound to solve a particular problem (e.g. killing cancer cells). There is a base compound with a series of sites (indexed by j) and a series of small sequences of atoms ("substituents") indexed by i. Let X_{ij} = 1 if we put substituent i at site j, and let theta_{ij} be the impact of this combination on the performance of the compound. The goal is to choose compounds to test that allow us to estimate the parameters theta as quickly as possible.

(click here to download main paper) (Click here for online supplement)
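A toy illustration of this belief structure (the dimensions and parameter values are made up): a belief about the theta_{ij} parameters induces a belief about every possible compound, so a small number of parameters describes a very large number of alternatives.

```python
import numpy as np

# hypothetical dimensions: 5 sites (j), 4 candidate substituents (i) per site
n_sites, n_subs = 5, 4
theta = np.arange(n_subs * n_sites, dtype=float).reshape(n_subs, n_sites)

def compound_value(assignment, theta):
    """Value of a compound under the linear belief model: the sum of
    theta[i, j] over the substituent i placed at each site j."""
    return sum(theta[i, j] for j, i in enumerate(assignment))

# one compound = a choice of substituent for each site
value = compound_value([0, 2, 1, 3, 0], theta)
n_compounds = n_subs ** n_sites   # 20 parameters describe 1024 compounds
```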

 

The knowledge gradient for nonparametric belief models:

Mes, M., P. I. Frazier and W. B. Powell, “Hierarchical Knowledge Gradient for Sequential Sampling,” J. Machine Learning Research, Vol. 12, pp. 2931-2974, 2011.

We develop the knowledge gradient for optimizing a function when our belief is represented by constants computed at different levels of aggregation. Our estimate of the function at any point is given by a weighted sum of estimates at different levels of aggregation.

(Click here to download paper)

E. Barut and W. B. Powell, “Optimal Learning for Sequential Sampling with Non-Parametric Belief Models,” J. Global Optimization (to appear)

This paper develops and tests a knowledge gradient algorithm when the underlying belief model is nonparametric, using a broad class of kernel regression models. The model assumes that the set of potential alternatives to be evaluated is finite.

(Click here to download paper)

B. Cheng, A. Jamshidi, W. B. Powell, The Knowledge Gradient using Locally Parametric Approximations, Winter Simulation Conference, 2013.

The knowledge gradient is developed for a locally parametric belief model. This model, called DC-RBF, approximates a function by representing the domain using a series of clouds, which avoids storing the history.

(Click here to download paper)

Learning while learning

There are applications where the underlying alternative is steadily getting better in the process of observing it. An athlete improves over time, as do teams that work together over time. A product with a specific set of features might see sales steadily improve as word of mouth gets around. Often, we do not have time to wait for a process to reach its asymptotic limit, so we can fit a function that tries to guess (imperfectly) this limit. This problem arose in a business simulator which used approximate dynamic programming to learn a policy, while we were tuning various business parameters. This work is summarized in

P. Frazier, W. B. Powell, H. P. Simao, "Simulation Model Calibration with Correlated Knowledge-Gradients," Winter Simulation Conference, December, 2009.

A common challenge in the calibration of simulation models is that we have to tune several continuous parameters. This is our first application of the knowledge gradient algorithm with correlated beliefs to the problem of parameter tuning for simulation models. A single run of the model (which uses adaptive learning from approximate dynamic programming) requires more than a day, so the paper also introduces methods to produce results without a full run. A Bayesian model is set up to capture the uncertainty in our beliefs about the convergence of the model. The method is illustrated in the tuning of two continuous parameters, which required approximately six runs of the model.

(click here to download paper)

Learning when the alternatives are continuous

This paper can handle low-dimensional vectors of continuous parameters.

Scott, Warren, P. I. Frazier, and W. B. Powell. "The Correlated Knowledge Gradient for Simulation Optimization of Continuous Parameters Using Gaussian Process Regression." SIAM Journal on Optimization 21, No. 3 (2011): 996-1026.

We have previously developed the knowledge gradient with correlated beliefs for discrete alternatives. This paper extends this idea to problems with continuous alternatives. We do this by developing a continuous approximation of the knowledge gradient. This produces a nonconcave surface that we have to maximize. Local minima are located close to points that have been previously measured, so we use these points to guess at the locations of local maxima and then use a simple gradient search algorithm starting from each of these points. A proof of convergence is provided. We use the distances between local minima to perform scaling of the steepest descent algorithm. We compare the method against Huang's adaptation of sequential kriging to problems with noisy measurements.

(Click here to download paper)

 

Learning on graphs and linear programs

I.O. Ryzhov, W.B. Powell, "Information collection on a graph," Operations Research, Vol 59, No. 1, pp. 188-201, 2011. (c) Informs

We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual arc costs in order to learn about the best path. This problem differs from traditional ranking and selection, in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure). Our decision rule is easy to compute, and performs competitively against other learning policies, including a Monte Carlo adaptation of the knowledge gradient policy for ranking and selection.

(click here to download paper)

Ryzhov, I. O., W. B. Powell, “Information Collection in a Linear Program,” SIAM J. Optimization (to appear)

Linear programs often have to be solved with estimates of costs. This theory paper describes an adaptation of the knowledge gradient for general linear programs, extending our previous paper on learning the costs on arcs of a graph for a shortest path problem.

(Click here to download paper)

B. Defourny, I. O. Ryzhov, W. B. Powell, “Optimal Information Blending with Measurements in the L2 Sphere”

This (primarily theoretical) paper extends the paper above on learning the coefficients of a linear program. We consider the situation where information is collected in the form of a linear combination of the objective coefficients, subject to random noise. We can choose the weights in the linear combination, a process we refer to as information blending. The paper presents two optimal blending strategies: an active learning method that maximizes uncertainty reduction, and an economic approach that maximizes an expected improvement criterion. Semidefinite programming relaxations are used to create efficient convex approximations to the nonconvex blending problem.

(Click here to download paper)

Learning in the presence of a physical state

Ryzhov, I. O., W. B. Powell, “Approximate Dynamic Programming with Correlated Bayesian Beliefs,” Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, September 29 – October 1, 2010, Allerton Retreat Center, Monticello, Illinois, IEEE Press, pp. 1360-1367. This paper introduces the idea of using the knowledge gradient within a dynamic program, which effectively means in the presence of a physical state. This paper uses a discrete, lookup table representation of the belief model.

Ryzhov, I. O. and W. B. Powell, “Bayesian Active Learning With Basis Functions,” SSCI 2011 ADPRL - 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Paris, April, 2011. - This paper uses the knowledge gradient for dynamic programs where the value function is now approximated using a linear model.

 

Comparing against nature

Zhang, S. & Yu, A.J. (2013). "Cheap but clever: human active learning in a bandit setting." In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

The paper studies the question of how humans behave in an optimal learning problem. Empirical results suggest that they do not behave optimally, but that they do behave similarly to the KG policy. The human subjects also make better decisions than various other standard bandit algorithms. A follow-up paper appeared at NIPS (2013, pp. 2607-2615): http://www.cogsci.ucsd.edu/~ajyu/Papers/nips13_bandit.pdf

Learning in games

In this section, I will be collecting learning papers that arise in the context of various games, where information collection can play a major role.

B. Jedynak, P. Frazier, R. Sznitman, "Twenty questions with noise: Bayes optimal policies for entropy loss," Applied Probability Trust, 2011.

We consider the problem of 20 questions with noisy answers, in which we seek to find a target by repeatedly choosing a set, asking an oracle whether the target lies in this set, and obtaining an answer corrupted by noise. Starting with a prior distribution on the target's location, we seek to minimize the expected entropy of the posterior distribution. We formulate this problem as a dynamic program and show that any policy optimizing the one-step expected reduction in entropy is also optimal over the full horizon. Two such Bayes-optimal policies are presented: one generalizes the probabilistic bisection policy due to Horstein and the other asks a deterministic set of questions. We study the structural properties of the latter, and illustrate its use in a computer vision application.

(click here to download paper)

Convergence theory

The paper below provides a strategy for proving asymptotic consistency of sequential learning algorithms for offline learning problems.