
Active Search and Bandit Methods for Complex Actions and Rewards

Thesis Proposal

Speaker: Yifei Ma, PhD Candidate

Committee: Jeff Schneider (Chair), Aarti Singh, Alex Smola, Ryan P. Adams (Harvard University), Roman Garnett (Washington University, St. Louis)

May 7, 2015, 10:00am, 8102 GHC

Abstract:

Multi-armed bandit problems study how to maximize cumulative reward by adaptively choosing arms and learning from their outcomes. Many algorithms have been developed with "no regret" guarantees in the long run, ensuring that the average cumulative reward converges to the expected reward of the single best fixed policy. Active search is a newer problem that aims to discover all positive targets by adaptively examining candidate targets and observing their outcomes. The reward for a given target is counted only once, so no point is ever chosen more than once. Theoretical analyses of active search can nevertheless be made to parallel those of bandit methods, and this proposal considers both problems.
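To make the contrast concrete, here is a minimal sketch of the two loops, assuming binary rewards and a UCB1-style index for the bandit side; `pull`, `query`, and `score` are hypothetical placeholders for an environment and a model-based scoring function, not constructs from the proposal itself.

```python
import math

def ucb1(pull, n_arms, horizon):
    """Minimal UCB1 bandit loop: arms may be pulled repeatedly, and the
    average reward approaches that of the best fixed arm (no regret)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                    # pull each arm once to initialize
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                                    + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = pull(arm)                 # hypothetical environment call
        counts[arm] += 1
        sums[arm] += reward

def active_search(query, candidates, budget, score):
    """Active search loop: each candidate is queried at most once, and the
    goal is to find as many positives as possible within the budget."""
    pool = set(candidates)
    found = []
    for _ in range(budget):
        if not pool:
            break
        x = max(pool, key=score)           # e.g. posterior P(x is positive)
        pool.remove(x)                     # reward for a target counted once
        if query(x):                       # hypothetical oracle call
            found.append(x)
    return found
```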

To date, most active search and bandit methods act on the premise that each action samples a single candidate at a time and the reward is determined by that candidate alone. Many real-world applications, however, define actions and rewards in more complex ways. For example, in environmental monitoring carried out by autonomous boats with on-board sensors, a set of readings within a region that satisfies some pattern can indicate a real pollution problem, whereas any single point measurement is insufficient evidence; this requires that reward be defined on regions. Similarly, many sensors observe large areas simultaneously rather than single points, so each action covers all the targets within the sensing region. Finally, marketing through social media may return properties of subgraphs, for example in social polling. In this work, we propose to develop multi-armed bandit and active search algorithms for these more complex action and reward settings.
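One way to formalize region-level rewards, sketched here under the assumption that a Gaussian process models the point-level field: the average of f over a finite region is a linear functional of f, so its posterior mean and variance follow directly from the point-level posterior. The function below is an illustration of that identity, not the proposal's algorithm.

```python
import numpy as np

def region_average_posterior(mu, Sigma, region_idx):
    """Posterior mean and variance of the average of f over a region, given
    the point-level GP posterior mean `mu` (n,) and covariance `Sigma` (n, n).
    The region average is w^T f with w the normalized indicator of the region,
    so its mean is w^T mu and its variance is w^T Sigma w."""
    w = np.zeros(len(mu))
    w[region_idx] = 1.0 / len(region_idx)
    return w @ mu, w @ Sigma @ w
```

A GP-UCB-style rule over regions could then score each candidate region by its posterior mean plus a multiple of the posterior standard deviation computed this way.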

My previous research has covered some basic scenarios in this area: searching for regions with sufficiently high means by collecting data at the point level, discovering complex global patterns defined on regions by adaptively observing individual points, and using a new criterion called Sigma-optimality to perform active learning and polling on social network graphs. I propose to extend the theoretical treatment of these and related problems: to apply Sigma-optimality as the exploration term for active search of positives on network graphs, to study the exploration properties of other spectral functions on network graphs, to analyze cumulative regret for Gaussian-process bandit algorithms when rewards are defined by region averages, and finally to develop efficient algorithms when an action may pull multiple arms or regions.
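As a rough illustration of how Sigma-optimality could serve as an exploration term (an assumption on my part, not the proposal's final algorithm): Sigma-optimality tracks the total posterior covariance 1ᵀΣ1 over the graph, and observing node v with noise variance σ² reduces that total by ((Σ1)ᵥ)² / (Σᵥᵥ + σ²). One hedged scoring rule adds this reduction, scaled by a trade-off parameter α, to the posterior mean.

```python
import numpy as np

def sigma_optimality_scores(mu, Sigma, noise_var, alpha):
    """Hypothetical search score: posterior mean (exploitation) plus the
    greedy Sigma-optimality reduction (exploration). Observing node v shrinks
    the total posterior covariance 1^T Sigma 1 by
    (Sigma @ 1)[v] ** 2 / (Sigma[v, v] + noise_var)."""
    col_sums = Sigma.sum(axis=1)                  # (Sigma @ 1)[v] for each node v
    reduction = col_sums ** 2 / (np.diag(Sigma) + noise_var)
    return mu + alpha * reduction                 # query argmax over unlabeled nodes
```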