Research Prediction Games in Infinitely Rich Worlds Omid Madani Yahoo! Research


Prediction Games in Infinitely Rich Worlds

Omid Madani

Yahoo! Research


“Rather, the formation and use of categories is the stuff of experience.”

Philosophy in the Flesh, Lakoff and Johnson.


Motivation

• Higher intelligence requires myriad inter-related categories

• How can such categories be acquired?
• Programming them is unlikely to be successful:
  • Limits of our explicit knowledge
  • Unknown/unfamiliar domains
  • Making the system operational..


Learn? … How?

• “Supervised” learning is likely inadequate:
  • Required:
    • ~millions of categories and beyond..
    • Billions of weights, and beyond..
  • Inaccessible “knowledge” (see last slide!)
• Other approaches fall short (incomplete, etc.): clustering, RL, active learning, etc.


This Work: An Exploration

• An avenue: “prediction games in infinitely rich worlds”

• Exciting part:
  • World provides unbounded learning opportunity! (world is the teacher!)
  • World enjoys many regularities (e.g. “hierarchical”)


This Work

• Describe the setting
  • The games, categories, …
• Discuss:
  • Desiderata/constraints
  • Some of the many challenges/problems
• Preliminary system/observations..


The Game

• Repeat
  • Hide part(s) of the stream
  • Predict (use context)
  • Update
  • Move on
• Goal: predict better ... (subject to constraints)
• In the process: categories at different levels of abstraction are learned
• Some details: what parts to hide? How much context? What order?
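The repeat loop above can be sketched in a few lines. This is only an illustrative toy, not the system described in these slides: it assumes that exactly one character is hidden at a time, that the context is the preceding `context_len` characters, and that the stream is visited left to right. The names `play_game` and `context_len` are hypothetical.

```python
from collections import Counter, defaultdict


def play_game(stream, context_len=2):
    """Play one left-to-right pass of a toy character-level prediction game.

    Assumption (not from the slides): hide one character at a time and
    predict it from the `context_len` characters that precede it.
    Returns (number of correct predictions, number of episodes).
    """
    counts = defaultdict(Counter)  # context -> counts of the next character
    correct = 0
    episodes = 0
    for i in range(context_len, len(stream)):
        context = stream[i - context_len:i]
        hidden = stream[i]                       # hide part of the stream
        seen = counts[context]
        # Predict the most frequent continuation seen so far, if any.
        prediction = seen.most_common(1)[0][0] if seen else None
        correct += prediction == hidden
        episodes += 1
        seen[hidden] += 1                        # update
        # "Move on" is the next iteration of the loop.
    return correct, episodes
```

The open questions on the slide (what to hide, how much context, what order) correspond to the hard-coded choices in this sketch.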


In a Nutshell

[Figure: a prediction system reads a stream (…. 0011101110000 ….) in a loop of "predict" and "observe & update". After a while, the same loop operates over higher level categories (bigger chunks; e.g. words, digits, phrases, phone numbers, faces, visual objects, home pages, sites, …) built on top of low level categories (bits, characters, edges, …).]


Example of Games (text)

• .. d?an..
• System predictions (ranked, or assigned probabilities, or ..)
  • “r”
  • “e”
  • “o”
  • …

• I ? my bike to school.
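As a rough illustration of producing ranked predictions with probabilities for a game such as "d?an", the sketch below counts which single characters fill the gap in previously seen text. It assumes a plain relative-frequency estimate over a small in-memory text; the function name `rank_fillers` and its parameters are hypothetical, and the slides do not say how the actual system scores candidates.

```python
from collections import Counter


def rank_fillers(text, left, right):
    """Rank single characters c such that left + c + right occurs in text.

    Returns a list of (character, relative frequency), most frequent first.
    The list is empty if the pattern never occurs.
    """
    counts = Counter()
    width = len(left) + 1 + len(right)
    for i in range(len(text) - width + 1):
        window = text[i:i + width]
        if window.startswith(left) and window.endswith(right):
            counts[window[len(left)]] += 1
    total = sum(counts.values())
    return [(c, n / total) for c, n in counts.most_common()]
```

For example, `rank_fillers("dean dean dran", "d", "an")` ranks "e" ahead of "r", because "dean" occurs twice and "dran" once.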


Categories

• Building blocks of intelligence?
• Patterns that frequently occur
  • External
  • Internal..
• Useful for predicting other categories!
• They can have structure/regularities

1. Composition (~conjunctions) of others

2. Grouping (~disjunctions)
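One possible way to represent the two kinds of structure above is sketched below. The class names (`Primitive`, `Composition`, `Grouping`) and the helper `expansions` are hypothetical, chosen only for illustration: a composition behaves like a conjunction (all parts, in order), and a grouping behaves like a disjunction (any one member).

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Primitive:
    symbol: str                      # low level category, e.g. a character


@dataclass(frozen=True)
class Composition:
    parts: Tuple["Category", ...]    # ~conjunction: all parts, in sequence


@dataclass(frozen=True)
class Grouping:
    members: FrozenSet["Category"]   # ~disjunction: any one member


Category = Union[Primitive, Composition, Grouping]


def expansions(cat):
    """Return the set of strings that a category can stand for."""
    if isinstance(cat, Primitive):
        return {cat.symbol}
    if isinstance(cat, Grouping):
        return set().union(*(expansions(m) for m in cat.members))
    # Composition: concatenate one expansion of each part, in order.
    results = {""}
    for part in cat.parts:
        results = {prefix + s for prefix in results for s in expansions(part)}
    return results
```

With `bit = Grouping(frozenset({Primitive("0"), Primitive("1")}))`, the composition `Composition((bit, bit))` stands for the four two-bit strings.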


Categories

• Low level examples: 0 and 1, or characters
  • Provided to the system
• Higher levels:
  • Sequence of k bits
  • Words
  • Phrases
  • Regular expressions
  • Phone number, contact info, resume, ...


Prediction Objective

• Desirable: learn higher level categories (bigger/abstract categories are useful externally)

• Question: how does this relate to improving predictions?

1. Higher level categories improve “context” and can save memory

2. Bigger categories save time in playing the game (categories are atomic)


Goal (evaluation criterion)

• Number of bits (characters) correctly predicted per unit time (or per prediction action)

• Subject to constraints (space, time,..)

• How about entropy/perplexity? Categories are structured..
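A minimal sketch of the per-action version of this criterion follows. It assumes that each prediction action outputs a string (one category) and that "correctly predicted" means the length of the matching leading part of that string; the slides do not define the counting rule, so this is an assumption, and the function names are hypothetical.

```python
def correct_prefix_len(predicted, actual):
    """Number of leading characters of `predicted` that match `actual`."""
    n = 0
    for p, a in zip(predicted, actual):
        if p != a:
            break
        n += 1
    return n


def chars_per_action(episodes):
    """Average number of correctly predicted characters per prediction action.

    `episodes` is an iterable of (predicted, actual) string pairs,
    one pair per prediction action.
    """
    episodes = list(episodes)
    if not episodes:
        return 0.0
    total = sum(correct_prefix_len(p, a) for p, a in episodes)
    return total / len(episodes)
```

Under this measure a system that predicts longer categories correctly scores higher per action than one that predicts single characters, which is the link to the "bigger categories save time" point on the previous slide.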


Desiderata/Challenges/Issues

• Lots of data!
• Efficiency: space and time!
• Noise:
  • Statistical insignificance
  • Significance, but for short time..
• Variety (need for abstraction)
• Drift (e.g. developments within system)
• Motivate: (primarily) online algorithms/systems


Desiderata/Challenges

• Why the need for “system”s?
  • Multiple algorithms/parts needed
  • Persistence

• Long term learning: how can we make sure noise/errors do not accumulate?

• Control of the input stream..


Why Now?

• Many-category learning is possible/efficient!
  • Online
  • Noise tolerant

• Expectation: other problems are solvable..


Preliminary Report

• Work in Progress!

• Plays the game in text

• Begins at character level

• No segmentation, just a stream

• Makes and predicts larger sequences (composition)
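The slides do not say how larger sequences are generated. One simple stand-in, shown here purely as an illustration, is to merge adjacent categories that co-occur often, in the spirit of pair-merging compression schemes. The names `segment`, `build_compositions`, `threshold`, and `max_passes` are hypothetical, and the greedy longest-match segmentation is an assumption of this sketch.

```python
from collections import Counter


def segment(stream, vocab):
    """Greedy longest-match segmentation of `stream` into known categories.

    Assumes every single character of `stream` is in `vocab`.
    """
    max_len = max(map(len, vocab))
    tokens, i = [], 0
    while i < len(stream):
        for k in range(min(max_len, len(stream) - i), 0, -1):
            if stream[i:i + k] in vocab:
                tokens.append(stream[i:i + k])
                i += k
                break
    return tokens


def build_compositions(stream, threshold=2, max_passes=3):
    """Grow a vocabulary by composing categories that are often adjacent."""
    if not stream:
        return set()
    vocab = set(stream)                       # start at the character level
    for _ in range(max_passes):
        tokens = segment(stream, vocab)
        pairs = Counter(zip(tokens, tokens[1:]))
        new = {a + b for (a, b), n in pairs.items() if n >= threshold}
        if new <= vocab:                      # nothing new was composed
            break
        vocab |= new
    return vocab
```

On the stream "abababab" with the default settings, the sketch first composes "ab" and "ba" from characters, then "abab" from two adjacent "ab" categories. Without a cap such as `max_passes` or a held-out check, this kind of procedure keeps building ever longer categories that memorize the stream.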


Preliminary Observations

• Ran on Reuters RCV1 (text body) (simply zcat dir/file*)
  • 800k articles
  • >= 150 million learning/prediction episodes
  • Over 10 million categories built
  • 3-4 hours each pass


Observations

• Performance on held-out data (one of the Reuters files):
  • 8-9 characters long to predict on average
  • Almost two characters correct on average, per prediction action

• Can overfit/memorize! (long categories)

• Current: stop category generation in first pass


Current/Future

• Much work:
  • Learn groupings
  • Recognize/use “syntactic” categories?
  • Is the prediction objective ok?
  • Category generation.. What’s a good method?

• Compare: language modeling, etc


Much Related Work!

• Online learning, clustering, deep learning, Bayesian methods, hierarchical learning, importance of predictions (“On Intelligence”, “natural computations”), models of neocortex (“circuits of the mind”), concepts (“big book of concepts”), cumulative learning, neural nets, compression, learning an index of categories!
