@TomAnthonySEO April 2015 - BrightonSEO HOW TO SPOT A BEAR A Machine Learning Introduction for SEOs

How to Spot a Bear - An Intro to Machine Learning for SEO



Page 2: How to Spot a Bear - An Intro to Machine Learning for SEO
Page 3: How to Spot a Bear - An Intro to Machine Learning for SEO

Can you define a list of rules for spotting bears?

Page 4: How to Spot a Bear - An Intro to Machine Learning for SEO

Let’s start with:

1) Four legs.

Page 5: How to Spot a Bear - An Intro to Machine Learning for SEO

Bear!

Page 6: How to Spot a Bear - An Intro to Machine Learning for SEO

List of rules (first half):
(when I asked in the office)

1. Four legs.
2. Breathes.
3. Furry.
4. Long snout.

Page 7: How to Spot a Bear - An Intro to Machine Learning for SEO

Bear!

Page 8: How to Spot a Bear - An Intro to Machine Learning for SEO

List of rules:

1. Four legs.
2. Breathes.
3. Furry.
4. Long snout.
5. Brown.
6. Not always brown.
7. Mammal. (how do you spot a mammal?!)
8. No tail.

Page 9: How to Spot a Bear - An Intro to Machine Learning for SEO

Let’s check our rules…

Page 10: How to Spot a Bear - An Intro to Machine Learning for SEO

Rules say:

Bear

Page 11: How to Spot a Bear - An Intro to Machine Learning for SEO

Rules say:

Harmless Furry Thing (fewer than 4 legs)

Page 12: How to Spot a Bear - An Intro to Machine Learning for SEO

Rules say:

Odd Grey Creature (no long snout)

Page 13: How to Spot a Bear - An Intro to Machine Learning for SEO

Remove ‘long snout’, and rules say:

Bear (Extra-terrestrial bear?!)

Page 14: How to Spot a Bear - An Intro to Machine Learning for SEO

Our rules suck.
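The rule list above can be sketched as an "every rule must pass" check. The animals and attribute names below are hypothetical stand-ins (the slides used photos), so this is only an illustration of why hand-written rules are brittle, not something taken from the talk.

```python
# A minimal sketch of the hand-written bear rules.
# Attribute names and example animals are hypothetical.

RULES = {
    "four_legs": lambda animal: animal["visible_legs"] == 4,
    "furry": lambda animal: animal["furry"],
    "long_snout": lambda animal: animal["long_snout"],
    "no_tail": lambda animal: not animal["tail"],
}


def is_bear(animal, rules):
    """Return True only if the animal satisfies every rule."""
    return all(rule(animal) for rule in rules.values())


# A real bear standing on its hind legs: only two legs on the ground.
upright_bear = {"visible_legs": 2, "furry": True, "long_snout": True, "tail": False}

# A furry, four-legged, tailless creature with a flat face (not a bear).
flat_faced_creature = {"visible_legs": 4, "furry": True, "long_snout": False, "tail": False}

# The full rules wrongly reject the upright bear.
upright_result = is_bear(upright_bear, RULES)

# Dropping the 'long snout' rule makes the non-bear pass.
relaxed_rules = {name: rule for name, rule in RULES.items() if name != "long_snout"}
strict_result = is_bear(flat_faced_creature, RULES)
relaxed_result = is_bear(flat_faced_creature, relaxed_rules)
```

Tightening a rule loses real bears, and loosening one lets non-bears in, which is the problem the slides walk through.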

Page 15: How to Spot a Bear - An Intro to Machine Learning for SEO

A different bear: Google’s Panda

Page 16: How to Spot a Bear - An Intro to Machine Learning for SEO

Can you define a list of rules for spotting spammy pages?

Same problem as bears!

Page 17: How to Spot a Bear - An Intro to Machine Learning for SEO


Good page

Page 18: How to Spot a Bear - An Intro to Machine Learning for SEO


Commercial page, still good.

Page 19: How to Spot a Bear - An Intro to Machine Learning for SEO

Hrm…

Page 20: How to Spot a Bear - An Intro to Machine Learning for SEO

Seems legit…

Page 21: How to Spot a Bear - An Intro to Machine Learning for SEO

WTF!

Page 22: How to Spot a Bear - An Intro to Machine Learning for SEO

Google can’t write rules.

Page 23: How to Spot a Bear - An Intro to Machine Learning for SEO

What we can do is identify spammy or non-spammy attributes.

Page 24: How to Spot a Bear - An Intro to Machine Learning for SEO

Some Possible Spam Signals

Are there adverts on the page?

Are there lots of spelling mistakes?

Is there little text content?

Are there Calls To Action in ALL CAPS?
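As a rough illustration, the four signals could be turned into yes/no values along these lines. Everything here is an assumption made for the sketch: the function name `spam_signals`, the tiny call-to-action list, the thresholds (more than 5 unknown words, fewer than 200 words), and the idea of passing in a known-word set as a stand-in for a spell checker. Advert detection is left as an input because it depends on the page markup.

```python
import re

# Hypothetical call-to-action phrases; a real list would be much longer.
CTA_PHRASES = ("BUY NOW", "CLICK HERE", "SIGN UP")


def spam_signals(text, has_adverts, dictionary):
    """Return the four yes/no signals for one page of plain text.

    `dictionary` is a set of lowercase known words (an assumed stand-in
    for a real spell checker).
    """
    words = re.findall(r"[A-Za-z']+", text)
    misspelled = [word for word in words if word.lower() not in dictionary]
    return {
        "adverts": has_adverts,
        "many_spelling_mistakes": len(misspelled) > 5,  # assumed threshold
        "little_content": len(words) < 200,             # assumed threshold
        "cta_all_caps": any(phrase in text for phrase in CTA_PHRASES),
    }


signals = spam_signals(
    "BUY NOW cheep watchs",
    has_adverts=True,
    dictionary={"buy", "now", "cheap", "watches"},
)
```

Each signal on its own is weak evidence; the point of the slides that follow is to let a model decide how much each one should count.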

Page 25: How to Spot a Bear - An Intro to Machine Learning for SEO

Smooth segue to:

Machine Learning

Page 26: How to Spot a Bear - An Intro to Machine Learning for SEO

List of pages we’ve manually classified.

List of attributes that we believe are important to classifying pages.

Page 27: How to Spot a Bear - An Intro to Machine Learning for SEO

Example Data

Site   | Adverts on page? | More than 5 spelling mistakes? | Less than 200 words of content? | CTA in ALL CAPS? | Classification
site A | Y                | Y                              | Y                               | Y                | Spam Site
site B | N                | N                              | Y                               | Y                | Good Site
site C | Y                | N                              | N                               | N                | Spam Site
site D | N                | Y                              | N                               | Y                | Spam Site
site E | N                | Y                              | N                               | N                | Good Site
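Before a model can use the example data, the Y/N answers need to become numbers. A minimal sketch, assuming Y maps to 1, N maps to 0, and "Spam Site" is the positive class (the variable names are illustrative):

```python
# Example data from the slide: four Y/N attributes and a manual label per site.
# Attribute order: adverts, >5 spelling mistakes, <200 words, CTA in ALL CAPS.
RAW_DATA = {
    "site A": ("Y Y Y Y", "Spam Site"),
    "site B": ("N N Y Y", "Good Site"),
    "site C": ("Y N N N", "Spam Site"),
    "site D": ("N Y N Y", "Spam Site"),
    "site E": ("N Y N N", "Good Site"),
}


def encode_flags(flags):
    """Turn a string like 'Y N N Y' into [1, 0, 0, 1]."""
    return [1 if flag == "Y" else 0 for flag in flags.split()]


# One feature vector and one 0/1 label per site, in the same order.
features = [encode_flags(flags) for flags, _ in RAW_DATA.values()]
labels = [1 if label == "Spam Site" else 0 for _, label in RAW_DATA.values()]
```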

Page 28: How to Spot a Bear - An Intro to Machine Learning for SEO

Neural Networks: A Perceptron

[Diagram: several inputs feed a single neuron, which produces one output.]

Page 29: How to Spot a Bear - An Intro to Machine Learning for SEO

Neural Networks: A Perceptron

[Diagram: four inputs (1, 0, 1, 0), each with a weight of 0.5, feed a single neuron that produces one output.]

Neuron rule: if inputs >= 1, output TRUE

Page 30: How to Spot a Bear - An Intro to Machine Learning for SEO

Inputs: 1, 0, 1, 0
Weights: 0.5, 0.5, 0.5, 0.5
Neuron rule: if inputs >= 1, output TRUE

1 x 0.5 = 0.5
0 x 0.5 = 0
1 x 0.5 = 0.5
0 x 0.5 = 0
______
Total: 1
Output: TRUE

Page 31: How to Spot a Bear - An Intro to Machine Learning for SEO

Inputs: 1, 0, 0, 0
Weights: 0.5, 0.5, 0.5, 0.5
Neuron rule: if inputs >= 1, output TRUE

1 x 0.5 = 0.5
0 x 0.5 = 0
0 x 0.5 = 0
0 x 0.5 = 0
______
Total: 0.5
Output: FALSE

Page 32: How to Spot a Bear - An Intro to Machine Learning for SEO

Inputs: 1, 0, 1, 0
Weights: 0.5, 0.5, 0.4, 0.5
Neuron rule: if inputs >= 1, output TRUE

1 x 0.5 = 0.5
0 x 0.5 = 0
1 x 0.4 = 0.4
0 x 0.5 = 0
______
Total: 0.9
Output: FALSE
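The three worked examples above can be reproduced with a few lines of code. This is a minimal sketch of a single threshold neuron; the function name `perceptron` is illustrative, and the threshold of 1 comes from the "if inputs >= 1" rule on the slides.

```python
def perceptron(inputs, weights, threshold=1.0):
    """Multiply each input by its weight, add them up, compare to the threshold.

    Returns the weighted total and whether the neuron outputs TRUE.
    """
    total = sum(x * w for x, w in zip(inputs, weights))
    return total, total >= threshold


# The three cases from the slides.
total_a, fires_a = perceptron([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
total_b, fires_b = perceptron([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5])
total_c, fires_c = perceptron([1, 0, 1, 0], [0.5, 0.5, 0.4, 0.5])

for total, fires in [(total_a, fires_a), (total_b, fires_b), (total_c, fires_c)]:
    print(f"total={total:.1f} output={'TRUE' if fires else 'FALSE'}")
```

The third case shows why weights matter: the inputs are the same as in the first case, but lowering one weight from 0.5 to 0.4 is enough to flip the output.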

Page 33: How to Spot a Bear - An Intro to Machine Learning for SEO

Example Data

Site   | Adverts on page? | More than 5 spelling mistakes? | Less than 200 words of content? | CTA in ALL CAPS? | Classification
site A | Y                | Y                              | Y                               | Y                | Spam Site
site B | N                | N                              | Y                               | Y                | Good Site
site C | Y                | N                              | N                               | N                | Spam Site
site D | N                | Y                              | N                               | Y                | Spam Site
site E | N                | Y                              | N                               | N                | Good Site

Page 34: How to Spot a Bear - An Intro to Machine Learning for SEO

Untrained Neuron

Is site spam?

Input                | Weight
adverts              | 0.5
>5 spelling mistakes | 0.5
< 200 words content  | 0.5
CTA in ALL CAPS      | 0.5

Neuron rule: if inputs >= 1, output TRUE

Page 35: How to Spot a Bear - An Intro to Machine Learning for SEO

Training

Input                | Value | Weight
adverts              | 0     | 0.5
>5 spelling mistakes | 0     | 0.5
< 200 words content  | 1     | 0.5
CTA in ALL CAPS      | 1     | 0.5

Neuron rule: if inputs >= 1, output TRUE

Output: SPAM!

Page 36: How to Spot a Bear - An Intro to Machine Learning for SEO

Training

Input                | Weight
adverts              | 0.5
>5 spelling mistakes | 0.5
< 200 words content  | 0.6
CTA in ALL CAPS      | 0.6

Neuron rule: if inputs >= 1, output TRUE
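The slides show weights being nudged during training but do not say which update rule was used. As one possible illustration, the sketch below applies the classic perceptron learning rule to the five example sites: whenever the neuron gets a site wrong, each active input's weight moves by a small step towards the right answer. The step size of 0.1, the fixed threshold of 1, and the function names are assumptions for this sketch, not the speaker's procedure.

```python
# Example data: adverts, >5 spelling mistakes, <200 words, CTA in ALL CAPS.
FEATURES = [
    [1, 1, 1, 1],  # site A
    [0, 0, 1, 1],  # site B
    [1, 0, 0, 0],  # site C
    [0, 1, 0, 1],  # site D
    [0, 1, 0, 0],  # site E
]
LABELS = [1, 0, 1, 1, 0]  # 1 = Spam Site, 0 = Good Site


def predict(inputs, weights, threshold=1.0):
    """Return 1 (spam) if the weighted total reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def train(features, labels, weights, step=0.1, max_epochs=100):
    """Perceptron learning rule with a fixed threshold (assumed update rule)."""
    weights = list(weights)
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in zip(features, labels):
            error = target - predict(inputs, weights)  # -1, 0 or +1
            if error != 0:
                mistakes += 1
                for i, x in enumerate(inputs):
                    weights[i] += step * error * x  # only active inputs change
        if mistakes == 0:  # a full pass with no errors: stop
            break
    return weights


trained_weights = train(FEATURES, LABELS, [0.5, 0.5, 0.5, 0.5])
predictions = [predict(inputs, trained_weights) for inputs in FEATURES]
```

The weights this rule settles on depend on the step size and the order of the examples, so they need not match the values shown on the slides.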

Page 37: How to Spot a Bear - An Intro to Machine Learning for SEO

After training: 4/5 sites correct

Is site spam?

Input                | Weight
adverts              | 0.2
>5 spelling mistakes | 0.7
< 200 words content  | 0.4
CTA in ALL CAPS      | 0.5

Neuron rule: if inputs >= 1, output TRUE
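The "4/5 sites correct" figure can be checked by running the trained weights over the example data. This sketch assumes the weights pair with the attributes in the order listed (adverts 0.2, spelling mistakes 0.7, word count 0.4, ALL CAPS CTA 0.5); the names are illustrative.

```python
# Example data: adverts, >5 spelling mistakes, <200 words, CTA in ALL CAPS.
SITES = {
    "site A": ([1, 1, 1, 1], "Spam Site"),
    "site B": ([0, 0, 1, 1], "Good Site"),
    "site C": ([1, 0, 0, 0], "Spam Site"),
    "site D": ([0, 1, 0, 1], "Spam Site"),
    "site E": ([0, 1, 0, 0], "Good Site"),
}

# Weights from the 'After training' slide (attribute order assumed).
TRAINED_WEIGHTS = [0.2, 0.7, 0.4, 0.5]


def classify(inputs, weights, threshold=1.0):
    """Label a site as spam when the weighted total reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return "Spam Site" if total >= threshold else "Good Site"


misclassified = [
    name
    for name, (inputs, label) in SITES.items()
    if classify(inputs, TRAINED_WEIGHTS) != label
]
correct = len(SITES) - len(misclassified)
print(f"{correct}/{len(SITES)} correct, missed: {misclassified}")
```

Under that pairing, the one miss is site C: its only signal is adverts, and a weight of 0.2 is not enough on its own to reach the threshold.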

Page 38: How to Spot a Bear - An Intro to Machine Learning for SEO

ANNs typically have many neurons

source: http://www.teco.edu/~albrecht/neuro/html/node18.html

Page 39: How to Spot a Bear - An Intro to Machine Learning for SEO

Deep Learning

Page 40: How to Spot a Bear - An Intro to Machine Learning for SEO

Humans are good at pattern matching

Page 41: How to Spot a Bear - An Intro to Machine Learning for SEO

We’re better than machines…

source: Pawan Sinha (http://web.mit.edu/bcs/sinha/papers/sinha_recog_review_NN.pdf)

Page 42: How to Spot a Bear - An Intro to Machine Learning for SEO

ML can learn to recognise cats from examples

Page 43: How to Spot a Bear - An Intro to Machine Learning for SEO

Deep Learning learns more like us

Page 44: How to Spot a Bear - An Intro to Machine Learning for SEO

Ok, so what does this have to do with Google?

Page 45: How to Spot a Bear - An Intro to Machine Learning for SEO

Panda

ML-based algorithm updates

Page 46: How to Spot a Bear - An Intro to Machine Learning for SEO

Old index Caffeine

Caffeine - Infrastructure Update (we believe this made Panda+Penguin possible)

Page 47: How to Spot a Bear - An Intro to Machine Learning for SEO

Hummingbird is to ??? as Caffeine is to Panda+Penguin

Page 48: How to Spot a Bear - An Intro to Machine Learning for SEO

Hummingbird

Is it similar to Caffeine?

Is it the basis for new natural language algorithms?

Page 49: How to Spot a Bear - An Intro to Machine Learning for SEO

Where is Google going next with ML?

Page 50: How to Spot a Bear - An Intro to Machine Learning for SEO

Idea

Image Search 2.0

Page 51: How to Spot a Bear - An Intro to Machine Learning for SEO

Image Labelling

Page 52: How to Spot a Bear - An Intro to Machine Learning for SEO

Image Labelling

Page 53: How to Spot a Bear - An Intro to Machine Learning for SEO

Video Labelling

Page 54: How to Spot a Bear - An Intro to Machine Learning for SEO

ML Generated Image Descriptions

“Two pizzas sitting on top of a stove top oven”

Page 55: How to Spot a Bear - An Intro to Machine Learning for SEO

Idea

Natural Language Faceted Search

Page 56: How to Spot a Bear - An Intro to Machine Learning for SEO

‘show me Olympic athletes’

‘show me the women’

Page 57: How to Spot a Bear - An Intro to Machine Learning for SEO

How about:

“Find well-rated vegetarian cooking books written after 1990”

Page 58: How to Spot a Bear - An Intro to Machine Learning for SEO

Idea

Factual Accuracy as a Ranking Factor

Page 59: How to Spot a Bear - An Intro to Machine Learning for SEO

Fact Checking

Knowledge Vault

Page 60: How to Spot a Bear - An Intro to Machine Learning for SEO

Idea: Bad Facts


Estimating ‘Trustworthiness’

Page 61: How to Spot a Bear - An Intro to Machine Learning for SEO

Idea

Entirely ML Generated Algorithm?

Page 62: How to Spot a Bear - An Intro to Machine Learning for SEO

http://dis.tl/ml-algo

Page 63: How to Spot a Bear - An Intro to Machine Learning for SEO

Thanks! :)

@TomAnthonySEO