@TomAnthonySEO
April 2015 - BrightonSEO
HOW TO SPOT A BEAR: A Machine Learning Introduction for SEOs
Can you define a list of rules for spotting bears?
Let’s start with:
1) Four legs.
Bear!
List of rules (first half), when I asked in the office:
1. Four legs. 2. Breathes. 3. Furry. 4. Long snout.
Bear!
List of rules:
1. Four legs. 2. Breathes. 3. Furry. 4. Long snout.
5. Brown. 6. Not always brown. 7. Mammal. 8. No tail.
(how do you spot a mammal?!)
Let’s check our rules…
Rules say:
Bear
Rules say:
Harmless Furry Thing (fewer than 4 legs)
Rules say:
Odd Grey Creature (no long snout)
Remove ‘long snout’, and rules say:
Bear (Extra-terrestrial bear?!)
Our rules suck.
A different bear: Google’s Panda
Can you define a list of rules for spotting spammy pages?
Same problem as bears!
NEED GOOD PAGE
Good page
NEED GOOD PAGE
Commercial page, still good.
Hrm…
Seems legit…
WTF!
Google can’t write rules.
What we can do is identify spammy or non-spammy attributes.
Some Possible Spam Signals
Are there adverts on the page?
Are there lots of spelling mistakes?
Is there little text content?
Are there Calls To Action in ALL CAPS?
Smooth segue to:
Machine Learning
List of pages we’ve manually classified.
List of attributes that we believe are important to classifying pages.
Example Data
Site    Adverts on page?  More than 5 spelling mistakes?  Less than 200 words of content?  CTA in ALL CAPS?  Classification
site A  Y                 Y                               Y                                Y                 Spam Site
site B  N                 N                               Y                                Y                 Good Site
site C  Y                 N                               N                                N                 Spam Site
site D  N                 Y                               N                                Y                 Spam Site
site E  N                 Y                               N                                N                 Good Site
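To feed a table like this to a learning algorithm, each Y/N answer is usually turned into a 1 or a 0. A minimal sketch of that encoding, assuming Python; the names `FEATURES`, `RAW_ROWS`, `encode` and `TRAINING_SET` are made up for illustration and are not from the talk:

```python
# Attribute names, in the same order as the columns of the example data.
FEATURES = [
    "adverts on page",
    "more than 5 spelling mistakes",
    "less than 200 words of content",
    "CTA in ALL CAPS",
]

# One row per manually classified site: (Y/N answers, human label).
RAW_ROWS = {
    "site A": ("YYYY", "Spam Site"),
    "site B": ("NNYY", "Good Site"),
    "site C": ("YNNN", "Spam Site"),
    "site D": ("NYNY", "Spam Site"),
    "site E": ("NYNN", "Good Site"),
}


def encode(answers):
    """Turn a string of Y/N answers into a list of 1/0 inputs."""
    return [1 if answer == "Y" else 0 for answer in answers]


# (site name, numeric inputs, is_spam) triples ready for a classifier.
TRAINING_SET = [
    (site, encode(answers), label == "Spam Site")
    for site, (answers, label) in RAW_ROWS.items()
]

for site, inputs, is_spam in TRAINING_SET:
    print(site, inputs, "spam" if is_spam else "good")
```

Each site becomes a short list of numbers plus the label a human gave it, which is all the neuron in the following slides needs.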
Neural Networks: A Perceptron
[Diagram: Inputs -> Neuron -> Output]
Neural Networks: A Perceptron
Inputs: 1, 0, 1, 0
Weights: 0.5, 0.5, 0.5, 0.5
Rule: if the weighted inputs sum to >= 1, output TRUE
1 x 0.5 = 0.5
0 x 0.5 = 0
1 x 0.5 = 0.5
0 x 0.5 = 0
Total: 1
Output: TRUE
Inputs: 1, 0, 0, 0
Weights: 0.5, 0.5, 0.5, 0.5
Rule: if the weighted inputs sum to >= 1, output TRUE
1 x 0.5 = 0.5
0 x 0.5 = 0
0 x 0.5 = 0
0 x 0.5 = 0
Total: 0.5
Output: FALSE
Inputs: 1, 0, 1, 0
Weights: 0.5, 0.5, 0.4, 0.5
Rule: if the weighted inputs sum to >= 1, output TRUE
1 x 0.5 = 0.5
0 x 0.5 = 0
1 x 0.4 = 0.4
0 x 0.5 = 0
Total: 0.9
Output: FALSE
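The arithmetic on these perceptron slides can be reproduced in a few lines. This is only a sketch, assuming Python; the function name `perceptron` and its `threshold` parameter are illustrative choices, with the threshold of 1 taken from the "inputs >= 1" rule on the slides:

```python
def perceptron(inputs, weights, threshold=1.0):
    """Multiply each input by its weight, add the results up,
    and output TRUE when the total reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total, total >= threshold


# The three cases walked through on the slides.
cases = [
    ([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]),
    ([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5]),
    ([1, 0, 1, 0], [0.5, 0.5, 0.4, 0.5]),
]

for inputs, weights in cases:
    total, output = perceptron(inputs, weights)
    print(inputs, weights, "total:", total, "output:", output)
```

Note how lowering a single weight from 0.5 to 0.4 is enough to flip the output for the same inputs. Changing weights is all that training does.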
Example Data
Site    Adverts on page?  More than 5 spelling mistakes?  Less than 200 words of content?  CTA in ALL CAPS?  Classification
site A  Y                 Y                               Y                                Y                 Spam Site
site B  N                 N                               Y                                Y                 Good Site
site C  Y                 N                               N                                N                 Spam Site
site D  N                 Y                               N                                Y                 Spam Site
site E  N                 Y                               N                                N                 Good Site
Untrained Neuron
Is site spam?
Inputs: adverts, >5 spelling mistakes, < 200 words content, CTA in ALL CAPS
Weights: 0.5, 0.5, 0.5, 0.5
Rule: if the weighted inputs sum to >= 1, output TRUE
Training
Inputs (adverts, >5 spelling mistakes, < 200 words content, CTA in ALL CAPS): 0, 0, 1, 1
Weights: 0.5, 0.5, 0.5, 0.5
Rule: if the weighted inputs sum to >= 1, output TRUE
Output: SPAM!
Training
Inputs: adverts, >5 spelling mistakes, < 200 words content, CTA in ALL CAPS
Weights: 0.5, 0.5, 0.6, 0.6
Rule: if the weighted inputs sum to >= 1, output TRUE
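The slides do not say which rule is used to adjust the weights. One common choice, assumed here purely for illustration, is the textbook perceptron learning rule: when the neuron gets a site wrong, shift the weight of every active input a little in the direction that would have helped. The names `predict` and `train`, the learning rate of 1/10 and the epoch limit are all assumptions, and `Fraction` is used only to keep the arithmetic exact:

```python
from fractions import Fraction

THRESHOLD = 1  # from the "inputs >= 1" rule on the slides

# (site, inputs, target): target is 1 for spam, 0 for good.
DATA = [
    ("site A", (1, 1, 1, 1), 1),
    ("site B", (0, 0, 1, 1), 0),
    ("site C", (1, 0, 0, 0), 1),
    ("site D", (0, 1, 0, 1), 1),
    ("site E", (0, 1, 0, 0), 0),
]


def predict(weights, inputs):
    """Return 1 (spam) if the weighted sum reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= THRESHOLD else 0


def train(weights, data, rate=Fraction(1, 10), max_epochs=100):
    """Textbook perceptron rule (an assumption, not the talk's method)."""
    weights = list(weights)
    for _ in range(max_epochs):
        mistakes = 0
        for _, inputs, target in data:
            error = target - predict(weights, inputs)  # -1, 0 or +1
            if error != 0:
                mistakes += 1
                # Only inputs that were "on" (x == 1) get their weight changed.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        if mistakes == 0:  # a full pass with no errors: stop
            break
    return weights


start = [Fraction(1, 2)] * 4  # the untrained neuron: every weight 0.5
trained = train(start, DATA)
print([float(w) for w in trained])
```

Real training sets are far larger and rarely this tidy, so a trained model will usually still get some examples wrong.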
After training: 4/5 sites correct
Is site spam?
Inputs: adverts, >5 spelling mistakes, < 200 words content, CTA in ALL CAPS
Weights: 0.2, 0.7, 0.4, 0.5
Rule: if the weighted inputs sum to >= 1, output TRUE
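The "4/5 sites correct" figure can be checked by running the example data through the trained weights. A quick sketch, assuming Python and assuming the four weights are listed in the same order as the four inputs; `SITES`, `WEIGHTS` and `is_spam` are illustrative names:

```python
# Trained weights, assumed to be in the order:
# adverts, >5 spelling mistakes, under 200 words of content, CTA in ALL CAPS.
WEIGHTS = (0.2, 0.7, 0.4, 0.5)
THRESHOLD = 1.0

# Example data as 1/0 inputs plus the human label (True means spam).
SITES = {
    "site A": ((1, 1, 1, 1), True),
    "site B": ((0, 0, 1, 1), False),
    "site C": ((1, 0, 0, 0), True),
    "site D": ((0, 1, 0, 1), True),
    "site E": ((0, 1, 0, 0), False),
}


def is_spam(inputs, weights=WEIGHTS, threshold=THRESHOLD):
    """Weighted sum of the inputs compared against the threshold."""
    return sum(x * w for x, w in zip(inputs, weights)) >= threshold


# For each site: did the neuron agree with the human label?
results = {
    name: is_spam(inputs) == label for name, (inputs, label) in SITES.items()
}
correct = sum(results.values())

for name, ok in results.items():
    print(name, "correct" if ok else "WRONG")
print(f"{correct}/{len(SITES)} sites correct")
```

Under that ordering assumption the one miss is site C: its only signal is adverts, and a weight of 0.2 on its own never reaches the threshold.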
ANNs typically have many neurons
source: http://www.teco.edu/~albrecht/neuro/html/node18.html
Deep Learning
Humans are good at pattern matching
We’re better than machines…
source: Pawan Sinha (http://web.mit.edu/bcs/sinha/papers/sinha_recog_review_NN.pdf)
ML can learn to recognise cats from examples
Deep Learning learns more like us
Ok, so what does this have to do with Google?
Panda: ML-based algorithm updates
Old index Caffeine
Caffeine - Infrastructure Update (we believe this made Panda+Penguin possible)
Hummingbird is to ??? as
Caffeine is to Panda+Penguin
Hummingbird: Is it similar to Caffeine? Is it the basis for new natural language algorithms?
Where is Google going next with ML?
Idea
Image Search 2.0
Image Labelling
Video Labelling
ML Generated Image Descriptions
“Two pizzas sitting on top of a stove top oven”
Idea
Natural Language Faceted Search
‘show me Olympic athletes’ ‘show me the women’
“Find well rated vegetarian cooking books written after 1990”
How about:
Idea
Factual Accuracy as a Ranking Factor
Fact Checking
Knowledge Vault
Idea: Bad Facts
NEED: shot of Google talking about this shit
Estimating ‘Trustworthiness’
Idea
Entirely ML Generated Algorithm?
http://dis.tl/ml-algo
Thanks! :)
@TomAnthonySEO