Optimizing Unstructured Data

@ajkohn

@SEMpdx

#SearchFest

My name is AJ Kohn

Blind Five Year Old Since 2007

Making the complex simple

Semantic Search

We have a problem

Ugh, as if!

WHAT?!

Semantic search is about understanding meaning

OKAY!

Context

Context matters

Natural Language Processing

Coreference Resolution

Finding all expressions that refer to the same entity in a text

Part of Speech (POS) Tagging

Assign a part of speech to each word in a text
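
To make the idea concrete, here is a minimal sketch of tagging by dictionary lookup. The lexicon, the tag names and the suffix fallback are all assumptions made for illustration; real taggers learn tag probabilities from annotated text rather than using a hand-written table.

```python
# Toy lexicon (hypothetical): maps a lowercase word to one part-of-speech tag.
LEXICON = {
    "google": "NOUN",
    "text": "NOUN",
    "understands": "VERB",
    "is": "VERB",
    "the": "DET",
    "quite": "ADV",
    "quiet": "ADJ",
    "awesome": "ADJ",
}


def tag(sentence):
    """Return (word, tag) pairs; unknown words get a crude suffix-based guess."""
    tagged = []
    for word in sentence.split():
        key = word.lower().strip(".,!?")
        if key in LEXICON:
            pos = LEXICON[key]
        elif key.endswith("ly"):
            pos = "ADV"
        elif key.endswith("ing") or key.endswith("ed"):
            pos = "VERB"
        else:
            pos = "NOUN"  # default guess for anything else
        tagged.append((word, pos))
    return tagged


print(tag("Google understands the text"))
```

A lookup table cannot handle words whose part of speech depends on context, which is one reason real systems use statistical models.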

The word “quiet” isn’t misspelled, but Google knew that I probably meant to write “quite awesome” instead

Machine learning

Making predictions based on patterns and rules from prior data
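
The quiet/quite example can be sketched as a prediction from prior data: count which word pairs occur in past text, then pick the candidate that has been seen more often before the next word. The five-sentence corpus below is made up and stands in for a very large collection; this is an illustration of the principle, not a description of how Google's system works.

```python
from collections import Counter

# Hypothetical "prior data": a tiny stand-in for a very large text collection.
CORPUS = [
    "that was quite awesome",
    "this talk is quite awesome",
    "please be quiet now",
    "a quiet room",
    "quite awesome indeed",
]


def bigram_counts(sentences):
    """Count how often each pair of adjacent words occurs."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_likelier(candidates, next_word, counts):
    """Choose the candidate seen most often directly before next_word."""
    return max(candidates, key=lambda word: counts[(word, next_word)])


COUNTS = bigram_counts(CORPUS)
print(pick_likelier(["quiet", "quite"], "awesome", COUNTS))  # prints "quite"
```

More data means more reliable counts, which is the point of the next slide.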

Google is better at getting meaning from text because of access to more data

Entities

Letters and Words

Things

“New York”
hasPopulation: 8.046 Million
hasPointsofInterest: Empire State Building
    hasAddress: 350 5th Avenue
    hasHeight: 1,250 feet
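
One common way to hold facts like these is as subject, predicate, object triples. The sketch below uses the values from the slide and assumes the address and height describe the Empire State Building rather than New York; the function names are hypothetical.

```python
# Facts from the slide as (subject, predicate, object) triples.
# Assumption: hasAddress and hasHeight belong to the Empire State Building.
TRIPLES = [
    ("New York", "hasPopulation", "8.046 Million"),
    ("New York", "hasPointsofInterest", "Empire State Building"),
    ("Empire State Building", "hasAddress", "350 5th Avenue"),
    ("Empire State Building", "hasHeight", "1,250 feet"),
]


def attributes_of(entity, triples):
    """Collect every predicate and value stored for one entity."""
    return {pred: obj for subj, pred, obj in triples if subj == entity}


def related_entities(entity, triples):
    """Values that are themselves entities, i.e. appear as a subject too."""
    subjects = {subj for subj, _, _ in triples}
    return [obj for subj, _, obj in triples if subj == entity and obj in subjects]


print(attributes_of("Empire State Building", TRIPLES))
print(related_entities("New York", TRIPLES))
```

Because a value can itself be an entity, following the triples from one thing to the next is what turns a list of facts into a graph.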

The Knowledge Graph

Connections and relationships between entities and documents

Named Entity Recognition (NER)

One size doesn’t fit all

Context-Dependent Fine-Grained Entity Type Tagging

Not just any entities but salient entities

66 entities on a page and less than 5% are salient

http://bit.ly/bigdealentities

How do you train a machine learning model to identify salient entities?
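
Before any training, it helps to see what a salience score might look like. The sketch below is a hand-written heuristic, not a trained model and not Google's method: it assumes mention counts, an early first mention and a headline appearance are useful signals, and the weights are arbitrary. It then keeps the top 5% of entities, which for 66 entities is 3.

```python
import math


def salience_scores(entity_mentions, headline_entities):
    """Score each entity; entity_mentions maps entity -> list of token offsets."""
    scores = {}
    for entity, offsets in entity_mentions.items():
        score = len(offsets)  # more mentions, higher score
        if entity in headline_entities:
            score += 5  # arbitrary bonus for appearing in the headline
        if offsets and min(offsets) < 50:
            score += 2  # arbitrary bonus for an early first mention
        scores[entity] = score
    return scores


def top_salient(scores, fraction=0.05):
    """Keep the highest-scoring fraction of entities (at least one)."""
    keep = max(1, math.floor(len(scores) * fraction))
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return ranked[:keep]


# Made-up page with 66 entities, most mentioned once and late in the text.
mentions = {f"background entity {i:02d}": [100 + i] for i in range(64)}
mentions["Eureka 4870"] = [3, 40, 90, 200]
mentions["vacuum cleaner"] = [10, 60]

scores = salience_scores(mentions, headline_entities={"Eureka 4870"})
print(top_salient(scores))
```

A learned model would replace the hand-picked weights with ones fitted to labeled examples.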

Hello McFly!

Word up

Word to your mother

Words

“Keywords don’t matter anymore”

Ice Bear cried, but just inside

I love structured data but optimizing unstructured data is far more powerful

Text on the page is more important now

Words = Entities ^ Context ^ Meaning

We can turn unstructured content into structured data

How much do you trust Google?

Stop writing for people and start writing for search engines

http://bit.ly/focusedwriting

28%

Most users don’t read but skim and scan instead

http://bit.ly/usersdontread

First you looked here

Then here

A penny for a paragraph return

Mirroring

Not only do we mirror body language, we seek out mirroring when searching

Keyword-rich text and subheads allow users to resume reading at any time

Keyword is not a four letter word

Better to you query syntax call it

But what about user delight?

Could you not

Task Completion > Aesthetics

Our job is to reduce friction

After writing your content, go back and find where you can replace pronouns with nouns

Remember that readers won’t often ‘see’ these nouns but will use them as visual signposts
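
The review pass described above can be partly automated. This is a minimal sketch that only flags candidate pronouns for a human editor; the pronoun list is a hypothetical starting set, and words such as "that" or "her" will produce false positives.

```python
import re

# Hypothetical starter list; extend or trim it for your own content.
PRONOUNS = {
    "it", "its", "they", "them", "their",
    "this", "that", "these", "those",
    "he", "she", "him", "her",
}

WORD = re.compile(r"[A-Za-z]+")


def flag_pronouns(text):
    """Return (pronoun, character offset) for each pronoun found in text."""
    hits = []
    for match in WORD.finditer(text):
        if match.group().lower() in PRONOUNS:
            hits.append((match.group(), match.start()))
    return hits


print(flag_pronouns("It's such a gorgeous work of art"))
print(flag_pronouns("Lobster and Cat is a beautiful painting"))
```

The first sentence is flagged at its opening "It"; the second, which names the artwork, comes back clean.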

“It’s such a gorgeous work of art”

“Lobster and Cat is a beautiful painting”

ArtworkType: painting
ArtworkTitle: Lobster and Cat
hasArtist: Pablo Picasso
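
Why the second sentence is easier for a machine can be shown with a simple text pattern. The sketch below assumes a "<Title> is a <adjective> <type>" sentence shape and a short, hypothetical list of artwork types; it is far cruder than a real extraction system, but it finds structure in the noun-based sentence and nothing in the pronoun-based one.

```python
import re

# Assumed sentence shape: "<Title> is a/an [adjective] <artwork type>".
# The list of artwork types is hypothetical.
PATTERN = re.compile(
    r"^(?P<title>[A-Z][\w ]*?) is an? (?:\w+ )?"
    r"(?P<type>painting|sculpture|drawing)\b"
)


def extract_artwork(sentence):
    """Return structured fields if the sentence matches the pattern, else None."""
    match = PATTERN.search(sentence.strip(' "“”'))
    if not match:
        return None
    return {
        "ArtworkTitle": match.group("title"),
        "ArtworkType": match.group("type"),
    }


print(extract_artwork("Lobster and Cat is a beautiful painting"))
print(extract_artwork("It's such a gorgeous work of art"))
```

The artist does not appear in either sentence, so a fact like hasArtist would have to come from elsewhere on the page or from an existing entity record.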

Intent

Google may better understand the meaning of my query, but does it know why I’m searching?

Why are they really searching?

Common Problems with the Eureka 4870

Eureka 4870 Troubleshooting Tips

Local Vacuum Cleaner Repair Shops

Eureka 4870 Replacement Parts

Guide to Buying a New Vacuum Cleaner

Our job is to decode the intent from the query syntax

http://bit.ly/aggregatingintent
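
Decoding intent from query syntax can be sketched as a lookup on query modifiers. The modifier list and the intent label below are hypothetical and would normally come from keyword research; the content ideas are the ones from the vacuum cleaner slide.

```python
# Hypothetical modifier -> intent rules.
MODIFIER_INTENTS = {
    "problems": "fix a broken product",
    "troubleshooting": "fix a broken product",
    "parts": "fix a broken product",
    "repair": "fix a broken product",
}

# Content that serves the intent (titles taken from the slide).
CONTENT_FOR_INTENT = {
    "fix a broken product": [
        "Common Problems with the Eureka 4870",
        "Eureka 4870 Troubleshooting Tips",
        "Local Vacuum Cleaner Repair Shops",
        "Eureka 4870 Replacement Parts",
        "Guide to Buying a New Vacuum Cleaner",
    ],
}


def decode_intent(query):
    """Return the intent for the first known modifier in the query."""
    for word in query.lower().split():
        if word in MODIFIER_INTENTS:
            return MODIFIER_INTENTS[word]
    return "unknown"


def content_ideas(query):
    """List content that could satisfy the intent behind the query."""
    return CONTENT_FOR_INTENT.get(decode_intent(query), [])


print(decode_intent("Eureka 4870 troubleshooting"))
print(content_ideas("Eureka 4870 parts"))
```

Several different modifiers map to the same intent, which is why one page can target a keyword while covering everything that intent needs.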

Target the keyword

Optimize the intent

What are we really talking about?

This is a factbox triggered by entities and the Knowledge Graph

This answerbox is triggered by semi-structured data

This answerbox is triggered by specific patterns of text

Answerbox triggered by patterns of text and specific understanding

Answerbox triggered by patterns of text and specific understanding

Answerbox triggered by patterns of text and semi-structured data

Answerbox triggered by patterns of text and specific understanding

Game's the same, just got more fierce

Skate to where the puck is going to be, not to where it has been

The Link Graph

The Link Graph + Scored Entities

[Diagram: a link graph whose nodes are labeled with <entity A>, <entity B>, <entity C> and <entity D>]

Entity authority could flow through links similar to anchor text
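
As a thought experiment only, the idea can be sketched as one pass of score propagation: each page passes a damped share of its entity scores to the pages it links to. The function name, the damping value and the page data are all hypothetical, and nothing here describes a known Google algorithm.

```python
def propagate_entity_scores(page_entities, links, damping=0.5):
    """One pass: pages share a damped portion of each entity score via links.

    page_entities maps page -> {entity: score}; links maps page -> [targets].
    """
    result = {page: dict(scores) for page, scores in page_entities.items()}
    for source, targets in links.items():
        if not targets:
            continue
        for entity, score in page_entities.get(source, {}).items():
            share = damping * score / len(targets)  # split across outlinks
            for target in targets:
                target_scores = result.setdefault(target, {})
                target_scores[entity] = target_scores.get(entity, 0.0) + share
    return result


# Made-up example: page1 links to page2.
pages = {
    "page1": {"entity A": 1.0, "entity B": 0.5},
    "page2": {"entity A": 0.25},
}
links = {"page1": ["page2"]}

print(propagate_entity_scores(pages, links))
```

In this toy example page2 gains authority for entity A and picks up some for entity B, an entity it never scored for on its own.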

TL;DL

We can help Google to find structure, entities and meaning in our content

The easier we make it, the more likely we are to satisfy robots and humans

AJ Kohn, Owner, Blind Five Year Old
www.blindfiveyearold.com

@ajkohn