Search Engines and Knowledge Graphs It’s Complicated! Panos Alexopoulos Head of Ontology

Page 1: Search Engines and Knowledge Graphs - Panos Alexopoulos

Search Engines and Knowledge Graphs

It’s Complicated!

Panos Alexopoulos, Head of Ontology

Page 2

Who we are and what we do

Page 3

We develop Technology to bridge the language and meaning gap between People and Jobs ...

I like programming, but I’m interested in taking on more project management responsibility

Is there a job in our organisation that better fits my degree?

I’d like to work on our mobile strategy. I’ve helped a friend develop a mobile app.

I’d like to do more with my organisational talent.

We are looking to hire: an experienced tech team lead

The ideal candidate has:
- min. 5 yr of experience
- Certified scrummaster
- Exp. w/ iOS, Android

Completed academic studies in Computer Science or a related field

30% travel for customer presentations

Page 4

… through a family of sophisticated software products ...

Page 5

… that large organizations in the HR and Recruitment sector use...

Page 6

Knowledge Graphs

Page 7

What are Knowledge Graphs

• Knowledge graphs are (large) networks of entities, their semantic types, properties, and relationships between entities.

Page 8

Textkernel Knowledge Graph

• Concept Types:
  • Professions
  • Skills
  • Qualifications (Degrees, Certificates)
  • Organizations (Companies, Educational Institutes)
  • Industries

• Entity relations:
  • Synonym
  • Broader/Narrower
  • Related (Skill2Skill, Profession2Skill, Qualification2Skill etc.)
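As a rough illustration of the structure listed above, the following is a minimal in-memory sketch of typed entities connected by labelled relations. The class names, the relation labels as strings, and the example entities are all assumptions made for illustration; the slides do not describe how the Textkernel Knowledge Graph is actually stored.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    concept_type: str  # e.g. "Profession", "Skill", "Qualification"


class KnowledgeGraph:
    """Toy in-memory graph: typed entities plus labelled, directed relations."""

    def __init__(self):
        self.entities = {}
        # (source name, relation label) -> set of target names
        self.edges = defaultdict(set)

    def add_entity(self, name, concept_type):
        self.entities[name] = Entity(name, concept_type)

    def add_relation(self, source, label, target):
        # Refuse edges that mention entities the graph does not know about.
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.edges[(source, label)].add(target)

    def targets(self, source, label):
        return sorted(self.edges.get((source, label), set()))


# Invented example data, not taken from any real graph.
kg = KnowledgeGraph()
kg.add_entity("Software Engineer", "Profession")
kg.add_entity("Java Developer", "Profession")
kg.add_entity("Java", "Skill")
kg.add_relation("Software Engineer", "narrower", "Java Developer")
kg.add_relation("Java Developer", "related", "Java")
```

Keeping the relation label as part of the edge key makes it easy to treat "synonym", "narrower" and "related" differently later on, which is the distinction the pitfalls in this talk depend on.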

Page 9

How are Knowledge Graphs used in Search

Page 10

Some usages

• Metadata extraction for content indexing
  • Entities (e.g., skills, professions, companies etc. mentioned in a CV or vacancy)
  • Relations (e.g., events mentioned in a news article)

• Metadata extraction for query parsing and interpretation
  • Entity and relation extraction from the user query

• Query Expansion
  • “If I am looking for a search engine specialist, I would be also fine with an Elastic Search engineer”

• Semantic relevance calculation
  • “I am looking for a C++ book, how relevant would a Java book be?”

Page 11

Metadata extraction from content and queries

• The lexical forms of the entities and relations are used as gazetteers for extraction.
• The relations between the entities are used as contextual evidence for disambiguation.
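The two bullets above can be sketched as follows: a gazetteer maps lexical forms to candidate entities, and graph relations between the candidates found in the same text serve as evidence for picking one. The gazetteer entries, the `RELATED` edges and the simple counting score are hypothetical and chosen only to keep the example small; real systems handle multi-word forms and use richer scoring.

```python
import re

# Hypothetical gazetteer: lexical form -> candidate entities.
GAZETTEER = {
    "java": ["Java (programming language)", "Java (island)"],
    "spring": ["Spring (framework)", "Spring (season)"],
    "python": ["Python (programming language)"],
}

# Hypothetical "related" edges used as contextual evidence.
RELATED = {
    "Java (programming language)": {"Spring (framework)", "Python (programming language)"},
    "Spring (framework)": {"Java (programming language)"},
    "Python (programming language)": {"Java (programming language)"},
}


def extract_entities(text):
    """Map each gazetteer mention in the text to its best-supported candidate."""
    tokens = re.findall(r"[a-z+#]+", text.lower())
    mentions = [t for t in tokens if t in GAZETTEER]
    # All candidates for all mentions form the disambiguation context.
    candidates = {c for m in mentions for c in GAZETTEER[m]}
    resolved = {}
    for mention in mentions:
        def support(candidate):
            # Count other candidates in the text that the graph links to this one.
            return len(RELATED.get(candidate, set()) & (candidates - {candidate}))
        # max() keeps the first candidate on ties, so list order acts as a prior.
        resolved[mention] = max(GAZETTEER[mention], key=support)
    return resolved
```

For the text "Experienced in Java and Spring", the two technical senses support each other through the graph, so both mentions resolve to the programming-related entities. This crude score ignores the non-entity words in the text, so it can be misled when ambiguous mentions happen to co-occur.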

Page 12

Query Expansion and Semantic Relevance

• The graph’s relations can be used to generate additional entities to include in the query so as to increase recall.

• The strengths of these relations and/or the distances between the entities can be used to calculate semantic relevance.
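A minimal sketch of both ideas, assuming a small weighted graph: expansion adds directly related entities above a strength threshold, and relevance is taken as the best product of edge strengths along a short path. The graph, the strength values, the threshold and the product-of-strengths measure are all assumptions for illustration; the slides do not prescribe a particular formula.

```python
# Hypothetical weighted relations: entity -> {neighbour: strength in (0, 1]}.
GRAPH = {
    "search engine specialist": {"Elastic Search engineer": 0.8, "Solr engineer": 0.7},
    "Elastic Search engineer": {"search engine specialist": 0.8},
    "Solr engineer": {"search engine specialist": 0.7},
    "C++": {"programming": 0.9},
    "Java": {"programming": 0.9},
    "programming": {"C++": 0.9, "Java": 0.9},
}


def expand_query(term, min_strength=0.75):
    """Return the term plus its directly related entities above a strength threshold."""
    neighbours = GRAPH.get(term, {})
    extra = sorted(n for n, s in neighbours.items() if s >= min_strength)
    return [term] + extra


def relevance(source, target, max_hops=3):
    """Best product of edge strengths over paths of at most max_hops edges."""
    best = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for node, score in frontier.items():
            for neighbour, strength in GRAPH.get(node, {}).items():
                candidate = score * strength
                # Only keep a path if it improves on the best score seen so far.
                if candidate > best.get(neighbour, 0.0):
                    best[neighbour] = candidate
                    next_frontier[neighbour] = candidate
        frontier = next_frontier
    return best.get(target, 0.0)
```

With these invented numbers, `expand_query("search engine specialist")` adds only the Elastic Search engineer, and `relevance("C++", "Java")` is about 0.81 because the two are linked through one shared neighbour. Raising `min_strength` trades recall for precision.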

Page 13

Knowledge Graph in Search Pitfalls

Page 14

3 pitfalls

1. Poorly defined or poorly documented semantics of the knowledge graph.

2. Not using the right type or amount of knowledge for the search scenario at hand.

3. Not mining the knowledge from the right sources.

Page 15

Bad Semantics - The abuse of synonymy

• People (and therefore graphs) often consider as synonyms terms that are in reality hyponyms or otherwise related.

Page 16

Bad Semantics - The abuse of synonymy

Synonyms of “Economist” according to ESCO

● economics science researcher
● macro analyst
● economics analyst
● economics research scientist
● labour economist
● social economist
● interest analyst
● econometrician
● economics researcher
● econophysicist
● economics scientist
● economics scholar
● economics research analyst

Synonyms of “Software Engineer” according to DBpedia

● Senior Software Engineer
● Software engineer
● Consulting software engineer
● Software engineering naming controversy
● Computer science engineer
● Debates within software engineering
● Consulting software engineers
● Software Engineer
● Computer Science Engineer

Page 17

Bad Semantics - The abuse of synonymy

• Why is this a problem:

  • Synonymy means (almost) interchangeability of meaning.

  • If you label a relation as synonymy when it is not, terms with different meanings will be treated as fully equivalent.

  • E.g. when looking for an “economics scholar” you will always get “interest analysts” (and vice versa).

Page 18

Bad Semantics - The abuse of synonymy

• What to do:

  • If you own the Knowledge Graph, be quite strict in what you call a "synonym".

  • If you are using an external Knowledge Graph, be extra careful with its assumptions about synonymy.

Page 19

Bad Semantics - The inadequacy of relatedness

• Often Knowledge Graphs contain a "related" relation to represent semantically related terms whose exact relation we don’t know.

• Especially with the advent of Word2Vec, semantic relatedness is (misleadingly) easy to calculate.

• The problem starts when this “related” relation has no further info about its provenance or context.

Page 20

Bad Semantics - The inadequacy of relatedness

Related Skills for Data Scientist according to ESCO

• data mining
• data models
• information categorisation
• information extraction
• online analytical processing
• query languages
• resource description framework query language
• statistics
• visual presentation techniques

Related Skills for Data Scientist according to Textkernel Knowledge Graph

• Apache Spark
• R
• Big Data
• machine learning
• Python
• Stakeholders
• marketing

Page 21

Bad Semantics - The inadequacy of relatedness

• What is the problem:

  • Semantic relatedness is a vague, highly subjective and context-dependent relation.

  • If this relation is not adequately contextualized and documented, I can’t really know whether it fits my search scenario.

  • E.g. What relatedness criteria and guidelines were given to ESCO experts?
  • E.g. What data and relatedness measures were used by Textkernel?

Page 22

Bad Semantics - The inadequacy of relatedness

• What to do:

  • If you own the Knowledge Graph, contextualize and document your “related” relations:
    • Guidelines and criteria given to humans (experts or crowd).
    • Data and methods used for mining.
    • Intended usage.

  • If you are using an external Knowledge Graph, be extra careful with its assumptions about relatedness.
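One way to make this documentation actionable is to store provenance and intended usage on every "related" edge, so that a search component can check whether an edge was meant for its task. The following sketch assumes hypothetical field names and invented example values; it does not reflect how ESCO or Textkernel actually record this information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelatedEdge:
    source: str
    target: str
    strength: float
    method: str                # how the edge was produced, e.g. "expert-curated"
    data_source: str           # data or guidelines the edge was derived from
    intended_usage: frozenset  # search tasks the edge is documented as suitable for


def usable_for(edges, task):
    """Keep only edges whose documented intended usage covers the given task."""
    return [e for e in edges if task in e.intended_usage]


# Invented edges with invented provenance, for illustration only.
edges = [
    RelatedEdge("Data Scientist", "machine learning", 0.9,
                method="example-method-A", data_source="example-corpus",
                intended_usage=frozenset({"query_expansion", "relevance"})),
    RelatedEdge("Data Scientist", "marketing", 0.3,
                method="example-method-A", data_source="example-corpus",
                intended_usage=frozenset({"relevance"})),
]

expansion_edges = usable_for(edges, "query_expansion")
```

Here only the "machine learning" edge is returned for query expansion, while both edges remain available for relevance scoring.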

Page 23

Knowledge Incompatibility

• Domain semantics of a Knowledge Graph are not necessarily equivalent to the application’s semantics.

• Not all relations are good for query expansion and/or semantic relevance.

• Not all entities and relations are good as disambiguation evidence.

Page 24

Knowledge Incompatibility - Wrong Knowledge

• Experiment made at Textkernel:

  • Used Word2Vec related skills for query expansion when searching for CVs and Vacancies.

  • Precision of expansion pairs was 18%!

  • Developed an expansion-specific relation extractor from vacancy texts.

  • Precision of expansion pairs increased to 60%.
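The slides do not say how the precision of expansion pairs was measured. A common setup, assumed here, is to have humans judge each proposed (query term, expansion term) pair and report the fraction judged acceptable. The pairs below are invented and unrelated to the Textkernel experiment.

```python
def expansion_precision(proposed_pairs, judged_good_pairs):
    """Fraction of proposed (query term, expansion term) pairs judged acceptable."""
    proposed = set(proposed_pairs)
    if not proposed:
        return 0.0
    return len(proposed & set(judged_good_pairs)) / len(proposed)


# Invented example: five proposed pairs, two of which a judge accepted.
proposed = [
    ("python", "django"),
    ("python", "java"),
    ("python", "snake"),
    ("scrum", "agile"),
    ("scrum", "rugby"),
]
judged_good = {("python", "django"), ("scrum", "agile")}

precision = expansion_precision(proposed, judged_good)
```

In this toy case two of the five proposed pairs are acceptable, so the precision is 0.4. Pairs such as ("python", "java") are semantically related yet poor expansions, which is the gap between relatedness and expansion suitability that the experiment points at.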

Page 25

Knowledge Incompatibility - Too Much Knowledge

• Experiment made at iSOCO:

  • Used DBpedia to extract and disambiguate mentions of players and teams from short textual descriptions of football highlights.

  • Precision was 60% and recall 55%.

  • Pruned DBpedia to keep only entities and relations that were more likely to occur in the text and help towards disambiguation.

  • Precision increased to 82% and recall to 80%.
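The effect of pruning can be illustrated with a toy example: removing entity types that are unlikely in the target texts shrinks the candidate set for an ambiguous mention. The type-based filter, the type names and the entity slice below are assumptions made for illustration; the slides only say that the kept entities and relations were those likely to occur in the text and to help disambiguation.

```python
# Hypothetical slice of a general-purpose graph: entity -> (type, surface forms).
ENTITIES = {
    "Lionel Messi": ("SoccerPlayer", {"messi"}),
    "FC Barcelona": ("SoccerClub", {"barcelona", "barca"}),
    "Barcelona": ("City", {"barcelona"}),
    "Real Madrid CF": ("SoccerClub", {"real madrid"}),
    "Madrid": ("City", {"madrid"}),
}

# Types assumed relevant for football highlight descriptions.
KEEP_TYPES = {"SoccerPlayer", "SoccerClub"}


def prune(entities, keep_types):
    """Keep only the entities whose type is expected in the target texts."""
    return {name: (etype, forms)
            for name, (etype, forms) in entities.items()
            if etype in keep_types}


def candidates(entities, surface_form):
    """All entities that can be referred to by the given surface form."""
    return sorted(name for name, (_, forms) in entities.items()
                  if surface_form in forms)


before = candidates(ENTITIES, "barcelona")
after = candidates(prune(ENTITIES, KEEP_TYPES), "barcelona")
```

Before pruning, "barcelona" is ambiguous between the city and the club; after pruning, only the club remains, so no disambiguation evidence is needed for that mention.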

Page 26

Suboptimal Knowledge Mining

• Usually follows from badly defined semantics:

  • No correct or clear guidelines to knowledge miners.

  • Inappropriate selection of source data.
    • E.g., good search expansions are most likely to be found in user logs.
    • E.g., hyponyms are most likely to be found in definitions.

  • Inaccurate training data for ML algorithms.

Page 27

Suboptimal knowledge mining

• What to do: Be semantics-driven, not data- or method-driven!

Page 28

Wrapping Up

Page 29

3 Action Points

➔ Well defined schema
➔ Documentation of assumptions
➔ Careful knowledge reuse
➔ Adapt/transform the knowledge to your search scenario
➔ Beware of knowledge that may actually harm you
➔ Start with the target semantics, and use them to select your data and methods, not the other way around!

Page 30

Thank you!

Panos Alexopoulos, Head of Ontology

E-mail: [email protected]

Web: http://www.panosalexopoulos.com

LinkedIn: www.linkedin.com/in/panosalexopoulos

Twitter: @PAlexop