
Search Engines and Knowledge Graphs

It’s Complicated!

Panos Alexopoulos
Head of Ontology

Who we are and what we do

We develop Technology to bridge the language and meaning gap between People and Jobs ...

I like programming, but I’m interested in taking on more project management responsibility

Is there a job in our organisation that better fits my degree?

I’d like to work on our mobile strategy. I’ve helped a friend develop a mobile app.

I’d like to do more with my organisational talent.

We are looking to hire:
An experienced tech team lead

The ideal candidate has:
- min. 5yr of experience
- Certified scrummaster
- Exp. w/iOS, Android

Completed academic studies Computer Science or related

30% travel for customer presentations

… through a family of sophisticated software products ...

… that large organizations in the HR and Recruitment sector use...

Knowledge Graphs

What are Knowledge Graphs
• Knowledge graphs are (large) networks of entities, their semantic types, properties, and relationships between entities.

Textkernel Knowledge Graph
• Concept Types:
  • Professions
  • Skills
  • Qualifications (Degrees, Certificates)
  • Organizations (Companies, Educational Institutes)
  • Industries
• Entity relations:
  • Synonym
  • Broader/Narrower
  • Related (Skill2Skill, Profession2Skill, Qualification2Skill, etc.)
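A minimal sketch of how concept types and typed relations like these could be held in memory. The class names, the edge layout, and the example concepts are assumptions made for illustration; they are not Textkernel's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConceptType(Enum):
    PROFESSION = "profession"
    SKILL = "skill"
    QUALIFICATION = "qualification"
    ORGANIZATION = "organization"
    INDUSTRY = "industry"


class RelationType(Enum):
    SYNONYM = "synonym"
    BROADER = "broader"
    RELATED = "related"


@dataclass
class KnowledgeGraph:
    # concept id -> semantic type
    concepts: dict = field(default_factory=dict)
    # (source id, relation type) -> set of target ids
    edges: dict = field(default_factory=dict)

    def add_concept(self, concept_id, concept_type):
        self.concepts[concept_id] = concept_type

    def add_relation(self, source, relation, target):
        # Assumed rule for this sketch: both ends must already be in the graph.
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("unknown concept")
        self.edges.setdefault((source, relation), set()).add(target)

    def neighbours(self, source, relation):
        return self.edges.get((source, relation), set())


# Hypothetical example data.
kg = KnowledgeGraph()
kg.add_concept("software engineer", ConceptType.PROFESSION)
kg.add_concept("java developer", ConceptType.PROFESSION)
kg.add_concept("java", ConceptType.SKILL)
kg.add_relation("java developer", RelationType.BROADER, "software engineer")
kg.add_relation("java developer", RelationType.RELATED, "java")
```

Keeping the relation type as part of the edge key means a search component can ask for one kind of relation only (for example, broader terms) instead of getting every neighbour at once.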

How are Knowledge Graphs used in Search

Some usages
• Metadata extraction for content indexing
  • Entities (e.g., skills, professions, companies, etc. mentioned in a CV or vacancy)
  • Relations (e.g., events mentioned in a news article)
• Metadata extraction for query parsing and interpretation
  • Entity and relation extraction from the user query
• Query Expansion
  • “If I am looking for a search engine specialist, I would also be fine with an Elastic Search engineer”
• Semantic relevance calculation
  • “I am looking for a C++ book, how relevant would a Java book be?”

Metadata extraction from content and queries
• The lexical forms of the entities and relations are used as gazetteers for extraction.
• The relations between the entities are used as contextual evidence for disambiguation.
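The two steps on this slide can be sketched as follows: a gazetteer lookup finds candidate entities for each mention, and graph relations pick the candidate that fits the rest of the text best. The gazetteer, the entity ids, the "related" edges, and the overlap-count scoring are all hypothetical; a production extractor would be considerably more elaborate.

```python
import re

# Hypothetical gazetteer: lexical form -> candidate entity ids.
GAZETTEER = {
    "java": ["skill:java", "place:java_island"],
    "spring": ["skill:spring_framework"],
    "hibernate": ["skill:hibernate_orm"],
    "jakarta": ["place:jakarta"],
}

# Hypothetical "related" edges used as contextual evidence.
RELATED = {
    "skill:java": {"skill:spring_framework", "skill:hibernate_orm"},
    "place:java_island": {"place:jakarta"},
}


def extract_entities(text):
    """Return {mention: chosen entity id} for gazetteer terms found in text."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    mentions = [t for t in tokens if t in GAZETTEER]
    # Every candidate of every mention forms the disambiguation context.
    context = {c for m in mentions for c in GAZETTEER[m]}
    resolved = {}
    for mention in mentions:
        # Score a candidate by how many of its related entities are in context.
        resolved[mention] = max(
            GAZETTEER[mention],
            key=lambda c: len(RELATED.get(c, set()) & context),
        )
    return resolved


cv_entities = extract_entities("Developer with Java, Spring and Hibernate experience")
travel_entities = extract_entities("Flights to Jakarta on Java")
```

In the first text "java" resolves to the skill because two of its related skills are also mentioned; in the second it resolves to the island because "jakarta" supplies the supporting evidence.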

Query Expansion and Semantic Relevance
• The graph’s relations can be used to generate additional entities to include in the query so as to increase recall.
• The strengths of these relations and/or the distances between the entities can be used to calculate semantic relevance.
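A rough sketch of both ideas, assuming a toy graph whose edges carry a strength between 0 and 1. The graph contents, the 0.5 threshold, and the 1 / (1 + hops) relevance formula are illustrative assumptions, not a method described in the talk.

```python
from collections import deque

# Hypothetical graph: entity -> {neighbour: relation strength in [0, 1]}.
GRAPH = {
    "search engine specialist": {"elasticsearch engineer": 0.9, "software engineer": 0.4},
    "elasticsearch engineer": {"search engine specialist": 0.9},
    "software engineer": {"search engine specialist": 0.4, "project manager": 0.2},
    "project manager": {"software engineer": 0.2},
}


def expand_query(entity, min_strength=0.5):
    """Add directly connected entities whose relation strength is high enough."""
    neighbours = GRAPH.get(entity, {})
    return [entity] + sorted(n for n, s in neighbours.items() if s >= min_strength)


def relevance(source, target):
    """Distance-based relevance: 1 / (1 + hops), or 0.0 if unreachable."""
    if source == target:
        return 1.0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in GRAPH.get(node, {}):
            if nxt == target:
                return 1.0 / (1 + dist + 1)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


expanded = expand_query("search engine specialist")
```

Raising `min_strength` trades recall for precision: fewer entities are added to the query, but the ones that remain are more strongly connected to the original term.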

Knowledge Graph in Search Pitfalls

3 pitfalls
1. Not well-defined or well-documented semantics of the knowledge graph.
2. Not using the right type or amount of knowledge for the search scenario at hand.
3. Not mining the knowledge from the right sources.

Bad Semantics - The abuse of synonymy
• People (and therefore graphs) often consider as synonyms terms that are in reality hyponyms or otherwise related.

Bad Semantics - The abuse of synonymy

Synonyms of “Economist” according to ESCO
● economics science researcher
● macro analyst
● economics analyst
● economics research scientist
● labour economist
● social economist
● interest analyst
● econometrician
● economics researcher
● econophysicist
● economics scientist
● economics scholar
● economics research analyst

Synonyms of “Software Engineer” according to DBPedia
● Senior Software Engineer
● Software engineer
● Consulting software engineer
● Software engineering naming controversy
● Computer science engineer
● Debates within software engineering
● Consulting software engineers
● Software Engineer
● Computer Science Engineer

Bad Semantics - The abuse of synonymy
• Why is this a problem:
  • Synonymy means (almost) interchangeability of meaning.
  • If you label a relation as synonymy when it is not, terms with different meanings will be treated as fully equivalent.
  • E.g., when looking for an “economics scholar” you will always get “interest analysts” (and vice versa).
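The effect can be reproduced with a toy synonym-based search. The four labels in the synonym set are taken from the ESCO list shown earlier; the CV records and the exact-title matching are hypothetical simplifications.

```python
# A few labels that ESCO lists as synonyms of "economist" (from the slide).
SYNSETS = [
    {"economist", "economics scholar", "interest analyst", "econometrician"},
]


def build_synonym_index(synsets):
    """Map every term to the full set of terms declared equivalent to it."""
    index = {}
    for synset in synsets:
        for term in synset:
            index[term] = synset
    return index


def search(query, documents, index):
    """Match documents whose title equals the query or any declared synonym."""
    terms = index.get(query, {query})
    return [doc["id"] for doc in documents if doc["title"] in terms]


# Hypothetical CV titles.
CVS = [
    {"id": 1, "title": "economics scholar"},
    {"id": 2, "title": "interest analyst"},
    {"id": 3, "title": "software engineer"},
]

INDEX = build_synonym_index(SYNSETS)
scholar_hits = search("economics scholar", CVS, INDEX)
analyst_hits = search("interest analyst", CVS, INDEX)
```

Both queries return CVs 1 and 2, so the search engine can no longer tell the two professions apart once they share a synonym set.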

Bad Semantics - The abuse of synonymy
• What to do:
  • If you own the Knowledge Graph, be quite strict in what you call a "synonym".
  • If you are using an external Knowledge Graph, be extra careful with its assumptions about synonymy.

Bad Semantics - The inadequacy of relatedness
• Often Knowledge Graphs contain a "related" relation to represent semantically related terms whose exact relation we don’t know.
• Especially with the advent of Word2Vec, semantic relatedness is (misleadingly) easy to calculate.
• The problem starts when this “related” relation has no further info about its provenance or context.

Bad Semantics - The inadequacy of relatedness

Related Skills for Data Scientist according to ESCO
● data mining
● data models
● information categorisation
● information extraction
● online analytical processing
● query languages
● resource description framework query language
● statistics
● visual presentation techniques

Related Skills for Data Scientist according to Textkernel Knowledge Graph
● Apache Spark
● R
● Big Data
● machine learning
● Python
● Stakeholders
● marketing

Bad Semantics - The inadequacy of relatedness
• What is the problem:
  • Semantic relatedness is a vague, highly subjective and context-dependent relation.
  • If this relation is not adequately contextualized and documented, I can’t really know whether it fits my search scenario.
    • E.g., what relatedness criteria and guidelines were given to ESCO experts?
    • E.g., what data and relatedness measures were used by Textkernel?

Bad Semantics - The inadequacy of relatedness
• What to do:
  • If you own the Knowledge Graph, contextualize and document your “related” relations:
    • Guidelines and criteria given to humans (experts or crowd).
    • Data and methods used for mining.
    • Intended usage.
  • If you are using an external Knowledge Graph, be extra careful with its assumptions about relatedness.
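One possible way to make this documentation machine-readable is to attach it to every "related" edge. The field names, the usage labels, and the example edges below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelatedEdge:
    source: str
    target: str
    method: str                # how the relation was obtained
    data_source: str           # what data it was mined from, if any
    guidelines: str            # criteria given to human annotators, if any
    intended_usage: frozenset  # search scenarios the relation is meant for


def edges_for(edges, usage):
    """Keep only the edges that were documented as fit for the given usage."""
    return [e for e in edges if usage in e.intended_usage]


# Hypothetical edges with their provenance.
EDGES = [
    RelatedEdge(
        source="data scientist",
        target="python",
        method="co-occurrence mining",
        data_source="vacancy texts",
        guidelines="",
        intended_usage=frozenset({"query_expansion", "relevance"}),
    ),
    RelatedEdge(
        source="data scientist",
        target="stakeholders",
        method="embedding similarity",
        data_source="vacancy texts",
        guidelines="",
        intended_usage=frozenset({"exploration"}),
    ),
]

expansion_edges = edges_for(EDGES, "query_expansion")
```

With the intended usage recorded on the edge, a query expansion component can ignore relations that were never meant for it instead of guessing from the bare "related" label.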

Knowledge Incompatibility
• Domain semantics of a Knowledge Graph are not necessarily equivalent to the application’s semantics.
  • Not all relations are good for query expansion and/or semantic relevance.
  • Not all entities and relations are good as disambiguation evidence.

Knowledge Incompatibility - Wrong Knowledge
• Experiment made at Textkernel:
  • Used Word2Vec related skills for query expansion when searching for CVs and Vacancies.
  • Precision of expansion pairs was 18%!
  • Developed an expansion-specific relation extractor from vacancy texts.
  • Precision of expansion pairs increased to 60%.
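For reference, precision of expansion pairs is the share of proposed (query term, expansion term) pairs that evaluators accept as valid expansions. The sketch below uses made-up pairs and judgements; it does not reproduce the Textkernel data or its evaluation protocol.

```python
def expansion_precision(proposed_pairs, accepted_pairs):
    """Fraction of proposed (query term, expansion term) pairs judged correct."""
    proposed = set(proposed_pairs)
    if not proposed:
        return 0.0
    return len(proposed & set(accepted_pairs)) / len(proposed)


# Hypothetical output of a relatedness-based expander.
PROPOSED = [
    ("python", "django"),
    ("python", "java"),
    ("python", "excel"),
    ("python", "c++"),
    ("python", "statistics"),
]

# Hypothetical human judgement: only this pair is a good expansion.
ACCEPTED = {("python", "django")}

precision = expansion_precision(PROPOSED, ACCEPTED)
```

Here one of five proposed pairs is accepted, so precision is 0.2. Related terms such as "java" are semantically close to "python" but would pull in CVs the searcher did not ask for.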

Knowledge Incompatibility - Too Much Knowledge
• Experiment made at iSOCO:
  • Used DBPedia to extract and disambiguate mentions of players and teams from short textual descriptions of football highlights.
  • Precision was 60% and recall 55%.
  • Pruned DBPedia to keep only entities and relations that were more likely to occur in the text and help towards disambiguation.
  • Precision increased to 82% and recall to 80%.
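A simplified illustration of the pruning idea: restrict the candidate entities to the types expected in the target texts before disambiguating. The candidate index, the entity ids, and the type names are invented for this sketch and are not actual DBPedia identifiers, nor the pruning criteria used in the experiment.

```python
# Hypothetical candidate index: surface form -> [(entity id, entity type)].
CANDIDATES = {
    "barcelona": [("barcelona_city", "City"), ("barcelona_club", "FootballClub")],
    "valencia": [("valencia_city", "City"), ("valencia_club", "FootballClub")],
    "camp nou": [("camp_nou", "Stadium")],
}

# Types assumed to be relevant for football highlight descriptions.
KEEP_TYPES = {"FootballClub", "FootballPlayer"}


def prune(candidates, keep_types):
    """Drop candidates of irrelevant types, and surface forms left with none."""
    pruned = {}
    for surface, entities in candidates.items():
        kept = [(eid, etype) for eid, etype in entities if etype in keep_types]
        if kept:
            pruned[surface] = kept
    return pruned


PRUNED = prune(CANDIDATES, KEEP_TYPES)
```

After pruning, "barcelona" has a single candidate left, so no disambiguation is needed for it in this scenario.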

Suboptimal Knowledge Mining
• Usually follows from badly defined semantics:
  • No correct or clear guidelines to knowledge miners.
  • Inappropriate source data selection.
    • E.g., good search expansions are most likely to be found in user logs.
    • E.g., hyponyms are most likely to be found in definitions.
  • Inaccurate training data for ML algorithms.

Suboptimal Knowledge Mining
• What to do: Be semantics-driven, not data or method-driven!

Wrapping Up

3 Action Points

➔ Well defined schema

➔ Documentation of assumptions

➔ Careful knowledge reuse

➔ Adapt/transform the knowledge to your search scenario

➔ Beware of knowledge that may actually harm you

➔ Start with the target semantics, and use them to select your data and methods, not the other way around!

Thank you!

Panos Alexopoulos
Head of Ontology

E-mail: [email protected]

Web: http://www.panosalexopoulos.com

LinkedIn: www.linkedin.com/in/panosalexopoulos

Twitter: @PAlexop
