Search Engines and Knowledge Graphs
It’s Complicated!
Panos Alexopoulos
Head of Ontology
Who we are and what we do
We develop Technology to bridge the language and meaning gap between People and Jobs ...
I like programming, but I’m interested in taking on more project management responsibility
Is there a job in our organisation that better fits my degree?
I’d like to work on our mobile strategy. I’ve helped a friend develop a mobile app.
I’d like to do more with my organisational talent.
We are looking to hire:
An experienced tech team lead
The ideal candidate has:
- min. 5yr of experience
- Certified scrummaster
- Exp. w/iOS, Android
Completed academic studies Computer Science or related
30% travel for customer presentations
… through a family of sophisticated software products ...
… that large organizations in the HR and Recruitment sector use...
Knowledge Graphs
What are Knowledge Graphs
• Knowledge graphs are (large) networks of entities, their semantic types, properties, and relationships between entities.
Textkernel Knowledge Graph
• Concept Types:
  • Professions
  • Skills
  • Qualifications (Degrees, Certificates)
  • Organizations (Companies, Educational Institutes)
  • Industries
• Entity relations:
  • Synonym
  • Broader/Narrower
  • Related (Skill2Skill, Profession2Skill, Qualification2Skill etc.)
How are Knowledge Graphs used in Search
Some usages
• Metadata extraction for content indexing
  • Entities (e.g., skills, professions, companies etc. mentioned in a CV or vacancy)
  • Relations (e.g., events mentioned in a news article)
• Metadata extraction for query parsing and interpretation
  • Entity and relation extraction from the user query
• Query Expansion
  • “If I am looking for a search engine specialist, I would also be fine with an Elasticsearch engineer”
• Semantic relevance calculation
  • “I am looking for a C++ book, how relevant would a Java book be?”
Metadata extraction from content and queries
• The lexical forms of the entities and relations are used as gazetteers for extraction.
• The relations between the entities are used as contextual evidence for disambiguation.
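The two steps on this slide can be sketched roughly as follows. This is a minimal illustration, not Textkernel's implementation: the toy gazetteer, the entity identifiers and the function names `extract_mentions` and `disambiguate` are all hypothetical, and "contextual evidence" is reduced to counting graph links to the other mentions' candidates.

```python
import re

# Hypothetical toy data: surface forms mapped to candidate entities (the gazetteer),
# plus "related" links between entities used as disambiguation evidence.
GAZETTEER = {
    "java": ["skill:java_programming", "place:java_island"],
    "spring": ["skill:spring_framework", "season:spring"],
    "software engineer": ["profession:software_engineer"],
}
RELATED = {
    "skill:java_programming": {"skill:spring_framework", "profession:software_engineer"},
    "skill:spring_framework": {"skill:java_programming"},
    "profession:software_engineer": {"skill:java_programming"},
}


def extract_mentions(text):
    """Return (surface form, candidate entities) pairs found in the text."""
    lowered = text.lower()
    mentions = []
    # Longer forms are checked first so multi-word entries are listed first.
    for form in sorted(GAZETTEER, key=len, reverse=True):
        if re.search(r"\b" + re.escape(form) + r"\b", lowered):
            mentions.append((form, GAZETTEER[form]))
    return mentions


def disambiguate(mentions):
    """Per mention, pick the candidate with the most links to the other mentions' candidates."""
    resolved = {}
    for form, candidates in mentions:
        context = {c for other, cands in mentions if other != form for c in cands}

        def support(candidate):
            return len(RELATED.get(candidate, set()) & context)

        # On a tie, max() keeps the first candidate listed in the gazetteer.
        resolved[form] = max(candidates, key=support)
    return resolved
```

For the text "Software engineer with Java and Spring experience", `disambiguate(extract_mentions(...))` resolves "java" and "spring" to the skill entities, because only those candidates are linked to the other mentions in the toy graph.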
Query Expansion and Semantic Relevance
• The graph’s relations can be used to generate additional entities to include in the query so as to increase recall.
• The strengths of these relations and/or the distances between the entities can be used to calculate semantic relevance.
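One possible way to combine relation strengths and graph distance is sketched here. The graph, its weights and the function names are hypothetical, and scoring a path as the product of its edge strengths within a hop limit is only one of several reasonable choices, not a method stated in the talk.

```python
# Hypothetical weighted graph: edge weights in (0, 1] stand for relation strength.
GRAPH = {
    "search engine specialist": {"elasticsearch engineer": 0.8, "solr engineer": 0.7},
    "elasticsearch engineer": {"search engine specialist": 0.8, "lucene": 0.9},
    "solr engineer": {"search engine specialist": 0.7},
    "lucene": {"elasticsearch engineer": 0.9},
}


def relevance_scores(start, max_hops=2):
    """Best path score (product of edge strengths) from start to every node within max_hops."""
    best = {start: 1.0}
    frontier = {start: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for node, score in frontier.items():
            for neighbour, strength in GRAPH.get(node, {}).items():
                candidate = score * strength
                if candidate > best.get(neighbour, 0.0):
                    best[neighbour] = candidate
                    next_frontier[neighbour] = candidate
        frontier = next_frontier
    return best


def expand_query(term, min_score=0.5, max_hops=2):
    """Return expansion terms whose relevance to the query term is at least min_score."""
    scores = relevance_scores(term, max_hops)
    expansions = [t for t, s in scores.items() if t != term and s >= min_score]
    return sorted(expansions, key=lambda t: scores[t], reverse=True)
```

Raising `min_score` trades recall for precision: with the toy weights above, a threshold of 0.75 keeps only "elasticsearch engineer" as an expansion of "search engine specialist".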
Knowledge Graph in Search Pitfalls
3 pitfalls
1. Not well-defined or well-documented semantics of the knowledge graph.
2. Not using the right type or amount of knowledge for the search scenario at hand.
3. Not mining the knowledge from the right sources.
Bad Semantics - The abuse of synonymy
• People (and therefore graphs) often treat as synonyms terms that are in reality hyponyms or otherwise related.
Bad Semantics - The abuse of synonymy
Synonyms of “Economist” according to ESCO
● economics science researcher
● macro analyst
● economics analyst
● economics research scientist
● labour economist
● social economist
● interest analyst
● econometrician
● economics researcher
● econophysicist
● economics scientist
● economics scholar
● economics research analyst
Synonyms of “Software Engineer” according to DBPedia
● Senior Software Engineer
● Software engineer
● Consulting software engineer
● Software engineering naming controversy
● Computer science engineer
● Debates within software engineering
● Consulting software engineers
● Software Engineer
● Computer Science Engineer
Bad Semantics - The abuse of synonymy
• Why is this a problem:
  • Synonymy means (almost) interchangeability of meaning.
  • If you label a relation as synonymy when it is not, terms with different meanings will be treated as fully equivalent.
  • E.g. when looking for an “economics scholar” you will always get “interest analysts” (and vice versa).
Bad Semantics - The abuse of synonymy
• What to do:
  • If you own the Knowledge Graph, be strict about what you call a "synonym".
  • If you are using an external Knowledge Graph, be extra careful with its assumptions about synonymy.
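A rough sketch of what being strict about synonymy can look like at query time: relations are typed, and only true synonyms are treated as interchangeable in both directions. The relation types assigned to the terms below, the weights and the `expand` function are illustrative assumptions, not ESCO's or Textkernel's actual modelling.

```python
# Hypothetical typed relations (source, relation, target).
# The types assigned to these terms are illustrative assumptions only.
RELATIONS = [
    ("economist", "synonym", "economics scientist"),
    ("economist", "narrower", "labour economist"),
    ("economist", "narrower", "econometrician"),
]

# Hypothetical weights: synonyms count as equivalent, narrower terms as weaker matches.
WEIGHTS = {"synonym": 1.0, "narrower": 0.6}


def expand(term):
    """Return {expansion term: weight} for a query term.

    Synonyms expand in both directions; narrower terms expand downwards only,
    so a search for a specific term does not pull in its broader term.
    """
    expansions = {}
    for source, relation, target in RELATIONS:
        if source == term:
            expansions[target] = WEIGHTS[relation]
        elif target == term and relation == "synonym":
            expansions[source] = WEIGHTS[relation]
    return expansions
```

With this typing, a search for "labour economist" is not silently broadened to every "economist", which is what happens when both terms sit in one flat synonym set.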
Bad Semantics - The inadequacy of relatedness
• Often Knowledge Graphs contain a "related" relation to represent semantically related terms whose exact relation we don’t know.
• Especially with the advent of Word2Vec, semantic relatedness is (misleadingly) easy to calculate.
• The problem starts when this “related” relation has no further info about its provenance or context.
Bad Semantics - The inadequacy of relatedness
Related Skills for Data Scientist according to ESCO
● data mining
● data models
● information categorisation
● information extraction
● online analytical processing
● query languages
● resource description framework query language
● statistics
● visual presentation techniques
Related Skills for Data Scientist according to Textkernel Knowledge Graph
● Apache Spark
● R
● Big Data
● machine learning
● Python
● Stakeholders
● marketing
Bad Semantics - The inadequacy of relatedness
• What is the problem:
  • Semantic relatedness is a vague, highly subjective and context-dependent relation.
  • If this relation is not adequately contextualized and documented I can’t really know whether it fits my search scenario.
  • E.g. What relatedness criteria and guidelines were given to ESCO experts?
  • E.g. What data and relatedness measures were used by Textkernel?
Bad Semantics - The inadequacy of relatedness
• What to do:
  • If you own the Knowledge Graph, contextualize and document your “related” relations:
    • Guidelines and criteria given to humans (experts or crowd).
    • Data and methods used for mining.
    • Intended usage.
  • If you are using an external Knowledge Graph, be extra careful with its assumptions about relatedness.
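One way to make such documentation machine-readable is to attach it to each "related" edge, as in the sketch below. The `RelatedEdge` class, its field names and the sample values are hypothetical; they mirror the three documentation items on this slide and do not describe how any real graph records provenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelatedEdge:
    """A "related" relation together with its provenance and intended usage."""
    source: str
    target: str
    method: str                # how the relation was obtained
    data_source: str           # what data or people it came from
    guidelines: str            # criteria given to annotators, if any
    intended_usage: frozenset  # search scenarios the relation was built for


# Hypothetical sample edges; values are made up for illustration.
EDGES = [
    RelatedEdge(
        source="data scientist",
        target="python",
        method="embedding_similarity",
        data_source="vacancy texts",
        guidelines="none (automatically mined)",
        intended_usage=frozenset({"semantic_relevance"}),
    ),
    RelatedEdge(
        source="data scientist",
        target="statistics",
        method="expert_judgement",
        data_source="expert panel",
        guidelines="skill is essential to the profession",
        intended_usage=frozenset({"query_expansion", "semantic_relevance"}),
    ),
]


def usable_for(edges, usage):
    """Keep only the edges whose documented intended usage covers the given scenario."""
    return [edge for edge in edges if usage in edge.intended_usage]
```

A search application can then select edges by scenario, for example `usable_for(EDGES, "query_expansion")`, instead of treating every "related" edge as equally applicable.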
Knowledge Incompatibility
• Domain semantics of a Knowledge Graph are not necessarily equivalent to the application’s semantics.
• Not all relations are good for query expansion and/or semantic relevance.
• Not all entities and relations are good as disambiguation evidence.
Knowledge Incompatibility - Wrong Knowledge
• Experiment made at Textkernel:
  • Used Word2Vec related skills for query expansion when searching for CVs and Vacancies.
  • Precision of expansion pairs was 18%!
  • Developed an expansion-specific relation extractor from vacancy texts.
  • Precision of expansion pairs increased to 60%.
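For readers unfamiliar with the metric, the sketch below shows one common way to compute the precision of expansion pairs, assuming it is the fraction of proposed (query term, expansion term) pairs that human judges accept. The slide does not say how the figures were measured, and the pairs below are made up.

```python
def expansion_precision(proposed_pairs, accepted_pairs):
    """Fraction of proposed (query term, expansion term) pairs judged acceptable."""
    if not proposed_pairs:
        raise ValueError("no expansion pairs to evaluate")
    accepted = set(accepted_pairs)
    correct = sum(1 for pair in proposed_pairs if pair in accepted)
    return correct / len(proposed_pairs)


# Hypothetical example: five proposed pairs, two of them accepted by a judge.
PROPOSED = [
    ("java", "j2ee"),
    ("java", "c++"),
    ("java", "python"),
    ("scrum", "agile"),
    ("scrum", "waterfall"),
]
ACCEPTED = {("java", "j2ee"), ("scrum", "agile")}
```

Here `expansion_precision(PROPOSED, ACCEPTED)` is 2 out of 5: terms such as "c++" are related to "java" but are not acceptable substitutes in a search for it.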
Knowledge Incompatibility - Too Much Knowledge
• Experiment made at iSOCO:
  • Used DBPedia to extract and disambiguate mentions of players and teams from short textual descriptions of football highlights.
  • Precision was 60% and recall 55%.
  • Pruned DBPedia to keep only entities and relations that were more likely to occur in the text and help towards disambiguation.
  • Precision increased to 82% and recall to 80%.
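The pruning idea can be sketched as a filter over entity types and relation types. This is only an illustration under assumed data: the entity names, type labels and the `prune` function are hypothetical, and the actual pruning criteria used in the experiment are not given on the slide.

```python
# Hypothetical mini-graph: entity -> type, plus (source, relation, target) triples.
ENTITIES = {
    "Lionel Messi": "football_player",
    "FC Barcelona": "football_team",
    "Barcelona": "city",
    "Messi (film)": "film",
}
RELATIONS = [
    ("Lionel Messi", "plays_for", "FC Barcelona"),
    ("FC Barcelona", "located_in", "Barcelona"),
    ("Messi (film)", "about", "Lionel Messi"),
]


def prune(entities, relations, keep_types, keep_relations):
    """Keep entities of the wanted types and wanted relations between kept entities."""
    kept_entities = {name: t for name, t in entities.items() if t in keep_types}
    kept_relations = [
        (source, relation, target)
        for source, relation, target in relations
        if relation in keep_relations
        and source in kept_entities
        and target in kept_entities
    ]
    return kept_entities, kept_relations


PRUNED_ENTITIES, PRUNED_RELATIONS = prune(
    ENTITIES,
    RELATIONS,
    keep_types={"football_player", "football_team"},
    keep_relations={"plays_for"},
)
```

After pruning, a mention of "Barcelona" in a match description can only resolve to the team, and "Messi" only to the player, which removes candidates that would otherwise compete during disambiguation.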
Suboptimal Knowledge Mining
• Usually follows from badly defined semantics:
  • No correct or clear guidelines to knowledge miners.
  • Inappropriate source data selection.
    • E.g., good search expansions are most likely to be found in user logs.
    • E.g., hyponyms are most likely to be found in definitions.
  • Inaccurate training data for ML algorithms.
Suboptimal Knowledge Mining
• What to do: Be semantics-driven, not data or method-driven!
Wrapping Up
3 Action Points
➔ Well defined schema
➔ Documentation of assumptions
➔ Careful knowledge reuse
➔ Adapt/transform the knowledge to your search scenario
➔ Beware of knowledge that may actually harm you
➔ Start with the target semantics, and use them to select your data and methods, not the other way around!
Thank you!
Panos Alexopoulos
Head of Ontology
E-mail: [email protected]
Web: http://www.panosalexopoulos.com
LinkedIn: www.linkedin.com/in/panosalexopoulos
Twitter: @PAlexop