Understand Relations in Knowledge Base Construction
Han Wang
December 2nd, 2014
Tetherless World Constellation Ph.D. Research Qualifier Presentation
Outline
• Knowledge Base Construction
• Relation Extraction
• Understand Relations
What is a Knowledge Base (KB)?
“Comprehensive and semantically organized machine-readable collection of
universally relevant or domain-specific entities, classes, and SPO facts (attributes, relations).”
“plus spatial and temporal dimensions plus commonsense properties and rules plus contexts of entities and facts (textual & visual witnesses, descriptors, statistics) plus …..”
— Gerhard Weikum
A Sample KB
Taxonomic Knowledge
• HanWang type GraduateStudent
• HanWang type PhDStudent
• PhDStudent subClassOf GraduateStudent
Factual Knowledge
• HanWang hasAdvisor PeterFox
• HanWang memberOf TetherlessWorldConstellation
• TetherlessWorldConstellation subOrganizationOf RensselaerPolytechnicInstitute
• RensselaerPolytechnicInstitute hasAbbreviation RPI
• RensselaerPolytechnicInstitute locatedAt Troy
Emerging Knowledge
• HanWang passed ResearchQualifier
• HanWang researchQualifierDate “2014-12-02”
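The sample KB can be sketched as a small in-memory triple store. This is only an illustration: the triples are copied from the slide, while the helper names `build_index` and `types_of` are hypothetical and not part of any particular KB system.

```python
from collections import defaultdict

# Subject-predicate-object triples copied from the sample KB.
TRIPLES = [
    ("HanWang", "type", "GraduateStudent"),
    ("HanWang", "type", "PhDStudent"),
    ("PhDStudent", "subClassOf", "GraduateStudent"),
    ("HanWang", "hasAdvisor", "PeterFox"),
    ("HanWang", "memberOf", "TetherlessWorldConstellation"),
    ("TetherlessWorldConstellation", "subOrganizationOf", "RensselaerPolytechnicInstitute"),
    ("RensselaerPolytechnicInstitute", "hasAbbreviation", "RPI"),
    ("RensselaerPolytechnicInstitute", "locatedAt", "Troy"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for fast lookup (hypothetical helper)."""
    index = defaultdict(set)
    for subj, pred, obj in triples:
        index[(subj, pred)].add(obj)
    return index


def types_of(entity, index):
    """Return all classes of an entity, following subClassOf transitively."""
    found = set()
    stack = list(index.get((entity, "type"), ()))
    while stack:
        cls = stack.pop()
        if cls not in found:
            found.add(cls)
            stack.extend(index.get((cls, "subClassOf"), ()))
    return found


INDEX = build_index(TRIPLES)
```

Taxonomic knowledge is what lets `types_of` return `GraduateStudent` for an entity typed only as `PhDStudent`; factual knowledge is a plain lookup such as `INDEX[("HanWang", "hasAdvisor")]`.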
What can a KB do?
• Disambiguation
  • “Han is pursuing a Ph.D. with Prof. Fox after he got an MSIT degree at RPI.”
  • “Later on, Han, accompanied by Chewbacca, Leia, and Deena Shan, traveled to the planet of Aguarl 3.”
• Question answering
  • Where does Han live in 2014?
• Semantic search
  • Entities and relations
KB History
[Timeline figure, 1985 to 2010. Labeled KBs: Cyc, Wikipedia, Google Knowledge Graph, Google Knowledge Vault, Microsoft Satori, Facebook Entity Graph.]
Statistics shown on the timeline:
• 155,287 words; 117,659 synsets
• Over 4.6 million articles; over 73,000 active editors
• 4.58 million things; 583 million facts
• 46 million things; 2.7 billion facts
• 10 million things; 120 million facts
Research Tasks around KB: KB Growth
• Taxonomy extraction/construction
  • Define KB backbone - classes
  • Human work: WordNet
  • Establish mapping between Wikipedia categories and WordNet
• Entity extraction
  • Find instances of KB classes (unary “is-a” relation)
  • Use lexical patterns in free text (such as X and Y, including X and Y)
  • Also extract from web tables, lists, etc.
• Relation extraction
  • Find facts about entities
• KB alignment
  • Connect KBs
  • Establish mapping between classes
  • Measure (string, lexical, semantic, etc.) similarities
• Cold start
  • Construct KBs without external KBs
Research Tasks around KB: KB Validation
• Entity resolution
  • Merge duplicated entities
  • Entity linking
• Relation resolution
  • Merge duplicated relations
  • Relation clustering/paraphrasing
• Temporal and spatial validation
  • Add temporal and/or spatial constraints to facts
• Error detection
  • Remove false statements
Research Tasks around KB: KB Intelligence
• Knowledge representation
• Semantic search
  • Search entities instead of strings
• Question answering
  • Search for answers based on KBs
• Automatic reasoning
Related Fields
• Information Retrieval
• Natural Language Processing
• Semantic Web
• Machine Learning
Relation Extraction
• Extracts semantic relations between entities from text
• Numerous foci:
  • Supervision: fully supervised vs. semi-supervised vs. unsupervised
  • Unary relations vs. binary relations vs. n-ary relations
  • Open extraction vs. fixed-set extraction
  • Extraction source: semi-structured data vs. free text
  • Features used: only lexical vs. full linguistic features
  • Extraction of temporal information
Distantly Supervised Relation Extraction
• Input: an existing KB + unlabeled text
• For a pair of entities participating in a relation in the KB, find sentences in the text that mention both entities.
• Use these sentences as the training data for extracting this relation.
• Example: the KB contains (Han Wang, alma mater, RPI). Which of these sentences express the relation?
  • Han Wang, a graduate student in RPI, …
  • Han received an MSIT degree from RPI.
  • Han is currently working as a research assistant in RPI.
  • Han watched a hockey game between RPI and Yale.
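The labeling step can be sketched as follows. This is a minimal illustration, not the method of any cited system: entity mentions are found by naive substring matching, and the `ALIASES` table and the function name `distant_supervision` are assumptions made for the example.

```python
# KB fact and sentences taken from the slide; the alias table is an assumption.
KB_TRIPLES = [("HanWang", "almaMater", "RPI")]
ALIASES = {"HanWang": ["Han Wang", "Han"], "RPI": ["RPI"]}
SENTENCES = [
    "Han Wang, a graduate student in RPI, ...",
    "Han received an MSIT degree from RPI.",
    "Han is currently working as a research assistant in RPI.",
    "Han watched a hockey game between RPI and Yale.",
]


def mentions(entity, sentence):
    """Naive mention detection: any alias occurs as a substring."""
    return any(alias in sentence for alias in ALIASES[entity])


def distant_supervision(kb_triples, sentences):
    """Label every sentence mentioning both entities with the KB relation."""
    examples = []
    for subj, relation, obj in kb_triples:
        for sentence in sentences:
            if mentions(subj, sentence) and mentions(obj, sentence):
                examples.append((sentence, subj, obj, relation))
    return examples


TRAINING = distant_supervision(KB_TRIPLES, SENTENCES)
```

All four sentences end up labeled `almaMater`, including the hockey sentence, which does not express the relation. That noise is exactly what the question marks on the slide point at.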
Learning Features (Sentence Level)
• Lexical: words between and around entity mentions and their POS tags
• Syntactic: dependency parse paths between entity mentions and other words
• Named entity tags
• Conjunction of the above features
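A rough sketch of the lexical and named-entity features, assuming the sentence is already tokenized and POS-tagged and that each entity mention is a single token. The function name, the feature keys, and the `window` parameter are hypothetical; dependency-path features are omitted because they need a parser.

```python
def sentence_features(tokens, pos_tags, e1_idx, e2_idx, e1_type, e2_type, window=1):
    """Build lexical features and one conjunction feature for an entity pair.

    tokens and pos_tags are parallel lists; e1_idx and e2_idx are the token
    positions of the two mentions; e1_type and e2_type are their NE tags.
    """
    left, right = sorted((e1_idx, e2_idx))
    between = " ".join(
        f"{word}/{tag}"
        for word, tag in zip(tokens[left + 1:right], pos_tags[left + 1:right])
    )
    before = " ".join(tokens[max(0, left - window):left])
    after = " ".join(tokens[right + 1:right + 1 + window])
    return {
        "between": between,
        "before": before,
        "after": after,
        # Conjunction of NE tags and the words between the mentions.
        "conjunction": f"{e1_type}|{between}|{e2_type}",
    }


TOKENS = ["Han", "received", "an", "MSIT", "degree", "from", "RPI", "."]
POS = ["NNP", "VBD", "DT", "NNP", "NN", "IN", "NNP", "."]
FEATURES = sentence_features(TOKENS, POS, 0, 6, "PERSON", "ORGANIZATION")
```

Conjoining the parts into one string makes the feature very specific, which favors precision over recall.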
Distant Supervision Hypotheses
• All sentences containing the entity pair express the relation. [Mintz et al., 2009]
• Learning setting: Single-Instance Single-Label
[Figure: the entity pair (Han Wang, RPI), the label “alma mater”, and the four sentences below, annotated with “object”, “instance”, and “label”.]
• Han Wang, a graduate student in RPI, …
• Han received an MSIT degree from RPI.
• Han is currently working as a research assistant in RPI.
• Han watched a hockey game between RPI and Yale.
Distant Supervision Hypotheses
• All sentences containing the entity pair express the relation. [Mintz et al., 2009]
• At least one sentence containing the entity pair expresses the relation. [Riedel et al., 2010]
• Learning setting: Multiple-Instance Single-Label
[Figure: the entity pair (Han Wang, RPI) is the “object”, the four sentences below are its “instances”, and “alma mater” is the “label”.]
• Han Wang, a graduate student in RPI, …
• Han watched a hockey game between RPI and Yale.
• Han received an MSIT degree from RPI.
• Han is currently working as a research assistant in RPI.
Distant Supervision Hypotheses
• All sentences containing the entity pair express the relation. [Mintz et al., 2009]
• At least one sentence containing the entity pair expresses the relation. [Riedel et al., 2010]
• At least one sentence containing the entity pair expresses the relation, and these two entities can have multiple relations. [Hoffmann et al., 2011; Surdeanu et al., 2012]
• Learning setting: Multiple-Instance Multiple-Label
[Figure: the entity pair (Han Wang, RPI) is the “object”, the four sentences below are its “instances”, and the “labels” are “alma mater”, “employee of”, and “NIL”.]
• Han Wang, a graduate student in RPI, …
• Han watched a hockey game between RPI and Yale.
• Han received an MSIT degree from RPI.
• Han is currently working as a research assistant in RPI.
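The three hypotheses differ in how training labels attach to sentences. The sketch below only illustrates that difference; the function names are hypothetical, and the per-sentence predictions in `PREDICTIONS` are an assumed reading of the figure rather than the output of any model.

```python
SENTENCES = [
    "Han Wang, a graduate student in RPI, ...",
    "Han watched a hockey game between RPI and Yale.",
    "Han received an MSIT degree from RPI.",
    "Han is currently working as a research assistant in RPI.",
]


def label_every_sentence(sentences, relation):
    """Single-instance single-label: each sentence is its own labeled example."""
    return [(sentence, relation) for sentence in sentences]


def label_bag(entity_pair, sentences, relations):
    """Multi-instance: the sentences of one entity pair share a label set.

    Which sentence expresses which label is left latent. Pass one relation
    for the single-label setting, several for the multi-label setting.
    """
    return {"object": entity_pair, "instances": list(sentences), "labels": set(relations)}


def satisfies_at_least_one(bag, sentence_predictions):
    """True if every bag label is predicted for at least one sentence."""
    return bag["labels"] <= set(sentence_predictions)


PAIR = ("Han Wang", "RPI")
SINGLE = label_every_sentence(SENTENCES, "alma mater")
MISL_BAG = label_bag(PAIR, SENTENCES, ["alma mater"])
MIML_BAG = label_bag(PAIR, SENTENCES, ["alma mater", "employee of"])

# Assumed per-sentence predictions, in the order of SENTENCES ("NIL" = no relation).
PREDICTIONS = ["alma mater", "NIL", "alma mater", "employee of"]
```

Under the relaxed hypotheses the hockey sentence can be predicted as `NIL` without contradicting the KB, as long as some other sentence carries each bag label.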
Relaxing the Hypotheses Works
[Weston et al., 2013]
Never-Ending Language Learning (NELL) [Carlson et al., 2010]
http://rtw.ml.cmu.edu/rtw/
• Extracts both entities and relations
• Bootstrapping semi-supervised learning
  • Starts with a small amount of labeled data as seeds
  • Grows the seeds with high-confidence instances found by the current model
  • Suffers from “semantic drift”
• Overcomes the semantic drift by constraining the semi-supervised learning
  • Mutual exclusion. E.g. a “PhDStudent” cannot also be a “Professor”
  • Relation argument type checking. E.g. arguments of “employeeOf” should be “Person” and “Organization”
  • Unstructured and semi-structured text features. E.g. extractions from free text and web tables should agree with each other
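One bootstrapping step with the first two constraints might look like the sketch below. It is not NELL's actual architecture: the constraint tables, the confidence threshold, and every function name are assumptions chosen to mirror the examples on the slide.

```python
# Hypothetical constraint tables mirroring the slide's examples.
MUTUALLY_EXCLUSIVE = {frozenset({"PhDStudent", "Professor"})}
ARGUMENT_TYPES = {"employeeOf": ("Person", "Organization")}


def violates_mutual_exclusion(entity, new_class, kb_types):
    """True if the entity already has a class that excludes new_class."""
    return any(
        frozenset({new_class, known}) in MUTUALLY_EXCLUSIVE
        for known in kb_types.get(entity, ())
    )


def passes_type_check(subj, relation, obj, kb_types):
    """True if both arguments have the types the relation expects."""
    expected = ARGUMENT_TYPES.get(relation)
    if expected is None:
        return True
    return expected[0] in kb_types.get(subj, ()) and expected[1] in kb_types.get(obj, ())


def bootstrap_step(candidates, kb_types, min_confidence=0.9):
    """Promote high-confidence candidates that satisfy the constraints."""
    accepted = []
    for subj, relation, obj, confidence in candidates:
        if confidence < min_confidence:
            continue
        if relation == "type":
            if violates_mutual_exclusion(subj, obj, kb_types):
                continue
            kb_types.setdefault(subj, set()).add(obj)
        elif not passes_type_check(subj, relation, obj, kb_types):
            continue
        accepted.append((subj, relation, obj))
    return accepted


KB_TYPES = {"HanWang": {"Person", "PhDStudent"}, "RPI": {"Organization"}, "Troy": {"City"}}
CANDIDATES = [
    ("HanWang", "type", "Professor", 0.95),    # blocked by mutual exclusion
    ("HanWang", "employeeOf", "RPI", 0.92),    # passes the type check
    ("HanWang", "employeeOf", "Troy", 0.97),   # Troy is not an Organization
    ("RPI", "employeeOf", "HanWang", 0.50),    # below the confidence threshold
]
ACCEPTED = bootstrap_step(CANDIDATES, KB_TYPES)
```

Without the constraints, the two high-confidence but wrong candidates would be promoted and could seed further errors, which is how semantic drift starts.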
Knowledge Vault [Dong et al., 2014]
• Automatically extracts triples from the Web
• Various extraction sources: free text, HTML DOM trees, HTML tables, and human annotations (schema.org, RDFa)
• Uses distant supervision to get triples as the training data
• Uses the local closed world assumption (a.k.a. partial completeness assumption) to assign labels to training data
• Extraction results from different sources are fused together with a confidence score
• Uses prior knowledge (Freebase) to evaluate the extracted triples
  • Uses Freebase triples to predict the probabilities of the extracted triples
• Combines extraction and prior together
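The local closed world assumption can be sketched as a three-way labeling rule: if the KB knows any object for a (subject, predicate) pair, it is assumed to know all of them. The function name `lcwa_label`, the toy KB, and the candidate triples are illustrative assumptions.

```python
def lcwa_label(triple, kb):
    """Label a candidate triple under the local closed world assumption.

    kb maps (subject, predicate) to the set of known objects.
    Returns True (positive), False (negative), or None (unknown, so the
    triple is left out of the training data).
    """
    subj, pred, obj = triple
    known_objects = kb.get((subj, pred))
    if not known_objects:
        return None  # the KB says nothing about this subject and predicate
    return obj in known_objects


KB = {("RensselaerPolytechnicInstitute", "locatedAt"): {"Troy"}}

POSITIVE = lcwa_label(("RensselaerPolytechnicInstitute", "locatedAt", "Troy"), KB)
NEGATIVE = lcwa_label(("RensselaerPolytechnicInstitute", "locatedAt", "Albany"), KB)
UNKNOWN = lcwa_label(("RensselaerPolytechnicInstitute", "foundedIn", "1824"), KB)
```

The rule treats the KB as complete only locally, so missing predicates produce no negative examples at all.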
Open Information Extraction
• Extracts relational tuples without any pre-defined vocabulary or training data
• Extracts verb phrases as the relations [Fader et al., 2011]
  • Longest sequence that contains the verb and starts with the verb
  • Syntactic constraint: matching POS patterns (e.g. VB, VB DET | PREP, VB NP PREP)
  • Lexical constraint: number of distinct argument pairs > threshold
• Extracts relations mediated by nouns, adjectives, and more [Mausam et al., 2012]
  • Open pattern templates: syntactic patterns + semantic/lexical constraints
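A simplified sketch of the syntactic and lexical constraints. It assumes Penn Treebank POS tags supplied by some tagger, collapses them into three coarse classes, and matches the pattern "verb, optionally followed by other words and a preposition". The tag mapping, the exact pattern, and the function names are assumptions, not the published implementation.

```python
import re


def coarse_tag(pos):
    """Collapse a Penn Treebank tag into V (verb), P (preposition/particle), W, or O."""
    if pos.startswith("VB"):
        return "V"
    if pos in {"IN", "TO", "RP"}:
        return "P"
    if pos.startswith(("NN", "JJ", "RB", "PRP")) or pos == "DT":
        return "W"
    return "O"


# A verb, optionally followed by any number of W words and one P word.
# The regex is greedy, so each match is the longest phrase starting at that verb.
RELATION_PATTERN = re.compile(r"V(?:W*P)?")


def relation_phrases(tagged_tokens):
    """Return candidate relation phrases from a list of (word, POS) pairs."""
    codes = "".join(coarse_tag(pos) for _, pos in tagged_tokens)
    return [
        " ".join(word for word, _ in tagged_tokens[m.start():m.end()])
        for m in RELATION_PATTERN.finditer(codes)
    ]


def passes_lexical_constraint(phrase, extractions, threshold):
    """Keep a phrase only if it occurs with more than threshold distinct argument pairs."""
    pairs = {(arg1, arg2) for arg1, rel, arg2 in extractions if rel == phrase}
    return len(pairs) > threshold


TAGGED = [
    ("Han", "NNP"), ("received", "VBD"), ("an", "DT"), ("MSIT", "NNP"),
    ("degree", "NN"), ("from", "IN"), ("RPI", "NNP"),
]
PHRASES = relation_phrases(TAGGED)
```

The syntactic pattern alone yields the overly specific phrase "received an MSIT degree from"; a phrase that specific would appear with very few distinct argument pairs, which is what the lexical constraint filters out.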
Open Information Extraction
Over 5 billion extractions from over a billion web pages
Revisit: Relations
• A relation can be expressed in many ways (patterns)
• How to establish the “definition” of a relation?
• What makes two relations different?
• How to measure the similarity of two relations?
[Figure: the entity pair (Han Wang, RPI) linked by the relation “alma mater”, expressed by two different sentences.]
• Han Wang, a graduate student in RPI, …
• Han received an MSIT degree from RPI.
PATTY [Nakashole et al., 2012]
• A Taxonomy of Relational Patterns with Semantic Types
• Syntactic-Ontologic-Lexical (SOL) Pattern Model
  • Syntactic-Lexical: surface words, wildcards, POS tags
  • Ontological: semantic type classes as entity placeholders
  • Support set of a pattern: the set of entity pairs that fill its placeholders
• An example:
  • Pattern: <Person> ’s ADJECTIVE voice * in <Song>
  • Matching sentences:
    • Amy Winehouse’s soul voice in her song ‘Rehab’
    • Jim Morrison’s haunting voice and charisma in ‘The End’
    • Joan Baez’s angel-like voice in ‘Farewell Angelina’
  • Support set:
    • (Amy Winehouse, Rehab)
    • (Jim Morrison, The End)
    • (Joan Baez, Farewell Angelina)
PATTY [Nakashole et al., 2012]
• Synonymous patterns (pattern synsets)
  • The support sets of two patterns are equivalent
  • <Person> graduated from <School> = <Person> obtained degree in * from <School>
  • and PRONOUN ADJECTIVE advisor <Person> = under the supervision of <Person>
• Subsumption patterns
  • The support set of a pattern is a subset of the support set of another pattern
  • <Person> nominated for <Award> ⊐ <Person> winner of <Award>
PATTY [Nakashole et al., 2012]
350,000 SOL patterns with 4 million instances
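Synonymy and subsumption over support sets reduce to set comparisons, as sketched below. Exact set equality and strict inclusion are a simplification of what a real system would need on noisy data, and the Jaccard overlap is added here only as one possible soft measure. The support sets use placeholder entity names and are entirely hypothetical.

```python
def synonymous(support_a, support_b):
    """Two patterns are synonymous if their support sets are equal."""
    return support_a == support_b


def subsumes(general, specific):
    """The general pattern subsumes the specific one if it covers strictly more pairs."""
    return specific < general


def jaccard(support_a, support_b):
    """Soft overlap between two support sets (an assumed measure for noisy data)."""
    union = support_a | support_b
    return len(support_a & support_b) / len(union) if union else 0.0


# Hypothetical support sets with placeholder entities.
NOMINATED_FOR = {("PersonA", "AwardX"), ("PersonB", "AwardX"), ("PersonC", "AwardY")}
WINNER_OF = {("PersonA", "AwardX"), ("PersonC", "AwardY")}
GRADUATED_FROM = {("PersonA", "SchoolP"), ("PersonB", "SchoolQ")}
OBTAINED_DEGREE_FROM = {("PersonA", "SchoolP"), ("PersonB", "SchoolQ")}
```

This comparison is purely extensional: it looks only at which entity pairs a pattern matches, never at the words of the pattern itself.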
VerbNet [Kipper 2008]
• A verb lexicon that groups approximately 5,800 verbs in over 270 classes
• Inspired by Beth Levin’s classification of verb classes and their syntactic alternations [Levin, 1993]
  • Members within a single verb class participate in shared types of alternations because of an underlying shared semantic meaning
• Verbs are classified according to shared syntactic behaviors
What else can be done?
• PATTY measures the similarity between relations (patterns) using their argument (entity) instance sets, an extensional approach
• VerbNet defines verbs intensionally via shared syntactic behaviors (semantic meanings)
• Pattern attributes
  • Surface words
  • POS tags
  • Dependency paths
  • Semantic roles
  • Deeper semantic features (e.g. AMR)
  • Relation (pattern) embedding
• How about combining them together?
Application: Provenance Modeling
References
[1] Niranjan Balasubramanian, Stephen Soderland, and Oren Etzioni. “Rel-grams: a probabilistic model of relations in text”. In: Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction. 2012. url: http://dl.acm.org/citation.cfm?id=2391219.
[2] Michele Banko, MJ Cafarella, and Stephen Soderland. “Open Information Extraction from the Web”. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence. 2007, pp. 2670–2676. url: http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-429.pdf.
[3] Danushka T. Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. “Measuring the similarity between implicit semantic relations from the web”. In: Proceedings of the 18th international conference on World wide web (2009), p. 651. doi: 10.1145/1526709.1526797. url: http://portal.acm.org/citation.cfm?doid=1526709.1526797.
[4] DT Bollegala, Y Matsuo, and M Ishizuka. “Relational duality: Unsupervised extraction of semantic relations between entities on the web”. In: Proceedings of the 19th international conference on World wide web. 2010, pp. 151–160. isbn: 9781605587998. url: http://dl.acm.org/citation.cfm?id=1772707.
[5] Antoine Bordes and Evgeniy Gabrilovich. “Constructing and mining web-scale knowledge graphs”. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, New York, USA: ACM Press, 2014, pp. 1967–1967. isbn: 9781450329569. doi: 10.1145/2623330.2630803. url: http://dl.acm.org/citation.cfm?doid=2623330.2630803.
[6] Andrew Carlson and Justin Betteridge. “Coupled semi-supervised learning for information extraction”. In: Proceedings of the third ACM international conference on Web search and data mining. 2010. isbn: 9781605588896. url: http://dl.acm.org/citation.cfm?id=1718501.
[7] Andrew Carlson, Justin Betteridge, and Bryan Kisiel. “Toward an Architecture for Never-Ending Language Learning.” In: Proceedings of the Twenty-Fourth Conference on Artificial Intelligence. 2010. url: http://www.aaai.org/ocs/index.php/aaai/aaai10/paper/download/1879/2201.
[8] Omkar Deshpande, DS Lamba, and Michel Tourn. “Building, maintaining, and using knowledge bases: a report from the trenches”. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 2013. isbn: 9781450320375. url: http://dl.acm.org/citation.cfm?id=2465297.
[9] XL Dong et al. “Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion”. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014. url: http://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf.
[10] Oren Etzioni, Anthony Fader, and Janara Christensen. “Open Information Extraction: The Second Generation.” In: Proceedings of the Twenty-Second international joint conference on Artificial Intelligence. 2011. url: http://www.aaai.org/ocs/index.php/IJCAI/IJCAI11/paper/viewFile/3353/3408.
[11] Anthony Fader, Stephen Soderland, and Oren Etzioni. “Identifying relations for open information extraction”. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 2011. url: http://dl.acm.org/citation.cfm?id=2145596.
[12] J. Fan et al. “Automatic knowledge extraction from documents”. In: IBM Journal of Research and Development 56.3 (2012), pp. 1–10. url: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6177723.
[13] Adam Grycner and Gerhard Weikum. “HARPY: Hypernyms and Alignment of Relational Paraphrases”. In: Proceedings of the 25th International Conference on Computational Linguistics. 2014, pp. 2195–2204. url: http://pubman.mpdl.mpg.de/pubman/faces/viewItemOverviewPage.jsp?itemId=escidoc:2069526:4.
[14] Raphael Hoffmann et al. “Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations”. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011, pp. 541–550.
[15] Charles Kemp, JB Tenenbaum, and TL Griffiths. “Learning systems of concepts with an infinite relational model”. In: Proceedings of the 21st National Conference on Artificial Intelligence. 2006, pp. 381–388. url: http://www.aaai.org/Papers/AAAI/2006/AAAI06-061.pdf.
[16] Karin Kipper et al. “A large-scale classification of English verbs”. In: Language Resources and Evaluation 42.1 (Dec. 2007), pp. 21–40. issn: 1574-020X. doi: 10.1007/s10579-007-9048-2. url: http://link.springer.com/10.1007/s10579-007-9048-2.
[17] Stanley Kok and Pedro Domingos. “Extracting semantic networks from text via relational clustering”. In: Machine Learning and Knowledge Discovery in Databases (2008), pp. 1–16. url: http://link.springer.com/chapter/10.1007/978-3-540-87479-9_59.
[18] Thomas Lin and Oren Etzioni. “Entity linking at web scale”. In: Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction. 2012. url: http://dl.acm.org/citation.cfm?id=2391216.
[19] Mike Mintz et al. “Distant supervision for relation extraction without labeled data”. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. August. 2009, pp. 1003–1011. url: http://dl.acm.org/citation.cfm?id=1690287.
[20] TP Mohamed, ER Hruschka Jr, and TM Mitchell. “Discovering relations between noun categories”. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 2011, pp. 1447–1455. url: http://dl.acm.org/citation.cfm?id=2145586.
[21] Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. “Discovering and exploring relations on the web”. In: Proceedings of the VLDB Endowment 5.12 (Aug. 2012), pp. 1982–1985. issn: 21508097. doi: 10.14778/2367502.2367553. url: http://dl.acm.org/citation.cfm?doid=2367502.2367553.
[22] Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. “PATTY: a taxonomy of relational patterns with semantic types”. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. July. 2012, pp. 1135–1145. url: http://dl.acm.org/citation.cfm?id=2391076.
[23] F Niu et al. “DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference.” In: Proceedings of Very Large Data Search. 2012. url: http://www-cs.stanford.edu/~chrismre/papers/deepdive_vlds.pdf.
[24] Sebastian Riedel, Limin Yao, and A McCallum. “Modeling relations and their mentions without labeled text”. In: Machine Learning and Knowledge Discovery in Databases. 2010, pp. 1–16. url: http://link.springer.com/chapter/10.1007/978-3-642-15939-8_10.
[25] Sebastian Riedel et al. “Relation extraction with matrix factorization and universal schemas”. In: Proceedings of Joint Human Language Technology Conference/Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL ’13). June. 2013, pp. 74–84. url: http://works.bepress.com/benjamin_marlin/1/.
[26] Michael Schmitz et al. “Open language learning for information extraction”. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 2012. url: http://dl.acm.org/citation.cfm?id=2391009.
[27] Fabian M Suchanek et al. “Advances in Automated Knowledge Base Construction”. In: The Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction. 2012.
[28] Fabian Suchanek and Gerhard Weikum. “Knowledge Bases in the Age of Big Data Analytics Turn Web into Knowledge Base”. In: Proceedings of the 40th International Conference on Very Large Data Bases. 2014.
[29] Mihai Surdeanu et al. “Multi-instance Multi-label Learning for Relation Extraction”. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. July. 2012, pp. 455–465.
[30] PD Turney. “Similarity of semantic relations”. In: Computational Linguistics February (2006). url: http://www.mitpressjournals.org/doi/abs/10.1162/coli.2006.32.3.379.
[31] Limin Yao, S Riedel, and A McCallum. “Collective cross-document relation extraction without labelled data”. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. October. 2010, pp. 1013–1023. url: http://dl.acm.org/citation.cfm?id=1870757.
Thank you!
Questions? Comments? Suggestions?
(Back-up) Dependency Grammar
The dependency relation views the (finite) verb as the structural center of all clause structure. All other syntactic units (e.g. words) are either directly or indirectly dependent on the verb.
Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas.
(Back-up) Semantic Role Labeling
Detection of the semantic arguments associated with the predicate or verb of a sentence and their classification into their specific roles.
(Back-up) Abstract Meaning Representation
AMR captures “who is doing what to whom” in a sentence. Each sentence is represented as a rooted, directed, acyclic graph with labels on edges (relations) and leaves (concepts).