
Understand Relations in Knowledge Base Construction

Han Wang

December 2nd, 2014

Tetherless World Constellation Ph.D. Research Qualifier Presentation

Outline

• Knowledge Base Construction

• Relation Extraction

• Understand Relations

What is a Knowledge Base (KB)?

“Comprehensive and semantically organized machine-readable collection of universally relevant or domain-specific entities, classes, and SPO facts (attributes, relations).”

“plus spatial and temporal dimensions plus commonsense properties and rules plus contexts of entities and facts (textual & visual witnesses, descriptors, statistics) plus …..”

— Gerhard Weikum

A Sample KB

Taxonomic Knowledge
• HanWang type GraduateStudent
• HanWang type PhDStudent
• PhDStudent subClassOf GraduateStudent

Factual Knowledge
• HanWang hasAdvisor PeterFox
• HanWang memberOf TetherlessWorldConstellation
• TetherlessWorldConstellation subOrganizationOf RensselaerPolytechnicInstitute
• RensselaerPolytechnicInstitute hasAbbreviation RPI
• RensselaerPolytechnicInstitute locatedAt Troy

Emerging Knowledge
• HanWang passed ResearchQualifier
• HanWang researchQualifierDate “2014-12-02”
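A minimal sketch of how the sample triples could be held in memory and how the taxonomic `subClassOf` link lets a program infer types that are not stated directly. The set-of-tuples representation and the helper name `types_of` are illustrative assumptions, not something the slides prescribe.

```python
# Toy triple store built from a subset of the sample KB above.
# The representation (a set of 3-tuples) is an assumption made for illustration.
TRIPLES = {
    ("HanWang", "type", "PhDStudent"),
    ("PhDStudent", "subClassOf", "GraduateStudent"),
    ("HanWang", "hasAdvisor", "PeterFox"),
    ("HanWang", "memberOf", "TetherlessWorldConstellation"),
    ("RensselaerPolytechnicInstitute", "locatedAt", "Troy"),
}


def types_of(entity, triples):
    """Return the asserted types of an entity plus every superclass reachable via subClassOf."""
    found = set()
    frontier = [o for s, p, o in triples if s == entity and p == "type"]
    while frontier:
        cls = frontier.pop()
        if cls in found:
            continue
        found.add(cls)
        # Follow subClassOf edges upward from the class just visited.
        frontier.extend(o for s, p, o in triples if s == cls and p == "subClassOf")
    return found


inferred = types_of("HanWang", TRIPLES)
print(sorted(inferred))
```

Here `GraduateStudent` is derived from `PhDStudent subClassOf GraduateStudent` rather than asserted for `HanWang` directly.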

What can a KB do?

• Disambiguation
  • “Han is pursuing a Ph.D. with Prof. Fox after he got an MSIT degree at RPI.”
  • “Later on, Han, accompanied by Chewbacca, Leia, and Deena Shan, traveled to the planet of Aguarl 3.”

• Question answering
  • Where does Han live in 2014?

• Semantic search
  • Entities and relations

http://img1.wikia.nocookie.net/__cb20100129155042/starwars/images/0/01/Hansoloprofile.jpg

KB History

[Timeline figure, 1985 to 2010: Cyc, Wikipedia, Google Knowledge Graph, Google Knowledge Vault, Microsoft Satori, Facebook Entity Graph]

Sizes shown on the timeline:
• 155,287 words, 117,659 synsets
• Over 4.6 million articles, over 73,000 active editors
• 4.58 million things, 583 million facts
• 46 million things, 2.7 billion facts
• 10 million things, 120 million facts

Research Tasks around KB: KB Growth

• Taxonomy extraction/construction
  • Define KB backbone - classes
  • Human work: WordNet
  • Establish mapping between Wikipedia categories and WordNet

• Entity extraction
  • Find instances of KB classes (unary “is-a” relation)
  • Use lexical patterns in free text (“such as X and Y”, “including X and Y”)
  • Also extract from web tables, lists, etc.

• Relation extraction
  • Find facts about entities

• KB alignment
  • Connect KBs
  • Establish mapping between classes
  • Measure (string, lexical, semantic, etc.) similarities

• Cold start
  • Construct KBs without external KBs

Research Tasks around KB: KB Validation

• Entity resolution
  • Merge duplicated entities
  • Entity linking

• Relation resolution
  • Merge duplicated relations
  • Relation clustering/paraphrasing

• Temporal and spatial validation
  • Add temporal and/or spatial constraints to facts

• Error detection
  • Remove false statements

Research Tasks around KB: KB Intelligence

• Knowledge representation

• Semantic search
  • Search entities instead of strings

• Question answering
  • Search for answers based on KBs

• Automatic reasoning

Related Fields

• Information Retrieval

• Natural Language Processing

• Semantic Web

• Machine Learning

Relation Extraction

• Extracts semantic relations between entities from text

• Numerous foci:
  • Supervision: fully supervised vs. semi-supervised vs. unsupervised
  • Unary relations vs. binary relations vs. n-ary relations
  • Open extraction vs. fixed set extraction
  • Extraction source: semi-structured data vs. free text
  • Features used: only lexical vs. full linguistic features
  • Extract temporal information

Distantly Supervised Relation Extraction

• Input: an existing KB + unlabeled text

• For a pair of entities participating in a relation in the KB, find sentences in the text that mention both entities.

• Use these sentences as the training data for extracting this relation.

Example: the KB contains (Han Wang, alma mater, RPI). Which of these sentences express the relation?

• Han Wang, a graduate student in RPI, …
• Han received an MSIT degree from RPI.
• Han is currently working as a research assistant in RPI.
• Han watched a hockey game between RPI and Yale.
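The labeling step can be sketched in a few lines. This is a rough illustration under strong assumptions: entity mentions are found by plain substring matching against a hand-written alias table (`ALIASES` is hypothetical), and every sentence that mentions both entities is taken as a training example.

```python
# One KB fact and the four example sentences from the slide.
KB = {("Han Wang", "RPI"): "alma mater"}
SENTENCES = [
    "Han Wang, a graduate student in RPI, ...",
    "Han received an MSIT degree from RPI.",
    "Han is currently working as a research assistant in RPI.",
    "Han watched a hockey game between RPI and Yale.",
]
# Hypothetical alias table; a real system would use an entity linker instead.
ALIASES = {"Han Wang": ["Han Wang", "Han"], "RPI": ["RPI"]}


def mentions(entity, sentence):
    """True if any alias of the entity occurs in the sentence (naive substring match)."""
    return any(alias in sentence for alias in ALIASES[entity])


def distant_label(kb, sentences):
    """Pair every sentence mentioning both entities with the KB relation."""
    examples = []
    for (e1, e2), relation in kb.items():
        for sentence in sentences:
            if mentions(e1, sentence) and mentions(e2, sentence):
                examples.append((sentence, e1, e2, relation))
    return examples


training = distant_label(KB, SENTENCES)
for sentence, _, _, relation in training:
    print(relation, "<-", sentence)
```

All four sentences receive the label, including the hockey sentence, which does not express the relation. That noise is what the question on the slide points at.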

Learning Features (Sentence Level)

• Lexical: words between and around entity mentions and their POS tags

• Syntactic: dependency parse paths between entity mentions and other words

• Named entity tags

• Conjunction of the above features
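A small sketch of the lexical part of this feature set, assuming whitespace tokenization and token-index spans for the two mentions. POS tags, dependency paths, and named entity tags are left out because they need a tagger or parser; the function name and the window size are illustrative choices.

```python
def lexical_features(tokens, span1, span2, window=2):
    """Words between two mentions and in a small window on either side.

    tokens: list of str; span1 and span2: (start, end) token ranges, span1 first.
    """
    (s1, e1), (s2, e2) = span1, span2
    return {
        "between": " ".join(tokens[e1:s2]),
        "left": " ".join(tokens[max(0, s1 - window):s1]),
        "right": " ".join(tokens[e2:e2 + window]),
    }


tokens = "Han received an MSIT degree from RPI .".split()
feats = lexical_features(tokens, (0, 1), (6, 7))

# A conjunction feature joins the individual features into one string,
# so a classifier can weight the exact combination.
conjunction = "|".join(f"{name}={value}" for name, value in sorted(feats.items()))
print(feats)
print(conjunction)
```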

Distant Supervision Hypotheses

• All sentences containing the entity pair express the relation. [Mintz et al., 2009]

Single-Instance Single-Label

“object”: Han Wang, RPI

“instance”:
• Han Wang, a graduate student in RPI, …
• Han received an MSIT degree from RPI.
• Han is currently working as a research assistant in RPI.
• Han watched a hockey game between RPI and Yale.

“label”: alma mater

Distant Supervision Hypotheses

• All sentences containing the entity pair express the relation. [Mintz et al., 2009]

• At least one sentence containing the entity pair expresses the relation. [Riedel et al., 2010]

Multiple-Instance Single-Label

“object”: Han Wang, RPI

“instance”:
• Han Wang, a graduate student in RPI, …
• Han is currently working as a research assistant in RPI.

“label”: alma mater

Distant Supervision Hypotheses

• All sentences containing the entity pair express the relation. [Mintz et al., 2009]

• At least one sentence containing the entity pair expresses the relation. [Riedel et al., 2010]

• At least one sentence containing the entity pair expresses the relation, and these two entities can have multiple relations. [Hoffmann et al., 2011; Surdeanu et al., 2012]

Multiple-Instance Multiple-Label

“object”: Han Wang, RPI

“instance”:
• Han Wang, a graduate student in RPI, …
• Han is currently working as a research assistant in RPI.

“label”: alma mater, employee of, NIL

Relaxing the Hypotheses Works

[Weston et al., 2013]

Never-Ending Language Learning (NELL) [Carlson et al., 2010]

http://rtw.ml.cmu.edu/rtw/


• Extracts both entities and relations

• Bootstrapping semi-supervised learning
  • Starts with a small amount of labeled data as seeds
  • Grows the seeds with high-confidence instances found by the current model
  • Suffers from “semantic drift”

• Overcomes the semantic drift by constraining the semi-supervised learning
  • Mutual exclusion. E.g. a “PhDStudent” cannot also be a “Professor”
  • Relation argument type checking. E.g. arguments of “employeeOf” should be “Person” and “Organization”
  • Unstructured and semi-structured text features. E.g. extractions from free text and web tables should agree with each other
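The first two constraints can be pictured as filters applied before a candidate is promoted into the KB. The sketch below is not NELL's implementation; the tables `MUTEX` and `ARG_TYPES`, the function names, and the `City` category are hypothetical and only mirror the examples on the slide.

```python
# Hypothetical constraint tables mirroring the slide's examples.
MUTEX = {frozenset({"PhDStudent", "Professor"})}
ARG_TYPES = {"employeeOf": ("Person", "Organization")}


def accept_category(entity, new_cls, categories):
    """Reject a candidate category that is mutually exclusive with a current belief."""
    current = categories.get(entity, set())
    return not any(frozenset({new_cls, cls}) in MUTEX for cls in current)


def accept_relation(relation, subj, obj, categories):
    """Accept a candidate fact only if both arguments have the expected types."""
    want_subj, want_obj = ARG_TYPES[relation]
    return (want_subj in categories.get(subj, set())
            and want_obj in categories.get(obj, set()))


# Current beliefs about three entities (illustrative data).
categories = {
    "HanWang": {"Person", "PhDStudent"},
    "RPI": {"Organization"},
    "Troy": {"City"},
}

print(accept_category("HanWang", "Professor", categories))          # blocked by mutual exclusion
print(accept_relation("employeeOf", "HanWang", "RPI", categories))  # types match
print(accept_relation("employeeOf", "HanWang", "Troy", categories)) # wrong object type
```

Rejecting such candidates early keeps a single bad extraction from seeding further bad extractions in later bootstrapping rounds.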


Knowledge Vault [Dong et al., 2014]

• Automatically extracts triples from the Web
  • Various extraction sources: free text, HTML DOM trees, HTML tables, and human annotations (schema.org, RDFa)
  • Uses distant supervision to get triples as the training data
  • Uses local closed world assumption (a.k.a. partial completeness assumption) to assign labels to training data
  • Extraction results are fused together from different sources with a confidence score

• Uses prior knowledge (Freebase) to evaluate the extracted triples
  • Uses Freebase triples to predict the probabilities of the extracted triples

• Combines extraction and prior together
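The local closed world assumption mentioned above can be sketched as a three-way labeling rule: if the KB already knows some object for a subject and predicate, it is treated as complete for that pair, so other objects count as negatives; if it knows nothing, the candidate is left unlabeled. The function name `lcwa_label` and the toy KB are illustrative assumptions.

```python
def lcwa_label(triple, kb):
    """Label a candidate triple under the local closed world assumption.

    Returns True (positive), False (negative), or None (unknown, left out of training).
    """
    subj, pred, obj = triple
    known_objects = {o for s, p, o in kb if s == subj and p == pred}
    if obj in known_objects:
        return True    # the KB confirms the triple
    if known_objects:
        return False   # the KB knows other objects for (subj, pred), assumed complete
    return None        # the KB says nothing about (subj, pred)


# Toy KB reusing triples from the sample KB in this deck.
kb = {
    ("RensselaerPolytechnicInstitute", "locatedAt", "Troy"),
    ("HanWang", "hasAdvisor", "PeterFox"),
}

print(lcwa_label(("RensselaerPolytechnicInstitute", "locatedAt", "Troy"), kb))
print(lcwa_label(("RensselaerPolytechnicInstitute", "locatedAt", "Albany"), kb))
print(lcwa_label(("HanWang", "locatedAt", "Troy"), kb))
```

The rule is only an approximation for predicates that can take several objects, which is why it is called an assumption.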


Open Information Extraction

• Extracts relational tuples without any pre-defined vocabulary or training data

• Extracts verb phrases as the relations [Fader et al., 2011]
  • Takes the longest word sequence that starts with the verb
  • Syntactic constraint: matching POS patterns (e.g. VB, VB DET | PREP, VB NP PREP)
  • Lexical constraint: number of distinct argument pairs > threshold

• Extracts relations mediated by nouns, adjectives, and more [Mausam et al., 2012]
  • Open pattern templates: syntactic patterns + semantic/lexical constraints
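A rough sketch of the syntactic constraint: POS tags are collapsed into coarse classes, and a regular expression over the class sequence finds the longest verb-initial phrase. The tag names, the class table `CLASS`, and the exact pattern (a verb, optionally followed by other words and a closing preposition) are simplifying assumptions made here; the tags are supplied by hand rather than produced by a tagger.

```python
import re

# Coarse classes: V = verb, W = noun/adjective/adverb/pronoun/determiner, P = preposition/particle.
# Any tag not listed (e.g. proper nouns used as arguments) maps to "O".
CLASS = {"VERB": "V", "NOUN": "W", "ADJ": "W", "ADV": "W",
         "PRON": "W", "DET": "W", "ADP": "P", "PRT": "P"}

# A verb alone, or a verb followed by optional words and a closing preposition; greedy, so longest wins.
PATTERN = re.compile(r"(?:VW*P|V)+")


def relation_phrases(tagged):
    """Return the verb-initial phrases whose tag sequence matches PATTERN."""
    codes = "".join(CLASS.get(tag, "O") for _, tag in tagged)
    return [" ".join(word for word, _ in tagged[m.start():m.end()])
            for m in PATTERN.finditer(codes)]


# Hand-tagged example sentences (tags are assumptions, not tagger output).
sentence1 = [("Han", "PROPN"), ("received", "VERB"), ("an", "DET"), ("MSIT", "NOUN"),
             ("degree", "NOUN"), ("from", "ADP"), ("RPI", "PROPN")]
sentence2 = [("Han", "PROPN"), ("watched", "VERB"), ("a", "DET"), ("game", "NOUN")]

phrases1 = relation_phrases(sentence1)
phrases2 = relation_phrases(sentence2)
print(phrases1)
print(phrases2)
```

The lexical constraint would then drop any phrase that occurs with too few distinct argument pairs across the corpus, which filters out overly specific phrases.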


Over 5 billion extractions from over a billion web pages

Revisit: Relations

• A relation can be expressed in many ways (patterns)

• How to establish the “definition” of a relation?

• What makes two relations different?

• How to measure the similarity of two relations?

Example: the relation alma mater between Han Wang and RPI is expressed by both of these sentences:

• Han Wang, a graduate student in RPI, …
• Han received an MSIT degree from RPI.

PATTY [Nakashole et al., 2012]

• A Taxonomy of Relational Patterns with Semantic Types

• Syntactic-Lexical-Ontological (SOL) Pattern Model
  • Syntactic-Lexical: surface words, wildcards, POS tags
  • Ontological: semantic type classes as entity placeholders
  • Support set of pattern: set of entity-pairs for placeholders

• An example:
  • Pattern: <Person> ’s ADJECTIVE voice * in <Song>
  • Matching sentences:
    • Amy Winehouse’s soul voice in her song ‘Rehab’
    • Jim Morrison’s haunting voice and charisma in ‘The End’
    • Joan Baez’s angel-like voice in ‘Farewell Angelina’
  • Support set:
    • (Amy Winehouse, Rehab)
    • (Jim Morrison, The End)
    • (Joan Baez, Farewell Angelina)

PATTY [Nakashole et al., 2012]

• Synonymous patterns (pattern synsets)
  • The support sets of two patterns are equivalent
  • <Person> graduated from <School> = <Person> obtained degree in * from <School>
  • and PRONOUN ADJECTIVE advisor <Person> = under the supervision of <Person>

• Subsumption patterns
  • The support set of a pattern is a subset of the support set of another pattern
  • <Person> nominated for <Award> ⊐ <Person> winner of <Award>


350,000 SOL patterns with 4 million instances
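The synonymy and subsumption tests on support sets can be sketched with ordinary set comparison. The entity pairs below are placeholders invented for illustration, and exact set equality and inclusion are simplifying assumptions: on noisy web-scale data the comparison has to tolerate missing and spurious pairs.

```python
# Support sets keyed by pattern; the entity pairs are hypothetical placeholders.
SUPPORT = {
    "<Person> nominated for <Award>": {("person_a", "award_x"), ("person_b", "award_x"),
                                       ("person_c", "award_y")},
    "<Person> winner of <Award>": {("person_a", "award_x"), ("person_c", "award_y")},
    "<Person> graduated from <School>": {("person_a", "school_s"), ("person_b", "school_t")},
    "<Person> obtained degree in * from <School>": {("person_a", "school_s"),
                                                    ("person_b", "school_t")},
}


def relate(p, q, support):
    """Compare two patterns extensionally, by their support sets."""
    a, b = support[p], support[q]
    if a == b:
        return "synonymous"
    if a > b:
        return "subsumes"       # p's support set strictly contains q's
    if a < b:
        return "subsumed by"
    return "unrelated"


r1 = relate("<Person> graduated from <School>",
            "<Person> obtained degree in * from <School>", SUPPORT)
r2 = relate("<Person> nominated for <Award>", "<Person> winner of <Award>", SUPPORT)
r3 = relate("<Person> winner of <Award>", "<Person> graduated from <School>", SUPPORT)
print(r1, r2, r3)
```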

VerbNet [Kipper 2008]

• A verb lexicon that groups approximately 5,800 verbs in over 270 classes

• Inspired by Beth Levin’s classification of verb classes and their syntactic alternations [Levin, 1993]
  • Members within a single verb class participate in shared types of alternations because of an underlying shared semantic meaning

• Verbs are classified according to shared syntactic behaviors


What else can be done?

• PATTY measures the similarity between relations (patterns) using their argument (entity) instance sets: an extensional approach.

• VerbNet defines verbs intensionally via shared syntactic behaviors (semantic meanings).

• Pattern attributes
  • Surface words
  • POS tags
  • Dependency paths
  • Semantic roles
  • Deeper semantic features (e.g. AMR)
  • Relation (pattern) embedding

• How about combining them together?
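One way to picture such a combination, purely as a hypothetical illustration and not a method proposed in the slides, is a weighted mix of an extensional score (overlap of support sets) and an intensional score (overlap of pattern attributes). Jaccard overlap, the weight `alpha`, and the toy data are all assumptions.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(p, q, alpha=0.5):
    """Weighted mix of extensional (support set) and intensional (attribute) similarity."""
    extensional = jaccard(p["support"], q["support"])
    intensional = jaccard(p["features"], q["features"])
    return alpha * extensional + (1 - alpha) * intensional


# Two patterns with placeholder support pairs and surface-word attributes.
graduated = {
    "support": {("person_a", "school_s"), ("person_b", "school_t")},
    "features": {"graduated", "from"},
}
obtained = {
    "support": {("person_a", "school_s"), ("person_b", "school_t"), ("person_c", "school_s")},
    "features": {"obtained", "degree", "from"},
}

score = combined_similarity(graduated, obtained)
print(round(score, 3))
```

With `alpha = 1` the measure reduces to the purely extensional comparison; richer attributes such as dependency paths or embeddings could replace the surface-word sets.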

Application: Provenance Modeling

References

[1] Niranjan Balasubramanian, Stephen Soderland, and Oren Etzioni. “Rel-grams: a probabilistic model of relations in text”. In: Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction. 2012. url: http://dl.acm.org/citation.cfm?id=2391219.

[2] Michele Banko, MJ Cafarella, and Stephen Soderland. “Open Information Extraction from the Web”. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence. 2007, pp. 2670–2676. url: http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-429.pdf.

[3] Danushka T. Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. “Measuring the similarity between implicit semantic relations from the web”. In: Proceedings of the 18th international conference on World wide web (2009), p. 651. doi: 10.1145/1526709.1526797. url: http://portal.acm.org/citation.cfm?doid=1526709.1526797.

[4] DT Bollegala, Y Matsuo, and M Ishizuka. “Relational duality: Unsupervised extraction of semantic relations between entities on the web”. In: Proceedings of the 19th international conference on World wide web. 2010, pp. 151–160. isbn: 9781605587998. url: http://dl.acm.org/citation.cfm?id=1772707.

[5] Antoine Bordes and Evgeniy Gabrilovich. “Constructing and mining web-scale knowledge graphs”. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, New York, USA: ACM Press, 2014, pp. 1967–1967. isbn: 9781450329569. doi: 10.1145/2623330.2630803. url: http://dl.acm.org/citation.cfm?doid=2623330.2630803.

[6] Andrew Carlson and Justin Betteridge. “Coupled semi-supervised learning for information extraction”. In: Proceedings of the third ACM international conference on Web search and data mining. 2010. isbn: 9781605588896. url: http://dl.acm.org/citation.cfm?id=1718501.

[7] Andrew Carlson, Justin Betteridge, and Bryan Kisiel. “Toward an Architecture for Never-Ending Language Learning.” In: Proceedings of the Twenty-Fourth Conference on Artificial Intelligence. 2010. url: http://www.aaai.org/ocs/index.php/aaai/aaai10/paper/download/1879/2201.

[8] Omkar Deshpande, DS Lamba, and Michel Tourn. “Building, maintaining, and using knowledge bases: a report from the trenches”. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 2013. isbn: 9781450320375. url: http://dl.acm.org/citation.cfm?id=2465297.

[9] XL Dong et al. “Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion”. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014. url: http://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf.

[10] Oren Etzioni, Anthony Fader, and Janara Christensen. “Open Information Extraction: The Second Generation.” In: Proceedings of the Twenty-Second international joint conference on Artificial Intelligence. 2011. url: http://www.aaai.org/ocs/index.php/IJCAI/IJCAI11/paper/viewFile/3353/3408.

[11] Anthony Fader, Stephen Soderland, and Oren Etzioni. “Identifying relations for open information extraction”. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 2011. url: http://dl.acm.org/citation.cfm?id=2145596.

[12] J. Fan et al. “Automatic knowledge extraction from documents”. In: IBM Journal of Research and Development 56.3 (2012), pp. 1–10. url: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6177723.

[13] Adam Grycner and Gerhard Weikum. “HARPY: Hypernyms and Alignment of Relational Paraphrases”. In: Proceedings of the 25th International Conference on Computational Linguistics. 2014, pp. 2195–2204. url: http://pubman.mpdl.mpg.de/pubman/faces/viewItemOverviewPage.jsp?itemId=escidoc:2069526:4.

[14] Raphael Hoffmann et al. “Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations”. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011, pp. 541–550.

[15] Charles Kemp, JB Tenenbaum, and TL Griffiths. “Learning systems of concepts with an infinite relational model”. In: Proceedings of the 21st National Conference on Artificial Intelligence. 2006, pp. 381–388. url: http://www.aaai.org/Papers/AAAI/2006/AAAI06-061.pdf.

[16] Karin Kipper et al. “A large-scale classification of English verbs”. In: Language Resources and Evaluation 42.1 (Dec. 2007), pp. 21–40. issn: 1574-020X. doi: 10.1007/s10579-007-9048-2. url: http://link.springer.com/10.1007/s10579-007-9048-2.

[17] Stanley Kok and Pedro Domingos. “Extracting semantic networks from text via relational clustering”. In: Machine Learning and Knowledge Discovery in Databases (2008), pp. 1–16. url: http://link.springer.com/chapter/10.1007/978-3-540-87479-9_59.

[18] Thomas Lin and Oren Etzioni. “Entity linking at web scale”. In: Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction. 2012. url: http://dl.acm.org/citation.cfm?id=2391216.

[19] Mike Mintz et al. “Distant supervision for relation extraction without labeled data”. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. August. 2009, pp. 1003–1011. url: http://dl.acm.org/citation.cfm?id=1690287.

[20] TP Mohamed, ER Hruschka Jr, and TM Mitchell. “Discovering relations between noun categories”. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 2011, pp. 1447–1455. url: http://dl.acm.org/citation.cfm?id=2145586.

[21] Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. “Discovering and exploring relations on the web”. In: Proceedings of the VLDB Endowment 5.12 (Aug. 2012), pp. 1982–1985. issn: 21508097. doi: 10.14778/2367502.2367553. url: http://dl.acm.org/citation.cfm?doid=2367502.2367553.

[22] Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. “PATTY: a taxonomy of relational patterns with semantic types”. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. July. 2012, pp. 1135–1145. url: http://dl.acm.org/citation.cfm?id=2391076.

[23] F Niu et al. “DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference.” In: Proceedings of Very Large Data Search. 2012. url: http://www-cs.stanford.edu/~chrismre/papers/deepdive_vlds.pdf.

[24] Sebastian Riedel, Limin Yao, and A McCallum. “Modeling relations and their mentions without labeled text”. In: Machine Learning and Knowledge Discovery in Databases. 2010, pp. 1–16. url: http://link.springer.com/chapter/10.1007/978-3-642-15939-8_10.

[25] Sebastian Riedel et al. “Relation extraction with matrix factorization and universal schemas”. In: Proceedings of Joint Human Language Technology Conference/Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL ’13). June. 2013, pp. 74–84. url: http://works.bepress.com/benjamin_marlin/1/.

[26] Michael Schmitz et al. “Open language learning for information extraction”. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 2012. url: http://dl.acm.org/citation.cfm?id=2391009.

[27] Fabian M Suchanek et al. “Advances in Automated Knowledge Base Construction”. In: The Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction. 2012.

[28] Fabian Suchanek and Gerhard Weikum. “Knowledge Bases in the Age of Big Data Analytics Turn Web into Knowledge Base”. In: Proceedings of the 40th International Conference on Very Large Data Bases. 2014.

[29] Mihai Surdeanu et al. “Multi-instance Multi-label Learning for Relation Extraction”. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. July. 2012, pp. 455–465.

[30] PD Turney. “Similarity of semantic relations”. In: Computational Linguistics February (2006). url: http://www.mitpressjournals.org/doi/abs/10.1162/coli.2006.32.3.379.

[31] Limin Yao, S Riedel, and A McCallum. “Collective cross-document relation extraction without labelled data”. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. October. 2010, pp. 1013–1023. url: http://dl.acm.org/citation.cfm?id=1870757.

Thank you!

Questions? Comments? Suggestions?

(Back-up) Dependency Grammar

The dependency relation views the (finite) verb as the structural center of all clause structure. All other syntactic units (e.g. words) are either directly or indirectly dependent on the verb.

Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas.

(Back-up) Semantic Role Labeling

Detection of the semantic arguments associated with the predicate or verb of a sentence and their classification into their specific roles.

(Back-up) Abstract Meaning Representation

AMR captures “who is doing what to whom” in a sentence. Each sentence is represented as a rooted, directed, acyclic graph with labels on edges (relations) and leaves (concepts).