Never-Ending Language Learning for Vietnamese
Student: Phạm Xuân Khoái
Instructor: Dr. Lê Hồng Phương
Coupled SEAL
Main content
1. Introduction
2. Concepts
3. How it works
1. Introduction
• SEAL (Set Expander for Any Language) is a set expansion system that accepts input elements (seeds) of some target set S and automatically finds other probable elements of S in semi-structured documents such as web pages.
• CSEAL (Coupled SEAL) is a SEAL system extended with two kinds of constraints:
• mutual-exclusion constraints
• type-checking constraints
1. Introduction
• Coupled SEAL: a semi-structured extractor
• SEAL uses a wrapper induction algorithm
• Queries the internet with sets of beliefs from each category or relation; mines lists and tables for instances
• Uses mutual-exclusion relationships to provide negative examples for filtering overly general lists and tables
• 5 queries per category, 10 queries per relation; fetches 50 web pages per query
• Candidates are ranked by probabilities assigned as in CPL
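The mutual-exclusion filtering above can be sketched as follows. This is a minimal illustration, not CSEAL's actual interface: the function name and the data shapes (lists of extracted strings, sets of known positive and negative instances) are assumptions.

```python
def filter_candidate_lists(extracted_lists, positives, negatives):
    """Pool items from mined lists and tables, rejecting any list that
    contains a known instance of a mutually exclusive category (a
    negative example). Such lists are likely overly general, e.g. a
    mixed index page listing both cities and companies."""
    kept = set()
    for items in extracted_lists:
        items_lower = {x.lower() for x in items}
        if items_lower & negatives:
            continue  # list mixes in an excluded category: discard it whole
        kept |= items_lower - positives  # keep only genuinely new candidates
    return kept
```

For example, with known cities as positives and known companies as negatives, a list mixing cities with car makers is discarded entirely, while a clean city list contributes its new members as candidates.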
1. Introduction
[Diagram: current beliefs feed CSEAL, which queries the Internet and returns new candidate facts]
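The loop in this slide's diagram can be sketched roughly as below. `fetch_pages` and `expand` are hypothetical stand-ins for the web-query and wrapper-based extraction steps, and the seed sample size is illustrative.

```python
import random

def cseal_iteration(beliefs, fetch_pages, expand):
    """One CSEAL pass: sample seeds from the current beliefs of each
    category, query the web with them, mine the returned pages, and
    report the extracted items that are not already believed."""
    candidates = set()
    for category, instances in beliefs.items():
        # sample a few seeds per category (sorted for determinism)
        seeds = random.sample(sorted(instances), min(3, len(instances)))
        for page in fetch_pages(seeds):
            # new candidate facts = extractions minus existing beliefs
            candidates |= expand(page, seeds) - instances
    return candidates
```

The returned candidates would then be ranked (as in CPL) before the Knowledge Integrator decides whether to promote them to beliefs.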
1. Introduction
[Diagram: NELL architecture — the Knowledge Base holds beliefs and candidate facts; subsystem components (CPL, CSEAL, CMC, RL) read data resources and propose candidate facts, which the Knowledge Integrator promotes to beliefs]
Example
2. Concepts
• Seed: an input element of the target set
• Wrapper: defined by two character strings, which specify the left context and right context necessary for an entity to be extracted from a page. These strings are chosen under two conditions:
• the contexts are maximally long
• they bracket at least one occurrence of every seed string on the page
Example
3. How it works
References
• Toward an Architecture for Never-Ending Language Learning (http://www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf)
• Language-Independent Set Expansion of Named Entities using the Web (http://www.cs.cmu.edu/~wcohen/postscript/icdm-2007.pdf)
• Coupled Semi-Supervised Learning for Information Extraction (http://www.cs.cmu.edu/~rcwang/papers/wsdm-2010.pdf)
• Character-level Analysis of Semi-Structured Documents for Set Expansion (https://www.cs.cmu.edu/~rcwang/papers/emnlp-2009.pdf)