Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding
Yun-Nung (Vivian) Chen, William Yang Wang, Anatole Gershman, Alexander I. Rudnicky
Email: [email protected]
Website: http://vivianchen.idv.tw
Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions
Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions
A Popular Robot: Baymax
Big Hero 6 -- video content owned and licensed by Disney Entertainment, Marvel Entertainment, LLC, etc.
Baymax is capable of maintaining a good spoken dialogue and learning new knowledge for better understanding of and interaction with people.
Spoken Dialogue System (SDS)
- Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via speech interactions.
- Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
Examples: Apple's Siri, Microsoft's Cortana, Microsoft's Xbox Kinect, Amazon's Echo, Samsung's SMART TV, Google Now
https://www.apple.com/ios/siri/
http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana
http://www.xbox.com/en-US/
http://www.amazon.com/oc/echo/
http://www.samsung.com/us/experience/smart-tv/
https://www.google.com/landing/now/
Challenges for SDS
An SDS in a new domain requires:
- a hand-crafted domain ontology
- utterances labeled with semantic representations
- an SLU component for mapping utterances into semantic representations
With increasing spoken interactions, building domain ontologies and annotating utterances is costly, so the data does not scale up. The goal is to enable an SDS to automatically learn this knowledge so that open-domain requests can be handled.
A new domain requires 1) a hand-crafted ontology, 2) slot representations for utterances (labeled info), and 3) a parser for mapping utterances into them; the labeling cost is too high, so I want to automate this process.
Interaction Example
User: find an inexpensive eating place for taiwanese food
Agent: Inexpensive Taiwanese eating places include Din Tai Fung, etc. What do you want to choose?
Q: How does a dialogue system process this request?
How can it find a restaurant?
SDS Process -- Available Domain Ontology
User: find an inexpensive eating place for taiwanese food
[Diagram: organized domain knowledge -- a graph of slot nodes "seeking", "target", "price", and "food" connected by the dependency relations PREP_FOR, AMOD, and NN]
Domain knowledge representation (graph)
SDS Process -- Available Domain Ontology
User: find an inexpensive eating place for taiwanese food
[Diagram: the same domain knowledge graph; ontology induction discovers the semantic slots]
Ontology Induction (semantic slot)
SDS Process -- Available Domain Ontology
User: find an inexpensive eating place for taiwanese food
[Diagram: the same domain knowledge graph; structure learning discovers the inter-slot relations]
Ontology Induction (semantic slot); Structure Learning (inter-slot relation)
SDS Process -- Spoken Language Understanding (SLU)
User: find an inexpensive eating place for taiwanese food
[Diagram: the domain knowledge graph supports the SLU component]
Spoken Language Understanding output: seeking=find, target=eating place, price=inexpensive, food=taiwanese food
SDS Process -- Dialogue Management (DM)
User: find an inexpensive eating place for taiwanese food
SELECT restaurant {
    restaurant.price=inexpensive
    restaurant.food=taiwanese food
}
Results: Din Tai Fung, Boiling Point, ...
Agent: Inexpensive Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose?
Map lexical units into concepts.
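The DM step above turns the SLU slot-value pairs into a database query. A minimal sketch of that lookup, assuming a toy restaurant table (the names and fields are illustrative, not the system's actual database):

```python
# Toy dialogue-management step: turn SLU slot-value pairs into a lookup
# over a small restaurant table. Data and names are illustrative only.

RESTAURANTS = [
    {"name": "Din Tai Fung", "price": "inexpensive", "food": "taiwanese food"},
    {"name": "Boiling Point", "price": "inexpensive", "food": "taiwanese food"},
    {"name": "Chez Pierre", "price": "expensive", "food": "french food"},
]

def select_restaurants(slots):
    """Return restaurants matching every constraint in `slots`."""
    return [r for r in RESTAURANTS
            if all(r.get(k) == v for k, v in slots.items())]

slu_output = {"price": "inexpensive", "food": "taiwanese food"}
matches = [r["name"] for r in select_restaurants(slu_output)]
print(matches)  # → ['Din Tai Fung', 'Boiling Point']
```

A real system would issue the SELECT query against a backend database; the filtering logic is the same.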
Goals
User: find an inexpensive eating place for taiwanese food
[Diagram: ontology induction (semantic slot) and structure learning (inter-slot relation) build the domain knowledge graph of "seeking", "target", "price", and "food" (relations PREP_FOR, AMOD, NN); spoken language understanding maps the utterance onto it, yielding SELECT restaurant { restaurant.price=inexpensive; restaurant.food=taiwanese food }]
Goals
- Knowledge Acquisition: Ontology Induction, Structure Learning
- SLU Modeling: Spoken Language Understanding
User: find an inexpensive eating place for taiwanese food
Spoken Language Understanding
Input: user utterances
Output: the domain-specific semantic concepts included in each utterance
[Diagram: framework -- an unlabeled collection is processed by frame-semantic parsing for ontology induction (semantic KG) and by structure learning over a lexical KG and a semantic KG (word relation model, slot relation model), yielding feature matrices Fw, Fs and relation matrices Rw, Rs; SLU modeling by matrix factorization then produces the semantic representation, e.g. "can I have a cheap restaurant" → target=restaurant, price=cheap]
Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions
Probabilistic Frame-Semantic Parsing
- FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the food frame and "low fat" fills the descriptor frame element.
- SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
Baker et al., "The Berkeley FrameNet project," in Proc. of the International Conference on Computational Linguistics, 1998.
Das et al., "Frame-semantic parsing," Computational Linguistics, 2014.
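To make the parser output concrete: a minimal sketch of a SEMAFOR-style analysis for "can i have a cheap restaurant", using a simplified dict layout of my own devising (not the parser's actual output format):

```python
# Illustrative frame-semantic parse of "can i have a cheap restaurant".
# The dict layout is a hypothetical simplification of SEMAFOR output,
# not its real format.

parse = [
    {"frame": "capability",    "target": "can",        "elements": {"entity": "i"}},
    {"frame": "expensiveness", "target": "cheap",      "elements": {}},
    {"frame": "locale_by_use", "target": "restaurant", "elements": {}},
]

def slot_candidates(parse):
    """Treat every evoked frame as a candidate semantic slot."""
    return [annotation["frame"] for annotation in parse]

print(slot_candidates(parse))  # → ['capability', 'expensiveness', 'locale_by_use']
```

Every evoked frame becomes a slot candidate; deciding which candidates are domain-relevant is exactly the adaptation problem the next sections address.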
Frame-Semantic Parsing for Utterances
"can i have a cheap restaurant"
- Frame: capability -- FT LU: can; FE LU: i (?)
- Frame: expensiveness -- FT LU: cheap (good!)
- Frame: locale_by_use -- FT/FE LU: restaurant (good!)
1st issue: adapting generic frames to domain-specific settings for SDSs.
(FT: frame target; FE: frame element; LU: lexical unit)
SEMAFOR outputs a set of frames that hopefully includes all domain slots; we then pick suitable slots from SEMAFOR's output.
Spoken Language Understanding
Input: user utterances
Output: the domain-specific semantic concepts included in each utterance
[Diagram: the same framework -- frame-semantic parsing for ontology induction, structure learning with word/slot relation models over lexical and semantic KGs, and SLU modeling by matrix factorization over Fw, Fs, Rw, Rs]
Y.-N. Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation (for 1st issue)
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model
Assumption: domain-specific words/slots are more dependent on each other.
[Diagram: a word-by-slot matrix with word observations (i, like, cheap, restaurant, food, ...) and slot candidates (capability, expensiveness, locale_by_use, food, seeking, relational_quantity, desiring) for training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and test utterance "show me a list of cheap restaurants", flanked by a word relation matrix and a slot relation matrix]
Slot induction: the relation matrices allow each node to propagate scores to its neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
Knowledge Graph Construction
Syntactic dependency parsing on utterances, e.g., "can i have a cheap restaurant" (dependencies: nsubj, ccomp, det, amod, dobj; evoked frames: capability, expensiveness, locale_by_use).
[Diagram: a word-based lexical knowledge graph over "can", "i", "have", "a", "cheap", "restaurant" (w) and a slot-based semantic knowledge graph over capability, locale_by_use, expensiveness (s)]
What's the weight for each edge?
Knowledge Graph Construction
[Diagram: the same word-based lexical and slot-based semantic knowledge graphs]
The edge between a node pair is weighted by relation importance, which is used to propagate scores via a relation matrix.
How to decide the weights that represent relation importance?
Weight Measurement by Embeddings
- Dependency-based word embeddings
- Dependency-based slot embeddings
[Diagram: the dependency-parsed utterance "can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj) and its slot-level counterpart over capability, expensiveness, locale_by_use, used as training contexts for the embeddings]
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
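One way to realise the edge weighting, sketched with toy 3-d vectors in place of real dependency-based embeddings (all values are made up for illustration):

```python
import math

# Edge weights from embedding similarity: a minimal sketch. The 3-d
# vectors below are toy values, not real dependency-based embeddings.

embeddings = {
    "cheap":      [0.9, 0.1, 0.2],
    "restaurant": [0.8, 0.2, 0.1],
    "can":        [0.1, 0.9, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def edge_weight(w1, w2):
    """Weight a lexical-graph edge by embedding similarity."""
    return cosine(embeddings[w1], embeddings[w2])

# Domain words end up closer to each other than to function words.
print(edge_weight("cheap", "restaurant") > edge_weight("cheap", "can"))  # → True
```

The same scheme applies to slot-level edges, using the dependency-based slot embeddings instead of word embeddings.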
Weight Measurement by Embeddings
[Diagram: a graph built over all utterances, with word nodes w1-w7 and slot nodes s1-s3; edge weights are measured by embedding similarity]
Y.-N. Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL, 2015.
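The propagation over such a graph can be sketched as repeated multiplication by a row-normalised relation matrix; the tiny graph and weights below are invented for illustration:

```python
# Toy score propagation on a word graph: each node spreads its score to
# its neighbours through a row-normalised relation matrix, so densely
# connected domain words accumulate higher scores. Weights are made up.

words = ["cheap", "restaurant", "can"]
# adjacency[i][j]: relation weight between words[i] and words[j]
adjacency = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]

def normalise_rows(matrix):
    return [[x / sum(row) for x in row] for row in matrix]

def propagate(scores, matrix, steps=3):
    """Propagate scores: new_score[j] = sum_i score[i] * M[i][j]."""
    m = normalise_rows(matrix)
    n = len(scores)
    for _ in range(steps):
        scores = [sum(scores[i] * m[i][j] for i in range(n)) for j in range(n)]
    return scores

final = propagate([1.0, 1.0, 1.0], adjacency)
# The domain words "cheap" / "restaurant" end up above the generic "can".
print(final[0] > final[2])  # → True
```

This is the intuition behind the random-walk formulation: strongly inter-connected domain nodes reinforce each other's scores.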
Knowledge Graph Propagation Model
[Diagram: the word-by-slot observation matrix for training and test utterances, together with the word relation matrix and slot relation matrix used for slot induction]
Feature Model
[Diagram: feature matrices Fw (word observations) and Fs (slot candidates such as expensiveness, locale_by_use, food, from ontology induction and structure learning) over training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and test utterance "show me a list of cheap restaurants"; estimated probabilities (e.g., .97, .90, .95, .05) fill the cells of the hidden semantics]
2nd issue: unobserved hidden semantics may benefit understanding.
Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization (for 2nd issue)
- Experiments
- Conclusions
2nd Issue: How to learn implicit semantics?
Matrix Factorization (MF)
[Diagram: reasoning with matrix factorization -- the word/slot observation matrix with word and slot relation matrices; missing cells receive estimated probabilities]
Slot induction: the MF method completes a partially-missing matrix based on a low-rank latent semantics assumption.
Matrix Factorization (MF)
- The decomposed matrices represent low-rank latent semantics for utterances and words/slots respectively.
- The product of the two matrices fills in the probabilities of the hidden semantics.
[Diagram: the partially observed word/slot matrix is factorized into an utterance matrix and a word/slot matrix; their product assigns probabilities (e.g., .97, .90, .05) to previously unobserved cells]
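A minimal matrix-factorisation sketch of this idea: learn rank-2 factors for a partially observed binary matrix by stochastic gradient descent, then use their product to score the unobserved cells. The matrix, rank, and hyper-parameters are toy choices, not the paper's model:

```python
import random

# Minimal MF sketch: fit rank-2 factors U, V to the observed cells of a
# small binary matrix, then score the unobserved (None) cells via U·Vᵀ.
# Data, rank, and hyper-parameters are toy values for illustration.

random.seed(0)
R = [
    [1, 1, 0, None],   # None marks an unobserved cell
    [1, 0, 1, 1],
    [None, 1, 0, 1],
]
n_rows, n_cols, k = 3, 4, 2
U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_rows)]
V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_cols)]

def predict(i, j):
    return sum(U[i][f] * V[j][f] for f in range(k))

lr, reg = 0.1, 0.01
for _ in range(500):                      # SGD over observed cells only
    for i in range(n_rows):
        for j in range(n_cols):
            if R[i][j] is None:
                continue
            err = R[i][j] - predict(i, j)
            for f in range(k):
                u, v = U[i][f], V[j][f]
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)

# The product now fills in scores for the unobserved cells.
print("filled:", round(predict(0, 3), 2), round(predict(2, 0), 2))
```

Squared error over observed cells is used here for brevity; the paper instead optimises a ranking objective over observed/unobserved pairs, described on the next slide.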
Bayesian Personalized Ranking for MF
Model implicit feedback:
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts
Objective: maximize the sum of ln σ(score of an observed fact − score of an unobserved fact) over all such pairs; the objective is to learn a set of well-ranked semantic slots per utterance.
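The ranking idea behind BPR can be sketched as a pairwise logistic loss; the scores below are illustrative, not values from the paper's model:

```python
import math

# Bayesian Personalized Ranking sketch: the loss rewards giving an
# observed slot a higher score than an unobserved one for the same
# utterance, rather than pushing unobserved scores toward zero.

def bpr_loss(score_pos, score_neg):
    """-ln sigmoid(score_pos - score_neg) for one (observed, unobserved) pair."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_pos - score_neg))))

# Ranking the observed slot above the unobserved one gives a lower loss.
good = bpr_loss(score_pos=0.9, score_neg=0.1)
bad = bpr_loss(score_pos=0.1, score_neg=0.9)
print(good < bad)  # → True
```

Summing this loss over sampled (observed, unobserved) pairs and minimising it by gradient descent yields the well-ranked slot lists the objective asks for.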
2nd Issue: How to learn implicit semantics?
Matrix Factorization (MF)
[Diagram: the same matrix-factorization picture as before]
Slot induction: the MF method completes a partially-missing matrix based on a low-rank latent semantics assumption.
Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions
Experimental Setup
Dataset: Cambridge University SLU corpus [Henderson et al., 2012]
- restaurant recommendation in an in-car setting in Cambridge
- WER = 37%
- vocabulary size = 1868
- 2,166 dialogues; 15,453 utterances
- dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
The mapping table between induced and reference slots.
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
Experiment 1: Quality of Semantics Estimation
Metric: Mean Average Precision (MAP) of all estimated slot probabilities for each utterance

Approach                                        | ASR (w/o)      | ASR (w/ Explicit) | Manual (w/o)   | Manual (w/ Explicit)
Explicit: Support Vector Machine                | 32.5           | --                | 36.6           | --
Explicit: Multinomial Logistic Regression       | 34.0           | --                | 38.8           | --
Implicit: Baseline (Random)                     | 3.4            | 22.5              | 2.6            | 25.1
Implicit: Baseline (Majority)                   | 15.4           | 32.9              | 16.4           | 38.4
Implicit: MF (Feature Model)                    | 24.2           | 37.6*             | 22.6           | 45.3*
Implicit: MF (Feature Model + Knowledge Graph Propagation) | 40.5* (+19.1%) | 43.5* (+27.9%) | 52.1* (+34.3%) | 53.4* (+37.6%)

Modeling implicit semantics: the MF approach effectively models hidden semantics to improve SLU, and adding the knowledge graph propagation model further improves performance.
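The MAP metric used above can be sketched as follows; the slot rankings and reference sets are toy examples, not corpus data:

```python
# Mean Average Precision sketch for slot ranking: for each utterance we
# rank all slot candidates by predicted score and compare against the
# reference slots. Rankings below are toy examples.

def average_precision(ranked, relevant):
    """AP of one ranked candidate list against a set of reference slots."""
    hits, score = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, references):
    aps = [average_precision(r, ref) for r, ref in zip(rankings, references)]
    return sum(aps) / len(aps)

rankings = [["expensiveness", "capability", "locale_by_use"],
            ["locale_by_use", "food", "expensiveness"]]
references = [{"expensiveness", "locale_by_use"}, {"locale_by_use", "food"}]
print(round(mean_average_precision(rankings, references), 4))  # → 0.9167
```

A perfect system would rank every reference slot above every non-reference slot, giving MAP = 1.0.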
Experiment 2: Effectiveness of Relations
All types of relations are useful for inferring hidden semantics.
Experiment 2: Effectiveness of Relations (cont.)
Combining different relations further improves the performance.
Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions
Conclusions
- Ontology induction and knowledge graph construction enable systems to automatically acquire open-domain knowledge.
- MF for SLU provides a principled model that unifies the automatically acquired knowledge, adapts to a domain-specific setting, and then allows systems to consider implicit semantics for better understanding.
- The work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.
- The proposed unsupervised SLU achieves 43% MAP on ASR-transcribed conversations.
Q & A
Thanks for your attention!