Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding
Yun-Nung (Vivian) Chen, William Yang Wang, Anatole Gershman, Alexander I. Rudnicky
Email: [email protected] | Website: http://vivianchen.idv.tw


Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions

A Popular Robot: Baymax
(Big Hero 6; video content owned and licensed by Disney Entertainment, Marvel Entertainment, LLC, etc.)
Baymax is capable of maintaining a good spoken dialogue and of learning new knowledge for better understanding of and interaction with people.

Spoken Dialogue System (SDS)
- Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via speech interaction.
- Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
Examples:
- Apple's Siri: https://www.apple.com/ios/siri/
- Microsoft's Cortana: http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana
- Microsoft's XBOX Kinect: http://www.xbox.com/en-US/
- Amazon's Echo: http://www.amazon.com/oc/echo/
- Samsung's SMART TV: http://www.samsung.com/us/experience/smart-tv/
- Google Now: https://www.google.com/landing/now/

Challenges for SDS
An SDS in a new domain requires:
- a hand-crafted domain ontology
- utterances labeled with semantic representations
- an SLU component for mapping utterances into semantic representations
With increasing spoken interactions, building domain ontologies and annotating utterances is costly, so the data does not scale up. The goal is to enable an SDS to automatically learn this knowledge so that open-domain requests can be handled.

(A new domain requires 1) a hand-crafted ontology, 2) slot representations for utterances (labeled info), and 3) a parser for mapping utterances into them; the labeling cost is too high, so we want to automate this process.)

Interaction Example
User: "find an inexpensive eating place for taiwanese food"
Intelligent Agent: "Inexpensive Taiwanese eating places include Din Tai Fung, etc. Which one do you want to choose?"
Q: How does a dialogue system process this request? How can it find a restaurant?

SDS Process: Available Domain Ontology
User: "find an inexpensive eating place for taiwanese food"
[Diagram: organized domain knowledge represented as a graph, with the slots "seeking", "target", "price", and "food" connected by the dependency relations PREP_FOR, AMOD, and NN]

SDS Process: Available Domain Ontology (Ontology Induction)
User: "find an inexpensive eating place for taiwanese food"
[Diagram: the same domain knowledge graph; Ontology Induction produces the semantic slots (seeking, target, price, food)]

SDS Process: Available Domain Ontology (Ontology Induction + Structure Learning)
User: "find an inexpensive eating place for taiwanese food"
[Diagram: the same domain knowledge graph; Ontology Induction produces the semantic slots and Structure Learning produces the inter-slot relations (PREP_FOR, AMOD, NN)]

SDS Process: Spoken Language Understanding (SLU)
User: "find an inexpensive eating place for taiwanese food"
SLU output: seeking=find, target=eating place, price=inexpensive, food=taiwanese food
[Diagram: the domain knowledge graph guides the mapping of the utterance onto slot-value pairs]

SDS Process: Dialogue Management (DM)
User: "find an inexpensive eating place for taiwanese food"
SELECT restaurant {
  restaurant.price = "inexpensive"
  restaurant.food = "taiwanese food"
}
-> Din Tai Fung, Boiling Point, ...
Intelligent Agent: "Inexpensive Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose?"
(The DM maps lexical units into concepts and issues the query.)
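The DM step above can be sketched as a simple slot-to-query mapping. The database, slot names, and function below are purely illustrative assumptions, not the system's actual components:

```python
# Hypothetical sketch: turning SLU slot-value pairs into a structured query
# over a restaurant database (all names and records here are invented).
RESTAURANTS = [
    {"name": "Din Tai Fung", "price": "inexpensive", "food": "taiwanese food"},
    {"name": "Boiling Point", "price": "inexpensive", "food": "taiwanese food"},
    {"name": "Chez Pierre", "price": "expensive", "food": "french food"},
]

def select_restaurants(slots):
    """Keep only DB-relevant slots and return the names of matching rows."""
    constraints = {k: v for k, v in slots.items() if k in ("price", "food")}
    return [r["name"] for r in RESTAURANTS
            if all(r.get(k) == v for k, v in constraints.items())]

slu_output = {"seeking": "find", "target": "eating place",
              "price": "inexpensive", "food": "taiwanese food"}
print(select_restaurants(slu_output))  # ['Din Tai Fung', 'Boiling Point']
```

The "seeking" and "target" slots drive the dialogue act, while "price" and "food" become database constraints.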

Goals
User: "find an inexpensive eating place for taiwanese food"
- Ontology Induction (semantic slots: seeking, target, price, food)
- Structure Learning (inter-slot relations: PREP_FOR, AMOD, NN)
- Spoken Language Understanding -> SELECT restaurant { restaurant.price = "inexpensive", restaurant.food = "taiwanese food" }

Goals
- Knowledge Acquisition: Ontology Induction, Structure Learning
- SLU Modeling: Spoken Language Understanding
User: "find an inexpensive eating place for taiwanese food"

Spoken Language Understanding
- Input: user utterances
- Output: the domain-specific semantic concepts included in each utterance

[Framework diagram: "can I have a cheap restaurant" -> SLU Model -> target=restaurant, price=cheap]
- Ontology Induction: frame-semantic parsing of the unlabeled collection induces a semantic knowledge graph
- Structure Learning: a word relation model (lexical KG) and a slot relation model (semantic KG) yield the relation matrices Rw and Rs for knowledge graph propagation
- Feature Model: word observations Fw and slot candidates Fs
- SLU modeling by matrix factorization produces the semantic representation

Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions

Probabilistic Frame-Semantic Parsing
- FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. Example: in "low fat milk", "milk" evokes the food frame and "low fat" fills the descriptor frame element.
- SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.

Baker et al., "The Berkeley FrameNet project," in Proc. of the International Conference on Computational Linguistics, 1998.
Das et al., "Frame-semantic parsing," Computational Linguistics, 2014.

Frame-Semantic Parsing for Utterances
"can i have a cheap restaurant"
- Frame: capability (FT LU: "can"; FE LU: "i")
- Frame: expensiveness (FT LU: "cheap")
- Frame: locale_by_use (FT/FE LU: "restaurant")
(FT: Frame Target; FE: Frame Element; LU: Lexical Unit)
Some of the induced frames are good domain slot candidates, while others are generic.
1st issue: adapting generic frames to domain-specific settings for SDSs.

(SEMAFOR outputs a set of frames that hopefully includes all domain slots; we then pick the domain slots from the SEMAFOR output.)

Spoken Language Understanding
- Input: user utterances
- Output: the domain-specific semantic concepts included in each utterance
[Framework diagram, as before: ontology induction, knowledge graph propagation (Rw, Rs), feature model (Fw, Fs), and SLU modeling by matrix factorization]

Y.-N. Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation (for the 1st issue)
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions

1st Issue: How can generic slots be adapted to a domain-specific setting?
Knowledge Graph Propagation Model
Assumption: domain-specific words/slots have stronger dependencies on each other.

[Matrix diagram: utterance-by-feature matrix with word observations (e.g. cheap, restaurant, food) and slot candidates (e.g. expensiveness, locale_by_use, food, capability, seeking, relational_quantity, desiring)]
- Train: Utterance 1 "i would like a cheap restaurant"; Utterance 2 "find a restaurant with chinese food"
- Test: "show me a list of cheap restaurants"
- Slot induction: a word relation matrix and a slot relation matrix allow each node to propagate its score to its neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
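The propagation step can be illustrated with a tiny numeric sketch. The slot names come from the slides, but the relation weights below are invented for illustration, assuming one propagation step of the form R·s:

```python
import numpy as np

# Illustrative sketch (invented weights): slot scores propagated through a
# slot relation matrix R, so that densely connected domain-specific slots
# are boosted relative to weakly connected generic ones.
slots = ["expensiveness", "locale_by_use", "food", "capability"]
R = np.array([            # symmetric relation weights between the slots above
    [0.0, 0.8, 0.5, 0.1],
    [0.8, 0.0, 0.7, 0.1],
    [0.5, 0.7, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
])
scores = np.ones(4)       # uniform initial slot-candidate scores
propagated = R @ scores   # one propagation (matrix multiplication) step
# The generic "capability" slot, weakly connected to the rest,
# ends up with the lowest propagated score.
```

With these weights, `propagated` becomes [1.4, 1.6, 1.3, 0.3]: the domain slots keep high scores while the generic one drops.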

Knowledge Graph Construction
Syntactic dependency parsing on utterances: "can i have a cheap restaurant" (relations: ccomp, amod, dobj, nsubj, det; frames: capability, expensiveness, locale_by_use)
- Word-based lexical knowledge graph: nodes are words (can, i, have, a, cheap, restaurant)
- Slot-based semantic knowledge graph: nodes are slots (capability, locale_by_use, expensiveness)
The edge between a node pair is weighted by relation importance, which is used to propagate scores via a relation matrix. How do we decide the weights that represent relation importance?

Weight Measurement by Embeddings
- Dependency-based word embeddings: trained on the dependency-parsed utterances ("can i have a cheap restaurant" with ccomp, amod, dobj, nsubj, det)
- Dependency-based slot embeddings: the same parses with words replaced by their frames (capability, expensiveness, locale_by_use)

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
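A natural way to turn such embeddings into edge weights is cosine similarity between node embeddings. The vectors below are made up purely for illustration; the actual measurement follows the cited papers:

```python
import numpy as np

# Illustrative sketch: edge weights between slot nodes taken as the cosine
# similarity of their dependency-based embeddings (vectors invented here).
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.3]),
    "locale_by_use": np.array([0.8, 0.2, 0.4]),
    "capability":    np.array([0.1, 0.9, 0.1]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

weight = {(a, b): cosine(emb[a], emb[b])
          for a in emb for b in emb if a < b}
# Semantically close slots get heavy edges; unrelated slots get light ones.
```

Here the two domain slots end up tightly linked, while the generic "capability" slot gets light edges, matching the propagation assumption.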

Weight Measurement by Embeddings (cont.)
[Graph over all utterances: word nodes w1-w7 and slot nodes s1-s3, with embedding-based edge weights]

Y.-N. Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL, 2015.

Knowledge Graph Propagation Model
[Matrix diagram, as before: the word relation matrix and slot relation matrix are applied to the word-observation / slot-candidate matrix for slot induction]

Feature Model
[Matrix diagram: word observations and induced slot candidates per utterance, feeding ontology induction, structure learning, and SLU (Fw, Fs)]
- Train: Utterance 1 "i would like a cheap restaurant"; Utterance 2 "find a restaurant with chinese food" (words: cheap, restaurant, food; slots: expensiveness, locale_by_use, food)
- Test: "show me a list of cheap restaurants"
- Hidden semantics: unobserved cells carry estimated probabilities (e.g. .97, .90, .95, ...; low values such as .05 for irrelevant features).
2nd issue: unobserved hidden semantics may benefit understanding.

Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization (for the 2nd issue)
- Experiments
- Conclusions

2nd Issue: How can implicit semantics be learned?
Matrix Factorization (MF)
Reasoning with matrix factorization: the MF method completes a partially-missing matrix based on a low-rank latent semantics assumption.
[Matrix diagram: observed entries (1s) and estimated probabilities (.97, .90, .95, ...) over word observations and slot candidates, with the word and slot relation matrices applied for slot induction]

Matrix Factorization (MF)
- The decomposed matrices represent low-rank latent semantics for utterances and words/slots respectively.
- The product of the two matrices fills in the probabilities of the hidden semantics.
[Matrix diagram: the completed matrix with observed 1s and estimated probabilities]
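The completion idea can be sketched with a toy factorization. The matrix, rank, and learning rate below are arbitrary choices for illustration, not the paper's actual model or training procedure:

```python
import numpy as np

# Toy sketch of low-rank matrix completion: factor an utterance-by-feature
# matrix with hidden cells as U @ V.T, fitting only the observed entries.
rng = np.random.default_rng(0)
X = np.array([[1., 1., 0., 1.],
              [0., 1., 1., 1.],
              [1., 1., 0., 0.]])           # 3 utterances x 4 words/slots
M = np.array([[1, 1, 1, 1],
              [1, 1, 1, 1],
              [1, 1, 0, 0]], dtype=float)  # 0 marks hidden (test) cells

k = 2                                      # latent dimension (low rank)
U = 0.1 * rng.standard_normal((3, k))
V = 0.1 * rng.standard_normal((4, k))
for _ in range(5000):                      # squared-loss gradient descent
    E = M * (X - U @ V.T)                  # error on observed cells only
    U += 0.02 * E @ V
    V += 0.02 * E.T @ U
pred = U @ V.T                             # hidden cells now hold estimates
```

After training, the observed cells are reconstructed closely, and the two hidden cells of the test row receive scores inferred from the shared low-rank structure.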

Bayesian Personalized Ranking for MF
Models implicit feedback:
- unobserved facts are not treated as negative samples (whether they are true or false is unknown)
- observed facts are given higher scores than unobserved facts
Objective: learn a set of well-ranked semantic slots per utterance, by maximizing the log-sigmoid of the score margin between observed and unobserved facts for each utterance.
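A standard BPR stochastic update, in the general form from the ranking literature, raises the score gap between an observed fact f+ and an unobserved fact f-. The parameterization and dimensions below are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

# Sketch of a BPR-style update: for utterance u, observed feature f_pos and
# unobserved feature f_neg, ascend the gradient of ln sigma(score gap).
rng = np.random.default_rng(0)
k = 2
U = 0.1 * rng.standard_normal((3, k))    # utterance latent factors
V = 0.1 * rng.standard_normal((4, k))    # word/slot latent factors

def bpr_step(u, f_pos, f_neg, lr=0.1):
    gap = U[u] @ (V[f_pos] - V[f_neg])   # current score margin
    g = 1.0 / (1.0 + np.exp(gap))        # d/d(gap) of ln sigma(gap)
    U[u] += lr * g * (V[f_pos] - V[f_neg])
    V[f_pos] += lr * g * U[u]
    V[f_neg] -= lr * g * U[u]

for _ in range(200):                     # observed feature 1 should come to
    bpr_step(0, 1, 3)                    # outrank unobserved feature 3 for u=0
# After training, U[0] @ V[1] > U[0] @ V[3].
```

Note that the unobserved feature is only pushed below the observed one, never forced toward a hard "false" label, which is exactly the implicit-feedback treatment described above.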

2nd Issue: How can implicit semantics be learned? Matrix Factorization (MF) (recap)
[Matrix diagram, as before: MF completes the partially-missing word-observation / slot-candidate matrix based on the low-rank latent semantics assumption]

Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions

Experimental Setup
Dataset: Cambridge University SLU corpus [Henderson et al., 2012]
- restaurant recommendation in an in-car setting in Cambridge
- WER = 37%; vocabulary size = 1868
- 2,166 dialogues; 15,453 utterances
- dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
A mapping table between induced and reference slots is used for evaluation.

Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.

Experiment 1: Quality of Semantics Estimation
Metric: Mean Average Precision (MAP) of all estimated slot probabilities for each utterance

Approach                                   | ASR  | Manual
Explicit: Support Vector Machine           | 32.5 | 36.6
Explicit: Multinomial Logistic Regression  | 34.0 | 38.8
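For reference, MAP over ranked slot lists can be computed as below; the utterances, rankings, and gold slots are invented for illustration:

```python
# Illustrative MAP computation: slots are ranked by estimated probability;
# average precision is computed over the positions of the reference slots.
def average_precision(ranked_slots, reference):
    hits, precisions = 0, []
    for i, slot in enumerate(ranked_slots, start=1):
        if slot in reference:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(reference) if reference else 0.0

# Two toy utterances (made-up rankings and gold slots).
ranked = [["price", "food", "area"], ["phone", "food", "price"]]
gold = [{"price", "food"}, {"food"}]
map_score = sum(average_precision(r, g) for r, g in zip(ranked, gold)) / len(gold)
# Utterance 1: AP = (1/1 + 2/2) / 2 = 1.0; utterance 2: AP = (1/2) / 1 = 0.5
# MAP = 0.75
```

A higher MAP means the reference slots sit nearer the top of each utterance's ranked slot list.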

Experiment 1: Quality of Semantics Estimation (cont.)
Implicit approaches compared against the explicit baselines (modeling implicit semantics):
- Baseline: Random, Majority
- MF: Feature Model, Feature Model + Knowledge Graph Propagation

Experiment 1: Quality of Semantics Estimation (cont.)
Metric: MAP of all estimated slot probabilities for each utterance (without explicit features)

Approach                                    | ASR            | Manual
Explicit: Support Vector Machine            | 32.5           | 36.6
Explicit: Multinomial Logistic Regression   | 34.0           | 38.8
Implicit baseline: Random                   | 3.4            | 2.6
Implicit baseline: Majority                 | 15.4           | 16.4
Implicit MF: Feature Model                  | 24.2           | 22.6
Implicit MF: Feature Model + KG Propagation | 40.5* (+19.1%) | 52.1* (+34.3%)

Experiment 1: Quality of Semantics Estimation (full results)
Metric: MAP of all estimated slot probabilities for each utterance

Approach                                    | ASR w/o        | ASR w/ Explicit | Manual w/o     | Manual w/ Explicit
Explicit: Support Vector Machine            | 32.5           | --              | 36.6           | --
Explicit: Multinomial Logistic Regression   | 34.0           | --              | 38.8           | --
Implicit baseline: Random                   | 3.4            | 22.5            | 2.6            | 25.1
Implicit baseline: Majority                 | 15.4           | 32.9            | 16.4           | 38.4
Implicit MF: Feature Model                  | 24.2           | 37.6*           | 22.6           | 45.3*
Implicit MF: Feature Model + KG Propagation | 40.5* (+19.1%) | 43.5* (+27.9%)  | 52.1* (+34.3%) | 53.4* (+37.6%)

The MF approach effectively models hidden semantics to improve SLU; adding the knowledge graph propagation model further improves performance.

Experiment 2: Effectiveness of Relations
- All types of relations are useful for inferring hidden semantics.
- Combining different relations further improves performance.

Outline
- Introduction
- Ontology Induction: Frame-Semantic Parsing
- Structure Learning: Knowledge Graph Propagation
- Spoken Language Understanding (SLU): Matrix Factorization
- Experiments
- Conclusions

Conclusions
- Ontology induction and knowledge graph construction enable systems to automatically acquire open-domain knowledge.
- MF for SLU provides a principled model that unifies the automatically acquired knowledge, adapts it to a domain-specific setting, and allows systems to consider implicit semantics for better understanding.
- The work shows the feasibility and potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.
- The proposed unsupervised SLU achieves 43% MAP on ASR-transcribed conversations.

Q & A
Thanks for your attention!