“Towards Multi-Step Expert Advice for Cognitive Computing” - Dr. Achim Rettinger, Karlsruhe Institute of Technology

KIT – Karlsruhe Institute of Technology

INSTITUTE OF APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS (AIFB)

www.kit.edu

Towards Multi-Step Expert Advice for Cognitive Computing

Achim Rettinger ([email protected])

Cognitive Systems Institute Speaker Series, October 13, 2016


My Research Group

KIT
• Former University of Karlsruhe, Germany
• 24,800 students
• 9,500 employees

AIFB
• Research Group Web Science and Knowledge Management
• Prof. Studer and Prof. Sure-Vetter

KSRI
• Industry-on-campus model
• Prof. Satzger

Media Channel Analytics

Healthcare Analytics

Our Research

Cross-Lingual Technologies

Cross-Modal Technologies

Language A: DiCaprio appeared in Titanic

Language B: DiCaprio spielt in Titanic

(Mogadala et al. 2015)

labels. Some approaches formulate an optimization problem [12] where correlation between modalities is found by separating the classes in their respective feature spaces. As cross-modal data involves heterogeneous features, most of the approaches [14] aim in learning these features implicitly without any external representation. Zhai [13] focus on joint representation of multiple media types using joint representation learning which incorporates sparse and graph regularization. We use KCCA for maximizing pair-wise correlation between different media as Blaschko [15] used for correlational spectral clustering.

3 Approach

In this section, we formulate our research question formally and present our approach.

3.1 Problem Formulation

As discussed in the previous section, multi-modal documents on the web are found in the form of pair-wise modalities. Sometimes, there can be multiple instances of modalities present in the documents. To reduce the complexity, we assume a multi-modal document $D_i = (\text{Text}, \text{Media})$ to contain a single media item either an image, video or audio embedded with a text description. A collection $C_j = \{D_1, D_2, \ldots, D_i, \ldots, D_n\}$ of these documents in different languages $L = \{L_{C_1}, L_{C_2}, \ldots, L_{C_j}, \ldots, L_{C_m}\}$ are spread across web. Formally, our research question is to find a cross-modal semantically similar document across language collections $L_{C_o}$ using unsupervised similarity measures on low-dimension correlation space representation. Figure 2 shows broad visualization of the approach.

Fig. 2. Correlated Space Retrieval

(Zhang et al. 2014)
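The retrieval step described in the excerpt can be illustrated with a minimal sketch. It assumes that text and media items have already been projected into a shared low-dimensional correlated space (for instance by KCCA, which is not implemented here) and ranks candidates from another language collection by cosine similarity. The vectors, the document ids, and the function names `cosine` and `retrieve` are hypothetical and not taken from the paper; cosine is only one possible unsupervised similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_vec, collection, top_k=1):
    """Return the ids of the top_k documents most similar to query_vec."""
    ranked = sorted(
        collection,
        key=lambda doc_id: cosine(query_vec, collection[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical projections of documents from a German collection.
german_collection = {
    "de_doc_1": [0.9, 0.1, 0.0],
    "de_doc_2": [0.0, 1.0, 0.2],
    "de_doc_3": [0.1, 0.0, 1.0],
}
# Hypothetical projection of an English query document.
english_query = [1.0, 0.2, 0.0]

best_match = retrieve(english_query, german_collection, top_k=1)
```

Because the ranking only needs vectors in the shared space, the same function could compare a text query against media items or documents in any language, as long as both sides were projected by the same learned mapping.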



Our Research

Semantic Search Entity Summarization

Fig. 1. Automatically annotated excerpt of a Wikipedia article [9] and the summaClient knowledge panel with a summary by LinkSUM.

that can be enabled at the top of each page. Other proprietary solutions include the Bing Knowledge Widget [6] and Ontotext’s Now [7]. Most of the proprietary solutions are highly customized and the annotation and knowledge panel parts are often strongly connected.

4 Summary

With ELES, we propose loose coupling between automatic entity linking and entity summarization systems via ITS 2.0. We exemplify the lightweight integration approach with the applications DBpedia Spotlight and the qSUM method of the SUMMA entity summarization interface.

Acknowledgement. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 611346 and by the German Federal Ministry of Education and Research (BMBF) within the Software Campus project “SumOn” (grant no. 01IS12051).

[6] Bing Knowledge Widget – https://www.bing.com/widget/knowledge

[7] Ontotext Now – http://now.ontotext.com/

[9] https://en.wikipedia.org/w/index.php?title=Angela_Merkel&oldid=709980123

Filter for Multiple Entities

Constant Stream

(Zhang et al. 2016) (Thalhammer et al. 2016)

Our Innovation Projects

xLiMe – crossLingual crossMedia knowledge extraction

http://xlime.eu

Augment with related content from news and social media

Semantic Search across content in channels

Supported by



“Watson Seminar” supported by IBM Academic Initiative

Our Teaching


Our Task

Starting point

▪ The book series Game of Thrones has about 170 major characters and most of them are somehow related [1]

Expectations for the final solution

▪ Create a system that identifies the relationship between two randomly given characters

[1] http://www.maa.org/sites/default/files/pdf/Mathhorizons/NetworkofThrones%20%281%29.pdf

TOWARDS MULTI-STEP EXPERT ADVICE FOR COGNITIVE COMPUTING

Joint work with Patrick Philipp


Many tasks comprise multiple steps …

Step 1 → Step 2 → … → Step n


Medical Assistance

• Brain Stripping
• Brain Registration
• Robust Brain Normalization
• Normal Brain Normalization
• Tumor Segmentation
• Map Generation
• Tumor Prediction

Tumor Progression Mapping

(Philipp et al. 2015)


Natural Language Processing

Named Entity Recognition

Named Entity Linking

Entity Disambiguation

Web of Documents

Web of Things


Multiple “experts” might be available …

Step 1: Expert 1, Expert 2, …, Expert m
Step 2: Expert 1, Expert 2, …, Expert m
…
Step n: Expert 1, Expert 2, …, Expert m


Natural Language Processing

Entity Disambiguation - Example

Named Entity Recognition: FOX, Stanford Tagger, X-LISA POS Rules, …

Named Entity Linking: AGDISTIS, AIDA, X-LISA Disambiguator, …


Various Challenges

• Develop robust approaches given various data distributions
  ▪ NLP: News articles, social media, blogs, …
  ▪ Medical Assistance: Patients of different departments, scans taken with different machines by different people
  → Many Machine Learning techniques oversimplify as they assume data to be independent and identically distributed (i.i.d.)

• Multiple interpretation steps render brute force approaches impractical
  ▪ The number of possible alternatives grows fast over multiple steps
  ▪ Potential (continuous) parameters have to be set

• Different kinds of additional constraints might be set
  ▪ Execution / query budgets: Not all experts can be asked
  ▪ Time budgets: A solution has to be found in a predefined time frame
  → Learn behavior of experts with as few training samples as possible and transfer knowledge among different training datasets
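How fast the number of alternatives grows can be made concrete with a small counting sketch. It assumes that exactly one expert is chosen per step, so n steps with m experts each yield m^n possible pipelines; allowing subsets of experts per step would grow even faster. The expert names come from the entity disambiguation example earlier in the talk, while their grouping into steps, the function name `count_expert_sequences`, and the 10-step pipeline are illustrative assumptions.

```python
from itertools import product


def count_expert_sequences(experts_per_step):
    """Number of pipelines when exactly one expert is picked per step."""
    count = 1
    for experts in experts_per_step:
        count *= len(experts)
    return count


# Expert names from the entity disambiguation example; the grouping is an assumption.
ner_experts = ["FOX", "Stanford Tagger", "X-LISA POS Rules"]
ned_experts = ["AGDISTIS", "AIDA", "X-LISA Disambiguator"]

# Explicit enumeration is still feasible for two steps.
two_step_pipelines = list(product(ner_experts, ned_experts))
two_step_count = count_expert_sequences([ner_experts, ned_experts])

# Hypothetical longer pipeline: 10 steps with 3 experts each.
ten_step_count = count_expert_sequences([ner_experts] * 10)
```

Two steps with three experts each give 9 pipelines, while ten such steps already give 3^10 = 59,049, before any continuous parameters or budgets are taken into account.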


Connection to IBM Watson’s Cognitive Computing Capabilities

• Natural Language Processing
  ▪ Can be applied to natural language processing tasks
  ▪ E.g. named entity recognition and disambiguation pipeline

• Hypothesis generation and evaluation
  ▪ Score outputs of experts
  ▪ Adapt weights over time

• Dynamic learning
  ▪ Learn weights for each expert given a specific context
  ▪ Adapt expert choices given a specific context
  ▪ Incrementally improves with experience


Connection to Decision Making Theory

• (Budgeted) Decision Making with Expert Advice (Cesa-Bianchi et al. 1997, Amin et al. 2015)
  ▪ Adversarial (non-i.i.d.) setting with potential budgets
  ▪ Best expert / subset of experts needs to be found

• (Contextual) Bandits (e.g. Auer et al. 2002)
  ▪ Approaches for adversarial and i.i.d. settings available
  ▪ Only one action can be played, no feedback for the rest
  ▪ A high-dimensional context might be given to generalize

• (Contextual) Markov Decision Processes (Puterman 1994, Krishnamurthy et al. 2016) for Reinforcement Learning
  ▪ Multi-stage contextual bandit with different context spaces
  ▪ Only intractable solutions with good theoretical performance guarantees exist
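To make the expert advice setting more tangible, here is a minimal sketch of an exponentially weighted forecaster (often called Hedge), a standard algorithm from that literature. It is shown only as background and is not claimed to be the method used in this work. It assumes full feedback, i.e. the loss of every expert is observed in every round; the loss values, the learning rate `eta`, and the function name `hedge_weights` are hypothetical.

```python
import math


def hedge_weights(loss_rounds, eta=0.5):
    """Exponentially weighted forecaster: returns the final expert weights.

    loss_rounds: list of rounds, each a list with one loss in [0, 1] per expert.
    eta: learning rate (hypothetical default, not tuned).
    """
    num_experts = len(loss_rounds[0])
    weights = [1.0 / num_experts] * num_experts
    for losses in loss_rounds:
        # Multiplicative update: experts with higher loss lose weight faster.
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


# Hypothetical losses for three experts over four rounds (expert 1 is the most reliable).
rounds = [
    [1.0, 0.0, 0.5],
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.5, 0.5],
]
final_weights = hedge_weights(rounds)
```

In the bandit setting listed above, only the loss of the played expert is observed, so the update would need an estimate of the unobserved losses; budgeted variants additionally restrict how many experts may be queried per round.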


Problem Formalization – Entity Disambiguation Example

[Figure: multi-step diagram of experts e and states s for the input mentions "Michael Jordan" and "basketball". Recoverable states include (Michael Jordan → NE, basketball → NE), (Michael Jordan → NE, basketball → NIL) and (Michael Jordan → dbpedia:Michael_Jordan, basketball → NIL); the last of these is marked with reward +1.]


Probabilistic Soft Logic (PSL)

PSL (Kimmig et al. 2012) is a template language to instantiate a Hinge-Loss Markov Random Field (HL-MRF) (Bach et al. 2012)

0.3: friend(B, A) ∧ votesFor(A, P) → votesFor(B, P)
0.8: spouse(B, A) ∧ votesFor(A, P) → votesFor(B, P)

Given such PSL rules and observations (data), we can infer the unknown truth values (atoms)

Our Idea: Certain sequences of experts perform better on certain decision candidates

• Introduce a set of PSL rules that describes the dependencies between experts and decision candidates in a specific state
• Collect observations of executions of the pipeline
• Probabilistic inference will give you the weights telling you how to execute experts in each state


PSL Rules for Multi-Step Learning

[Figure: experts e and states s at two consecutive pipeline steps, annotated with the relations below]

Hypothesis / Locality / Weight / Value

C1.1: Locality(e, s) => Weight(e, s)

C1.2: Weight(e, s_1) ∧ Hypothesis(e, s_1, s_2) => Value(s_2)

Independence / Combination

C2: Independent(e_1, e_2, s) => Weight(e_1, s)

Robustness / Future Reward

C3: Robust(e_1, e_2, s) => Weight(e_1, s)


Empirical Evaluation

Task: Named Entity Recognition + Named Entity Disambiguation (Entity Linking) for tweets and news articles

Scenario 1 (individual steps): Predict the performance of experts on NER and NED for
• Tweets, left out from training set
• Articles, trained on tweets only

Scenario 2 (full pipeline): Given a process for collecting samples (e, s) (i.e. expert performance on tweet or article), select best outcomes to improve overall performance


Preliminary Results

[Figures: results for Scenario 1 (NER), Scenario 1 (NED) and Scenario 2]


Lessons learnt

• Heuristic similarity measures such as text length or number of extra characters yield good results
• The relational learning approach (PSL) seems to allow for knowledge transfer but further evaluations are needed
• PSL scales well for thousands of tweets and articles if meta-dependencies are precomputed
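What a heuristic similarity measure based on text length and extra characters might look like is sketched below. The talk does not define these measures, so everything here is an assumption: "extra characters" is read as characters that are neither alphanumeric nor whitespace (such as # or @), both features are compared by relative difference, and the two scores are averaged with equal weight. The function names and sample texts are hypothetical.

```python
def text_features(text):
    """Return (length, number of extra characters) for a text.

    Extra characters are assumed to be non-alphanumeric, non-whitespace characters.
    """
    length = len(text)
    extra = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    return length, extra


def heuristic_similarity(a, b):
    """Similarity in [0, 1] from relative differences in length and extra characters."""
    len_a, extra_a = text_features(a)
    len_b, extra_b = text_features(b)
    length_sim = 1.0 - abs(len_a - len_b) / max(len_a, len_b, 1)
    extra_sim = 1.0 - abs(extra_a - extra_b) / max(extra_a, extra_b, 1)
    return 0.5 * (length_sim + extra_sim)


# Hypothetical sample texts: two tweet-like strings and one article-like sentence.
tweet_a = "#NBA @MJ wins!!"
tweet_b = "@KIT #AI talk now!"
article = "Michael Jordan played basketball for the Chicago Bulls during the 1990s."

tweet_vs_tweet = heuristic_similarity(tweet_a, tweet_b)
tweet_vs_article = heuristic_similarity(tweet_a, article)
```

Under these assumptions the two tweet-like strings score higher against each other than against the article-like sentence, which is the kind of signal that could indicate whether expert performance observed on one text is likely to transfer to another.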


Conclusion & Future Work

• PSL approach beats State-of-the-Art for heterogeneous textual data
• Our approach needs to be embedded into contextual bandit / reinforcement learning techniques. No exploration / exploitation strategy implemented so far.


References

Amin, K., Kale, S., Tesauro, G., and Turaga, D. S. (2015). Budgeted prediction with expert advice. In AAAI, pages 2490–2496.

Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E. (2002). The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48–77.

Krishnamurthy, A., Agarwal, A., and Langford, J. (2016). Contextual-MDPs for PAC-reinforcement learning with rich observations. CoRR, abs/1602.02722.

Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Interscience, New York.

Bach, S. H., Broecheler, M., Getoor, L., and O’Leary, D. P. (2012). Scaling MPE inference for constrained continuous Markov random fields with consensus optimization. In NIPS, pages 2663–2671.

Kimmig, A., Bach, S., Broecheler, M., Huang, B., and Getoor, L. (2012). A short introduction to probabilistic soft logic. In NIPS Workshop on Probabilistic Programming: Foundations and Applications, pages 1–4.


Own Publications

(Zhang et al. 2016) Lei Zhang, Michael Färber, Achim Rettinger; XKnowSearch! Exploiting Knowledge Bases for Entity-based Cross-lingual Information Retrieval; The 25th ACM International on Conference on Information and Knowledge Management (CIKM), ACM, October 2016

(Thalhammer et al. 2016) Andreas Thalhammer, Nelia Lasierra, Achim Rettinger; LinkSUM: Using Link Analysis to Summarize Entity Data; In Bozzon, Alessandro and Cudré-Mauroux, Philippe and Pautasso, Cesare, Web Engineering, 16th International Conference, ICWE 2016, Lugano, Switzerland, June 6-9, 2016. Proceedings, pages 244-261, Springer International Publishing, Lecture Notes in Computer Science, 9671, Cham, June 2016

(Philipp et al. 2015) Patrick Philipp, Maria Maleshkova, Darko Katic, Christian Weber, Michael Goetz, Achim Rettinger, Stefanie Speidel, Benedikt Kämpgen, Marco Nolden, Anna-Laura Wekerle, Rüdiger Dillmann, Hannes Kenngott, Beat Müller, Rudi Studer; Toward Cognitive Pipelines of Medical Assistance Algorithms; International Journal of Computer Assisted Radiology and Surgery, November 2015

(Mogadala et al. 2015) Aditya Mogadala, Achim Rettinger; Multi-Modal Correlated Centroid Space for Multi-Lingual Cross-Modal Retrieval; In Hanbury, Allan and Kazai, Gabriella and Rauber, Andreas and Fuhr, Norbert, Advances in Information Retrieval: 37th European Conference on IR Research (ECIR), Vienna, Austria, http://people.aifb.kit.edu/amo/ecir2015/, Springer International Publishing, Cham, April 2015

(Zhang et al. 2014) Lei Zhang, Achim Rettinger; X-LiSA: Cross-lingual Semantic Annotation; Proceedings of the VLDB Endowment (PVLDB), the 40th International Conference on Very Large Data Bases (VLDB), 7, (13), pages 1693-1696, September 2014


Thank you & feel free to contact me

concerning
• Research Discussions
• Innovation Ideas

about
• Expert Processes
• Cross-Lingual Technologies
• Cross-Modal Technologies
• Semantic Search
• Entity Summarization

[email protected]
www.aifb.kit.edu/web/Achim_Rettinger/en