A New Paradigm for Alignment Extraction


1

A New Paradigm for Alignment Extraction

Christian Meilicke & Heiner Stuckenschmidt, University of Mannheim

Research Group Data and Web Science

2

Ontology Matching (certainly not a complete picture)

Analyse labels

• Normalize and split labels attached to concepts and properties
• Aggregate token-specific results to derive similarities for labels

Generate Candidates

• Interpret label similarities as confidence scores of mapping hypotheses

Refine Candidates

• Use the structure of the ontologies to refine confidence scores (e.g. similarity flooding) of the hypotheses

Select Final Alignment

• Apply a threshold to select the final alignment from the hypotheses
• Use logical reasoning to filter out correspondences that result in incoherence

3

Example

Token: 2:Reviewedt
Label: 2:ReviewedContribution
Entity: 2#ReviewedContribution

4

Example: Similarities between Tokens

equivt(1:Documentt, 2:Documentt), 1.0
equivt(1:Contributiont, 2:Contributiont), 1.0
equivt(1:Reviewedt, 2:Reviewedt), 1.0
equivt(1:Acceptedt, 2:Acceptedt), 1.0
equivt(1:Contributiont, 2:Papert), 0.1

Alignment Candidates

map(1#Document, 2#Document), 1.0
map(1#AcceptedContribution, 2#AcceptedContribution), 1.0
map(1#AcceptedContribution, 2#AcceptedPaper), 0.55
map(1#ReviewedContribution, 2#ReviewedPaper), 0.55
map(1#Contribution, 2#Paper), 0.1

Average similarity of the involved tokens
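The 0.55 scores above follow from averaging the similarities of the aligned tokens. A minimal sketch (the helper name is hypothetical, not part of the approach):

```python
# Hypothetical sketch: candidate confidence as the average similarity
# of the aligned tokens, using the values shown on the slide.

def candidate_confidence(token_sims):
    """Average the per-token similarities of a label pair."""
    return sum(token_sims) / len(token_sims)

# 1#AcceptedContribution vs. 2#AcceptedPaper:
# Accepted~Accepted = 1.0, Contribution~Paper = 0.1
print(candidate_confidence([1.0, 0.1]))  # 0.55
```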

5

Alignment Candidates

map(1#Document, 2#Document), 1.0
map(1#AcceptedContribution, 2#AcceptedContribution), 1.0
map(1#AcceptedContribution, 2#AcceptedPaper), 0.55
map(1#ReviewedContribution, 2#ReviewedPaper), 0.55
map(1#Contribution, 2#Paper), 0.1

Example

Generated Alignment

map(1#Document, 2#Document), 1.0
map(1#AcceptedContribution, 2#AcceptedContribution), 1.0
map(1#ReviewedContribution, 2#ReviewedPaper), 0.55

threshold > 0.5 & greedy 1:1

6

Example: Generated Alignment

map(1#Document, 2#Document), 1.0
map(1#AcceptedContribution, 2#AcceptedContribution), 1.0
map(1#ReviewedContribution, 2#ReviewedPaper), 0.55

map(1#AcceptedContribution, 2#AcceptedContribution)
map(1#ReviewedContribution, 2#ReviewedPaper)

map(1#ReviewedContribution, 2#ReviewedPaper)
map(1#Contribution, 2#Paper)

Strange ...

7

Proposed Approach
• Generate hypotheses about both
• Mappings between ontological entities
• Equivalence assumptions about linguistic entities

• Define a joint optimization problem (with the help of Markov Logic) in which linguistic equivalence assumptions and mappings between ontological entities are kept consistent, i.e., combinations of mappings such as the following are not allowed:

map(1#AcceptedContribution, 2#AcceptedContribution)
map(1#ReviewedContribution, 2#ReviewedPaper)

map(1#ReviewedContribution, 2#ReviewedPaper)
map(1#Contribution, 2#Paper)

8

Markov Logic (simplified)

• Probabilistic formalism for attaching weights (=> probabilities) to first-order formulas
• Given a set of weighted formulas and a set of hard formulas, the MAP state is the most probable subset of the weighted formulas
• Satisfies the hard formulas
• Maximizes the weights attached to the soft formulas
• Due to the underlying log-linear model, the MAP state S is the subset that is optimal with respect to the sum of the weights of those formulas that are true in S
• Can be transformed into an ILP (Integer Linear Program); RockIt uses this approach to compute the MAP state efficiently
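The MAP-state definition above can be illustrated with a brute-force sketch (all names are hypothetical; this enumerates possible worlds for a toy problem, whereas RockIt solves an ILP to scale):

```python
from itertools import product

# Illustrative brute force: the MAP state maximizes the sum of the weights
# of the satisfied soft formulas, subject to all hard formulas holding.
# Only feasible for toy inputs; RockIt translates the problem into an ILP.

def map_state(atoms, hard, soft):
    """atoms: atom names; hard: predicates over a truth assignment that must
    hold; soft: (weight, predicate) pairs rewarded when satisfied."""
    best, best_score = None, float("-inf")
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if not all(h(world) for h in hard):
            continue  # hard formula violated: world is inadmissible
        score = sum(w for w, f in soft if f(world))
        if score > best_score:
            best, best_score = world, score
    return best, best_score

# Toy example: a and b exclude each other (hard); both carry soft rewards.
atoms = ["a", "b"]
hard = [lambda w: not (w["a"] and w["b"])]
soft = [(1.5, lambda w: w["a"]), (0.7, lambda w: w["b"])]
state, score = map_state(atoms, hard, soft)
print(state, score)  # {'a': True, 'b': False} 1.5
```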

9

Three types of entities
• Linguistic entities
• Tokens: 2:Acceptedt, 2:Rejectedt, 2:Contributiont
• Labels: 2:AcceptedContribution
• (Onto)logical entities (concepts, roles, attributes):
• 2#AcceptedContribution
• A label can consist of several tokens
• A logical entity can have several labels
• From one label, several additional labels can be generated

[Figure: layered structure — logical entities, labels, tokens; labels and tokens together form the linguistic entities]

10

Token equivalences as weighted atoms
• Specify weights between -1.0 and 0.0; the higher the weight, the more likely it is that two tokens are equivalent
• Example:

equivt(1:Documentt, 2:Documentt), 0.0
equivt(1:Contributiont, 2:Contributiont), 0.0
equivt(1:Reviewedt, 2:Reviewedt), 0.0
equivt(1:Acceptedt, 2:Acceptedt), 0.0
equivt(1:Contributiont, 2:Papert), -0.9
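The slides do not state how these weights are derived from the earlier similarities, but `similarity - 1.0` reproduces the values shown (1.0 becomes 0.0, 0.1 becomes -0.9). A hedged sketch under that assumption:

```python
# Assumption: the weights look like (similarity - 1.0) — the exact
# transformation is not stated on the slides, but it reproduces the
# values shown: similarity 1.0 -> weight 0.0, similarity 0.1 -> weight -0.9.

def token_weight(similarity):
    """Map a token similarity in [0.0, 1.0] to a weight in [-1.0, 0.0]."""
    return similarity - 1.0

print(token_weight(1.0))  # 0.0
print(token_weight(0.1))  # -0.9
```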

11

From Tokens to Labels (hard formulas)
• Use hard formulas to describe which tokens occur in which labels at which position
• Example:
• has2Token(2:AcceptedContribution)
• pos1(2:AcceptedContribution, 2:Acceptedt)
• pos2(2:AcceptedContribution, 2:Contributiont)

12

From Labels to Logical Entities
• Use hard formulas to make explicit which labels are used to describe which entities
• Example:
• hasLabel(2#AcceptedContribution, 2:AcceptedContribution)
• Several labels might be given or generated within a preprocessing step
• E.g., if a domain restriction is used as part of the original label, add a reduced label:
• hasLabel(2#writesPaper, 2:writesPaper) // original
• hasLabel(2#writesPaper, 2:writes) // added
• E.g., remove "of" and reverse the order of the tokens:
• hasLabel(2#AuthorOfPaper, 2:AuthorOfPaper) // original
• hasLabel(2#AuthorOfPaper, 2:PaperAuthor) // added

13

Main rules I / II
• Iff logical entities are matched, they need to have (some) equivalent labels
• map(e1, e2) ↔ ∃l1 ∃l2 (hasLabel(e1, l1) ∧ hasLabel(e2, l2) ∧ equiv(l1, l2))
• Iff labels are equivalent, all of their tokens have to be equivalent (needs to be specified for all types of labels)
• has2Token(l1) ∧ has2Token(l2) ∧ pos1(l1, t11) ∧ pos2(l1, t12) ∧ pos1(l2, t21) ∧ pos2(l2, t22) → (equiv(l1, l2) ↔ equiv(t11, t21) ∧ equiv(t12, t22))

14

Main rules II / II
• 1:1 rules for tokens
• equivt(t1, t2) ∧ equivt(t1, t3) → t2 = t3
• Positive reward for generated mappings (soft constraint)
• 0.5 map(e1, e2) // added for each instantiation
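The interplay of these rules can be sketched as a simple consistency check (the data structures and function names are hypothetical illustrations, not the actual Markov Logic encoding); it flags the "strange" combination from the earlier example, where Contributiont would have to be equivalent to both Contributiont and Papert:

```python
# Hypothetical sketch: check the 1:1 token rule against the token
# equivalences implied by an alignment. If two labels are matched,
# their tokens must be equivalent position-wise, and each token may
# be equivalent to at most one token of the other ontology.

def implied_token_equivs(alignment, labels):
    """alignment: matched entity pairs; labels: entity -> token tuple.
    Returns the position-wise token pairs implied by the matches."""
    pairs = set()
    for e1, e2 in alignment:
        t1, t2 = labels[e1], labels[e2]
        if len(t1) == len(t2):  # only same-length labels match token-wise here
            pairs.update(zip(t1, t2))
    return pairs

def violates_one_to_one(pairs):
    """True if some token is forced to be equivalent to two tokens."""
    seen = {}
    for t1, t2 in pairs:
        if seen.setdefault(t1, t2) != t2:
            return True
    return False

labels = {
    "1#AcceptedContribution": ("Accepted", "Contribution"),
    "2#AcceptedContribution": ("Accepted", "Contribution"),
    "1#ReviewedContribution": ("Reviewed", "Contribution"),
    "2#ReviewedPaper": ("Reviewed", "Paper"),
}
alignment = [("1#AcceptedContribution", "2#AcceptedContribution"),
             ("1#ReviewedContribution", "2#ReviewedPaper")]
print(violates_one_to_one(implied_token_equivs(alignment, labels)))  # True
```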

15

Example

Is this outcome consistent with our rule set?

map(1#AcceptedContribution, 2#AcceptedContribution)
map(1#ReviewedContribution, 2#ReviewedPaper)

map(1#ReviewedContribution, 2#ReviewedPaper)
map(1#Contribution, 2#Paper)

No, it is not!

16

Example: What will be the outcome of the optimization problem?

17

Matching n tokens on n+1 tokens
• The rule set is too strict; a mapping such as the following can never be generated
• equiv(1:ConferencePaper, 2:Paper)
• Allow matching 2-token labels on 1-token labels iff the modifier of the 2-token label is ignored (keeping its headnoun)
• Ignoring a token results in a penalty; add:
• -0.9 ignore(t)
• ...and weaken the previously mentioned rules by adding a disjunct:
• "two tokens need to be matched on two tokens OR on one token if the modifier is ignored"

18

Example
• Ontology 1 uses these concepts:
• 1#ConferencePaper
• 1#ConferenceFee
• 1#ConferenceParticipant
• Ontology 2 uses these concepts:
• 2#Paper
• 2#Fee
• 2#Participant
• Only Black:
• Do not ignore 1:Conferencet as modifier: no mappings possible, score = 0.0
• Ignore 1:Conferencet: 0.0 - 0.9 + 1 × 0.5 = -0.4
• Grey and Black:
• Do not ignore 1:Conferencet as modifier: no mappings possible, score = 0.0
• Ignore 1:Conferencet: 0.0 + 0.0 + 0.0 - 0.9 + 3 × 0.5 = 0.6
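The arithmetic on this slide follows directly from the weights introduced earlier: +0.5 per generated mapping, -0.9 per ignored token, and 0.0 for each matched token pair. A tiny sketch (the function name is hypothetical):

```python
# The slide's scores, reproduced: each generated mapping is rewarded
# with +0.5, ignoring a token costs -0.9, and the matched token
# equivalences in this example all carry weight 0.0.

def score(num_mappings, num_ignored_tokens,
          mapping_reward=0.5, ignore_penalty=-0.9):
    return num_mappings * mapping_reward + num_ignored_tokens * ignore_penalty

print(score(1, 1))  # "only black":      0.5 - 0.9 = -0.4
print(score(3, 1))  # "grey and black":  1.5 - 0.9 =  0.6
```

Because ignoring 1:Conferencet pays off only when it enables enough mappings, the optimum depends on how many concepts share the modifier.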

19

Integrating logical reasoning
• By adding the rule set used by CODI (for example), the coherence of the generated alignment can be ensured*
• E.g.: map(e1, e2) ∧ map(d1, d2) ∧ sub(e1, d1) → ¬dis(e2, d2)
• This can have an impact on the equivalences on the linguistic layer, which in turn can affect parts of the mapping that were not directly touched by the logical constraint!

* ... not quite correct: many logical conflicts are taken into account; however, the rule set is not complete!

20

Some more adjustments ...
• Generate multiple labels out of one
• E.g., if the range of 1#writesPaper is 1#Paper, assume that 1:writesPaper and 1:writes are labels of 1#writesPaper
• For 1#AuthorOfPaper, also add the label 1:PaperAuthor
• Allow matching 3-token labels on 2-token labels with some penalty, if all tokens of the 2-token label are matched
• Only match properties that have a domain and range if their domain and range are matched

21

Experimental Setup
• Applied to the OAEI conference track
• Why not to the others?
• Problem with exponential runtime: will not terminate for ontologies with more than 1000 logical entities (this also depends on some other factors)
• Applicable to some of the benchmarks; however, due to their automated generation, tokens that appear as parts of labels are not replaced by synonyms (nor suppressed)
• MAMBA@OAEI 2015 = this approach
• However, there is lots of room for improvement when going from an experimental prototype to a robust matching system
• Sorry for the painful installation that some OAEI organizers had to experience

22

Similarity Input

[Figure: weights attached to the different kinds of similarity evidence, ranging from 0.0 down to -0.9 (e.g., -0.1, -0.3, the interval [-0.2, 0.0])]

23

Results

24

Conclusions
• Proposed a new method for lexical ontology matching, but is it a new paradigm?
• Good results (given that the input similarity is rather weak)
• Achieves "consistent" results
• Consistent w.r.t. the underlying assumptions that are relevant
• Behaves (sometimes) like a human
• Is in a certain way very simple
• Is very hard to use in practice
• Uses a bunch of parameters
• Horrible runtimes for larger problems (exponential)
• At least, it is worth thinking about

25

Thank you for your attention
