29
Semantic Space Creation and Associative Search Methods for Document Databases of International Relations *Shiori Sasaki, **Yasushi Kiyoki, ***Taizo Yak ushiji *Graduate School of Media and Governance, Keio University **Faculty of Environmental Information, Keio U niversity ***Faculty of Law, Keio University The 7 th IASTED International Conference on Internet and Multimedia Systems and Applications ~ IMSA 2003 ~ Honolulu, Hawaii, USA August 13 – 15, 2003

* Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

  • Upload
    chun

  • View
    59

  • Download
    0

Embed Size (px)

DESCRIPTION

Semantic Space Creation and Associative Search Methods for Document Databases of International Relations. The 7 th IASTED International Conference on Internet and Multimedia Systems and Applications ~ IMSA 2003 ~ Honolulu, Hawaii, USA August 13 – 15, 2003. - PowerPoint PPT Presentation

Citation preview

Page 1: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Semantic Space Creation and Associative Search Methods for Document

Databases of International Relations

*Shiori Sasaki, **Yasushi Kiyoki, ***Taizo Yakushiji*Graduate School of Media and Governance, Keio University

**Faculty of Environmental Information, Keio University***Faculty of Law, Keio University

The 7th IASTED International Conference onInternet and Multimedia Systems and Applications

~ IMSA 2003 ~ Honolulu, Hawaii, USAAugust 13 – 15, 2003

Page 2: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Outline of our study• Study Purposes• Background of the research and our basic idea• Outline of the Semantic Associative Search• Outline of the Space Creation method• Experiment  Experiment 1-1, 1-2, 1-3: verification of the precision of the created space Experiment 2: examination of the feasibility of the created space

• Experimental results and analyses• Conclusion

Page 3: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Study Purposes

1) To realize a semantic space for the searching system which can be applied to the document analysis of International Relations

(a methodological value in the study of International Relations)

2) To realize a mechanism which measures semantic relation between the technical terms and the general words, by integrating the space constructed from source A (lexicon) and the space from source B (dictionary)

(a value for database engineering)

Page 4: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Background of our studyDiscourse analysis of International Relations (IR)

●Content Analysis        ●Cognitive Map e. t. c.

Documents (Discourse) Databases on WWWofficial announcements of the governments, policy statements, parliamentary papers, press briefings, activity reports of NGO, announcements in the form of informal talks of politicians...

Traditional methods forquantitative analysis

researcher

resultsWITHOUT semantic relations

Our basic idea

Quantitative document analysisWITH semantic relationsbetween words and documents

Page 5: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Background of our studyTraditional methods of document analysis

●Content Analysis [1][2]

      A method to analyze the contents of documents: By this method, we can measure the appearance frequency of a word or an encoded sentence in document groups and calculates the correlation level of each word, code and document.

●Cognitive Map [3][4]

     A method to analyze logical routes of cognition of the policy makers: In this method, we consider a logical structure of the document to be a concept-network in the mind of the speaker and calculated the logical main route and the highlight concept.

[1]Ole R. Holsti, “Content Analysis” , 1968.[2]Takashi Inokuchi, Numerical Analysis of International Relations,1970.[3]Robert Axelrod ed., The Structure of Decision : The Cognitive Maps of Political Elites, 1976.[4] Christer Jonsson ed., Cognitive Dynamics and International Politics, 1982.

Weakness: it cannot measure the semantic relations between words.

Weakness: it cannot measure the semantic relations between documents.

EX)‘engagement’ ‘economical contract’? Or ‘military involvement’?‘development’ ‘economic development’? Or ‘development of weapon’?

Page 6: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Our Basic Idea

(1) Basic concept:Integration of semantic spaces of special knowledge and general knowledge

(2) Implementation:Creation of an integrated semantic space from two matrix (specific field matrix and general knowledge matrix)

Specific field

Technical terms

General knowledge

Definition of words

Specific field General knowledge

Integrated knowledge

IntegratedMatrix

Specific field Matrix

f fw

w

・・・

…w

General knowledge Matrix

f f…

w

・・・

f fw

w

・・・

Page 7: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Our Basic idea Knowledge of

a specific field

(International Relations)

General knowledge

Lexicon

General Dictionary

IR basic matrix

Specific field Matrix

f fw

w

・・・

General words matrix

General knowledge Matrix

f f…

w

・・・

Specific field Matrix

fw

・・・

General knowledge Matrix

f…

w

・・・

Tacit knowledge of specialist

Page 8: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Outline ofSemantic Associative Search method

based on Mathematical Model of Meaning (MMM)

Step1   Creation of metadata space (MDS)

Step2   Mapping information resources into

the image space MDS

Step3   Selecting the semantic subspace

Source:Kiyoki, Y., Kitagawa, T. and Hayama, T.:A metadatabase system for semantic image search by a mathematical model of meaning, ACM SIGMOD Record, vol. 23, no. 4, pp. 34-41, 1994.

Page 9: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Selection of subspaceaccording to context words given by searchers (e.g. {C ,C })

Step1

M M T

Data Matrix M for creating semantic space

d1d2

f1 f2 fn

dm

Feature words

Basic datum

q1

q2

q v

Kiyoki, Y., Kitagawa, T. and Hayama, T.:A metadatabase system for semantic image search by a mathematical model of meaning, ACM SIGMOD Record, vol. 23, no. 4, pp. 34-41, 1994.

I1I2

f1 f2 fn

I j

Feature words

ItemsMedia Data Matrix

C1C2

f1 f2 fn

Cp

Feature words

Contextwords

Context words Matrix

I2I

I7I

I10

I8

I9 I5I3

I1 4

6

Step 2

Step 3

The eigenvalue decomposition of

mapping

Semantic Associative Search methodbased on Mathematical Model of Meaning (MMM)

Page 10: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

(1) The context represented as a set of keywords is given by a user.

(2) A subspace is selected according to the given context. (Context Recognition)

(3) Each media data is mapped onto the subspace, and the norm of the vector(A1’) is calculated as the correlation value between media data and the context.

Semantic subspace

Context Recognition Mechanism in MMM

q 2

dataA1

dataA1‘

context1

the correlation of data A1 for

given context1=(sad,silent)

Page 11: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Outline ofSemantic Associative Search method

Step1   Creation of metadata space (MDS)

Step2   Mapping information resources into

the image space MDS

Step3   Selecting the semantic subspace

Page 12: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Outline of the Space Creation methodProcess to create a data matrix for space creation

Step1 . Creation of an IR basic matrix by using a lexicon of the terms [1] (①)

Step2 . Creation of a general words matrix by using a general dictionary [2] (②)

Step3 . Integration of the IR basic matrix and the general words matrix

  3-1. Defining the technical terms by the general words (③)  3-2. Relating the general words to the technical terms (④)

[1] Evans, Graham and Newnham, Jeffrey : Dictionary of International Relations, Penguin Books, 1998. [2] Longman Dictionary of Contemporary English, Longman, 1987.

Page 13: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Structure of the data matrix for space creation

Related feature words Defining feature wordsfr-1fr-2    … fr-m fd-1fd-2 … fd-n

Technical wIR-1 Step 3-1:

basic terms wIR-2 Definition of technical termsIR basic matrix by general wordsIR-Mr IR-Md

① ③wIR-k

General wG-1

basic words wG-2

general words matrixG-M r G-M d

④ ②

wG-l

reference definition

Step 3-2: Relation between general words and technical terms

……

Page 14: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Extraction of vertical elements and horizontal elements of IR basic matrix

Related feature words

fr-1fr-2    … fr-m

Technical wIR-1

basic termswIR-2

IR basic matrix

IR-Mr

wIR-k…

1) Extraction of items in index of IR lexicon as

‘technical basic terms’ (vertical elements)

 

ABC WeaponsABMAccidental warAccommodationACPAct of warAction-reactionActorAdjudicationAdministered territoryAfghanistanAgent-structureAggressionAidAIDSAir powerAlienAlliance :

2) Extraction of related terms in explanation of items as ‘related feature words’ (horizontal elements)

Accommodation: Term much beloved of crisis management theorists and practitioners of negotiational diplomacy. It refers to the process whereby actors in conflict agree to recognize some of the others’ claims while not sacrificing their basic interests. The source of conflict is ...

Page 15: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Structure of the data matrix for space creation

Related feature words Defining feature wordsfr-1fr-2    … fr-m fd-1fd-2 … fd-n

Technical wIR-1 Step 3-1:

basic terms wIR-2 Definition of technical termsIR basic matrix by general wordsIR-Mr IR-Md

① ③wIR-k

General wG-1

basic words wG-2general words matrix

G-M r G-M d

④ ②

wG-l

reference definition

Step 3-2: Relation between general words and technical terms

……

Page 16: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Example

ABC WeaponsAccidental warAccommodationActorArms Control : : :Zero-SumZionismZone of peace

abilityableaboutarms :control :forcepowerremove :threat :warweapon :

AB

C W

eapo

nsA

ccid

enta

l war

Acc

omm

odat

ion

Act

orA

rms

Con

trol

: : Zer

o-S

umZ

ioni

smZ

one

of p

eace

abili

tyab

leab

out

arm

s : co

ntro

l : fo

rce

pow

erre

mov

e : th

reat

: war

wea

pon

:

1 1 0 1 1 0 0 ... 1 0 1 1 0 0 1 0 1 0 -1 0 1 0 -1 0 1 1 ...

1 1 0 0 1 0 0 ... 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 ...

0 0 0 0 1 0 0 ... 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0..

1 : positive relation-1 : negative relation0 : no relation

IR basic matrix

General words matrix

Page 17: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Realization of Semantic Space for International Relations

• IR basic matrix (①) →  IR space    712 basic words, 712 feature words     710 dimensional vectors

• General words matrix (②) →  General words space

    2115 basic words, 2115 feature words  

• Integrated matrix (①②③④) →  Integrated space 

    2115+712 basic words , 2861 feature words     2846 dimensional vectors

Page 18: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Experiments to verify the precision and feasibility of the created space

Experiment 1

Comparison of correlations of words between IR space and the Integrated space

Experiment 1-1 : results in the Integrated space

Experiment 1-2 : comparison IR space with the Integrated space

Experiment 1-3 : comparison IR space with the Integrated space

Experiment 2

Document retrieval in the Integrated space

Page 19: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Experiment1-1 Purpose of the experiment: To verify that it is possible to retrieve IR technical terms by keywords of general words.Evaluation method: We selected the general words ‘trade,’ ‘environment’ and ‘human’ as keywords of queries and checked each result of the top 10 in the integrated space.Experimental results: The IR technical terms related to the keywords of general words are ranked in the top 10.

Table: The retrieval results in the integrated space

keyword: trade environment humanrank retrieved word correlation retrieved word correlation retrieved word correlation

1 GATT 0.46484 organization 0.447723 human 0.5010312 free trade area 0.458932 environment 0.445309 ethnic cleansing 0.48023 protectionism 0.447085 pollution 0.403463 genocide 0.4302774 tariff 0.43101 green movements 0.347933 Genocide Convention 0.4151715 Tokyo round 0.407401 ecology/ ecopolitics 0.308913 immigration 0.3672036 trade 0.389198 INGO 0.30539 international law 0.358667 free trade 0.380862 north/ south 0.302781 nationalism 0.3522278 common market 0.379662 globalization 0.255842 nation 0.3291269 quota 0.363634 NATO 0.244127 ethnic nationalism 0.328147

10 Kennedy round 0.359655 Earth 0.239894 balkanization 0.325926

Page 20: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Experiment 1-2

Purpose of the experiment:  To verify that the integrated space realizes high quality retrieval for IR terms.Evaluation method:  We selected an IR term ‘arms control’ as a keyword for a query and retrieved correlated words in IR space and the integrated space. Then we compared the top 30 retrieval results. 

Page 21: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Experiment 1-2

Experimental results: For query ‘arms control,’ the words such as ‘weapon,’ ‘INF treaty,’ ‘START II’, ‘START I,’ ‘weapons of mass destruction,’ and ‘cruise missile’ are included in the top 30 in the integrated space, which are closer to ‘arms control’ in semantic relation. At the same time, the words such as ‘tactical nuclear weapons’ and ‘CTBT,’ which are closer to ‘arms control’ in semantics, are selected in the high ranking in the integrated space.

IR space Integrated spacerank retrieved word correlation retrieved word correlation

1 NPT 0.422383 weapon 0.5970432 arms control 0.386465 NPT 0.4115233 nuclear proliferation 0.367211 nuclear weapons 0.4011194 inspection 0.354512 nuclear proliferation 0.3707735 non-proliferation 0.345976 INF treaty 0.354186 proliferation 0.344098 non-proliferation 0.3462417 nuclear weapons 0.32252 tactical nuclear weapons 0.3325968 preventive war 0.315053 START II 0.3312099 second strike 0.30641 CTBT 0.321286

10 non offensive defence 0.305555 Cuban missile crisis 0.31230211 MAD 0.30432 proliferation 0.30887712 deterrence 0.303463 START I 0.30552513 CTBT 0.303237 SALT 0.29727214 Cuban missile crisis 0.29737 weapons of mass destruction 0.29442915 accidental war 0.290768 cruise missile 0.29431116 limited nuclear war 0.288547 second strike 0.28502517 parity 0.287821 massive retaliation 0.2767218 force 0.283384 arms control 0.26860119 realism 0.279878 missile 0.26842720 verification 0.277056 deterrence 0.266044~ ~ ~ ~ ~29 tactical nuclear weapons 0.261701 flexible response 0.24260130 chemical and biological war 0.260292 horizontal proliferation 0.23927

Page 22: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Purpose of the experiment:

   To verify that the integrated space keeps the retrieval quality without breaking the fundamental structure of the IR space.

Evaluation method:

• We selected 15 IR technical terms such as ‘weapons of mass destruction,’ ‘economic liberalism’ and ‘non-tariff barriers’ as keywords from each issue area of IR; security, political economy, international organization, human rights, global environmental problem, theoretical perspective and concept.

• For these keywords, we retrieved the words in the IR space similarly in Experiment 1-2, and fixed the top 10 words as correct answers.

• In the integrated space, we measured the ratio of the correct answers ranked in the top 10 and top 20.

Experiment1-3

Page 23: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Experiment1-3

0%

20%

40%

60%

80%

100%

120%

10件中の再現率 20件中の再現率Experimental results: For more than a half of queries, the rate of the correct answers included in the top 10 retrieval results in the integrated space was 70%. For all queries, the correct answers were included in the top 20 at the high rate of 80% to 100%.

Page 24: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Experiment 2: document retrieval in the integrated space(from keywords to the document with metadata)

Purpose of the experiment:  To examine the feasibility of the Semantic Associative Search using the created space. Evaluation method:We collected 40 documents concerning IR from WWW as retrieval candidates and prepared three kinds of metadata set for documents; 1) only IR technical terms, 2) only general words, 3) both IR terms and general terms.We also classified type of keyword for queries into three kinds; 1) by only IR terms, 2) by only general term, 3) by both IR term and general term.

Keywords for query

IR terms General wordsIR terms +general words

Metadata IR terms Ⅰ Ⅱ Ⅲof document General words Ⅳ Ⅴ Ⅵ

IR terms +general words Ⅶ Ⅷ Ⅸ

Table: Combinations of metadata and keyword for queries

Page 25: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Evaluation method: We selected the IR term ‘conflict,’ ‘crisis’ and the general word ‘attack,’ ‘crash’ as keywords for queries and fixed eight documents about the security as correct answers in advance. Then put them ID’s as doc5, doc10, doc15, doc 20, doc25, doc30, doc35 and doc40.

Experiment 2: document retrieval in the integrated space(from keywords to the document with metadata)

ID matadata of documentdoc1 trade system, free trade, tariff, regime, import, product, trade1, standard1, success…doc2 aid, north-south, LDCs, forth-world, poor, people1, aids, die1…

~ ~doc15 human rights, war, intervention, communal conflict, prisoner, race, soldier, rights, attack ...doc16 epistemic communities, futurology, ...global governance, future, government, theory, idea…

~ ~doc39 water, natural resources, world-politics, environment , stop, dirty, air, water, protect, earth…doc40 armistice, war, conflict, demilitalization, intelligenge, stop, attack, fight, information …

Examples of metadata: pattern3:: both IR terms and general terms

Page 26: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Experiment 2 : document retrieval in the integrated space

Experimental results・ The case in which the document with metadata of both IR terms and general words were retrieved by the keywords of both IR terms and general word show the best retrieval quality (IX).・ Even if the documents were given only IR-terms-metadata or the documents were given only general-word-metadata (III), they were marked relatively high relevance ratio by the keyword set of both IR terms and general words (VI).・ The case in which the document only with IR-term-metadata were retrieved by the keyword of general word (II) and the case in which the document only with general-word-metadata were retrieved by the keywords of IR (IV), the relevance ratio was not so high but at least 5 documents were ranked in the top 10.

keyword: keyword: keyword:

IR terms General words IR terms + General words

rank ID correlation ID correlation ID correlation1 doc40 0.502226 doc5 0.501295 doc5 0.5443522 doc5 0.492371 doc40 0.392284 doc40 0.5352343 doc15 0.431048 doc15 0.355318 doc15 0.459114 doc20 0.337915 doc20 0.284476 doc20 0.343914

metadata: 5 doc25 0.268766 doc37 0.202732 doc25 0.256491IR terms 6 doc35 0.246094 doc11 0.201073 doc11 0.242029

7 doc11 0.241361 doc9 0.199328 doc35 0.2361158 doc2 0.233853 doc17 0.199207 doc2 0.2353249 doc30 0.231785 doc23 0.195791 doc30 0.233203

10 doc33 0.220733 doc25 0.193684 doc33 0.199303

1 doc40 0.510731 doc10 0.512244 doc40 0.5528742 doc30 0.424205 doc20 0.446176 doc30 0.458413 doc10 0.259057 doc40 0.433106 doc10 0.323584 doc20 0.24301 doc30 0.364944 doc20 0.295688

metadata: 5 doc24 0.222061 doc14 0.284089 doc24 0.218491General words 6 doc35 0.198955 doc5 0.277443 doc35 0.204756

7 doc5 0.188389 doc8 0.26781 doc5 0.1995598 doc26 0.181802 doc25 0.264337 doc25 0.1786929 doc32 0.181352 doc23 0.249578 doc26 0.169355

10 doc11 0.179863 doc15 0.24773 doc6 0.16754

1 doc5 0.472904 doc5 0.493925 doc5 0.524782 doc40 0.468055 doc20 0.394756 doc40 0.4983843 doc15 0.390095 doc40 0.378979 doc15 0.4162984 doc30 0.342379 doc15 0.340018 doc30 0.3652

metadata: 5 doc20 0.311306 doc10 0.322031 doc20 0.341837IR terms 6 doc25 0.264962 doc30 0.299353 doc25 0.258615

+ 7 doc35 0.255268 doc14 0.271628 doc35 0.250516General words 8 doc33 0.222828 doc17 0.254037 doc10 0.23775

9 doc38 0.219296 doc23 0.245184 doc2 0.20946710 doc10 0.21916 doc8 0.236924 doc24 0.207355

Page 27: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Experimental results and analyses

• Experiment 1 From the results, it was clarified that the characterization

including both defining and relating is made more appropriately in the integrated space than in the single IR space.

It was also clarified that not only the IR terms but also the general words which reflect the knowledge of IR field could be retrieved by both general words and IR technical terms.

• Experiment 2 Form the results, it is clarified that the documents which consist

of IR terms can be retrieved by using general words, and the ones which consist of general words can be searched by using IR term in the integrated space.

Page 28: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Conclusion

• We have presented the creation method of semantic retrieval space for document analysis of International Relations and clarified the precision and feasibility of the new created space.

• According to this method, we can realize Semantic Associative Search and analysis of documents dynamically according to our concern or viewpoint and also realize computations of the semantic relations between words and documents as an amount of the correlation.

• This space creation method can be applied to other specific fields if only a lexicon of the field and a general dictionary as Longman exist.

Page 29: * Shiori Sasaki , ** Yasushi Kiyoki , *** Taizo Yakushiji

Future works

• We will develop this integrated semantic space for the mechanism to treat the time-series document data so as to analyze the changes of cognition, attitudes and values of various actors---country, region, organization, government, ethnic group and individual.

• We will apply this semantic space to data-mining of web contents.