On-Demand Information Extraction
Satoshi Sekine, New York University
Summer/Fall 07
Introduction (http://nlp.cs.nyu.edu/sekine)
Research topics (overview)
• Pre-ODIE: IE and summarization, multi/single-document summarization, IR + clustering + summarization, cross-lingual QA, encyclopedia QA, paraphrase discovery, IE pattern discovery, Japanese spelling, 2nd-grade language test
• Toward on-demand IE: QA, IR, summarization, relation discovery, extended NE, attributes for ENE, English analyzer (OAK), Web people search, query logs and classes, information needs
What is Information Extraction?
• Automatically extract information for a specific scenario from unstructured text and put it into table format
• Example: management succession (newspaper)

  Date       Person       Company                  Position
  7/1/2003   John Smith   Smith Trade Corporation  CEO
  7/2/2003   Bill Brown   Bank of Manhattan        President
Problems in conventional IE
• Preparation of knowledge for a given scenario
  – Patterns, lexicons and semantic knowledge
    • e.g. "Company announced Person's promotion to Position"
  – Writing rules by hand or creating training data
• Restrictions
  – Preparation time (one month in MUC)
  – Specific/limited scenarios and domains
Our goal
• Turn one month into one minute (= automatic)
• IE on the fly
• How?
  – Use unsupervised learning methods
    • Pattern discovery, paraphrase discovery
  – Prepare as much knowledge and as many tools as possible, for as many scenarios as possible
    • Extended NE, semantic knowledge, relations
• Demo http://blueberry.cs.nyu.edu:8080/odie
ODIE – Flow Chart
1. Description of task (the user's topic description)
2. IR system retrieves relevant documents
3. Pattern discovery (using the language analyzer and ENE tagger) produces pattern sets
4. Table construction (using paraphrase discovery and co-reference) produces the final table
Automatic Discovery of IE Patterns (Sudo et al. HLT01; Sudo et al. ACL03)
• The bottleneck of IE is knowledge creation
• IE pattern example: "Company announced Person's promotion to Position"
• Key idea
  – Retrieve documents on the given topic
  – Collect relatively frequent patterns in the retrieved documents
• Research focus and results
  – Pattern format (predicate-argument, chain, sub-tree)
  – Computation time (tree mining algorithms)
  – Near-human performance was achieved
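The key idea above can be sketched in a few lines. This is a toy stand-in, assuming simple word n-grams instead of the dependency sub-trees used in the actual papers, and a hypothetical relative-frequency score (count in the retrieved documents discounted by background count):

```python
from collections import Counter

def discover_patterns(relevant_docs, background_docs,
                      min_len=2, max_len=4, top_k=5):
    """Rank word n-grams (a stand-in for sub-tree patterns) that are
    relatively frequent in topic-relevant documents."""
    def ngram_counts(docs):
        counts = Counter()
        for doc in docs:
            toks = doc.lower().split()
            for n in range(min_len, max_len + 1):
                for i in range(len(toks) - n + 1):
                    counts[tuple(toks[i:i + n])] += 1
        return counts

    rel = ngram_counts(relevant_docs)
    bg = ngram_counts(background_docs)
    # Illustrative score: relevant-set count discounted by background count.
    scored = {p: c / (1 + bg[p]) for p, c in rel.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]
```

On a topic like management succession, patterns such as ("company", "announced", "person") rise to the top because they recur across relevant articles but rarely in the background.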
IE Pattern Discovery - Result
• Scenario: succession
• Train: 1 yr. of newspaper
• Test: 148 documents (87 relevant, 61 irrelevant)
• Slots: Person: 135, Org: 172, Post: 215
Automatic Acquisition of Paraphrases (Sekine Gengo01; Shinyama et al. HLT02; Shinyama et al. IWP03)
• Purpose: find relationships between IE patterns
• Key idea
  – The same events are usually reported in different newspapers on the same day
  – The NE and numeric/date expressions in those reports are identical
  – Use them as anchors to acquire paraphrases
• We observed encouraging results, but found that it is not such an easy task
Paraphrase Acquisition - Procedure
1. NE tagging and co-reference on articles from newspaper A and newspaper B
2. Find comparable articles (article A, article B)
3. Identify anchors in each article
4. Find comparable sentences
5. Extract paraphrases (phrase A, phrase B)
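The steps above can be sketched as a toy function, assuming the NE anchors have already been identified as strings (the real system uses an NE tagger and co-reference); the function name and the two-anchor threshold are illustrative:

```python
def find_paraphrase_pairs(article_a, article_b, anchors):
    """Pair sentences from two same-day articles that share at least two
    named-entity/numeric anchors; the differing wording around the shared
    anchors yields paraphrase candidates."""
    def anchor_set(sentence):
        return frozenset(a for a in anchors if a in sentence)

    pairs = []
    for sent_a in article_a:
        for sent_b in article_b:
            shared = anchor_set(sent_a) & anchor_set(sent_b)
            if len(shared) >= 2 and sent_a != sent_b:
                pairs.append((sent_a, sent_b, sorted(shared)))
    return pairs
```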
Paraphrase Acquisition - Evaluation
• Two techniques: co-reference and the Argument Structure Database (ASD)
• One year of two Japanese newspapers
• 195 article pairs on arrests of murder suspects
• Results, with and without co-reference/ASD (obtained / precision (correct) / recall):
  – Sentence level (93): 55 / 75% (41) / 44%; 75 / 69% (52) / 56%
  – Phrase level (>100): 106 / 24% (25); 37 / 62% (23); 32 / 56% (18)
Relation Discovery (Hasegawa et al. ACL04)
• Motivation
  – Discover particular relations between NEs
  – e.g. country & president, merged companies
• Unsupervised method
  – The relationships are not known in advance
  – Find as many relationships as possible
• Idea: context-based clustering
  – Pairs of entities with similar contexts should be in the same relation
  – Similar contexts can also characterize the relations
Relation Discovery - Procedure
• Tag ENEs in the text and select two ENE types
• Collect the expressions between co-occurring NE pairs (within 5 words) as a context vector
• Cluster NE pairs based on their context vectors
• Label each cluster with its frequent context words
[Figure: contexts of 5 words or less between two named entities, e.g. "Company A is offering to buy Company B", "... is negotiating to acquire ...", "...'s interest in ...", "...'s planned purchase of ...", accumulated into a context vector.]
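The procedure can be sketched as follows, assuming the context vectors have already been accumulated into plain word-count dicts; the greedy single-link clustering and the 0.5 threshold are illustrative simplifications of the paper's method:

```python
import math

def cluster_entity_pairs(contexts, threshold=0.5):
    """contexts: {(e1, e2): {word: count}} accumulated from the <=5-word
    windows between the two NEs.  Greedy single-link clustering of NE
    pairs by cosine similarity of their context vectors."""
    def cosine(u, v):
        dot = sum(c * v.get(w, 0) for w, c in u.items())
        norm_u = math.sqrt(sum(c * c for c in u.values()))
        norm_v = math.sqrt(sum(c * c for c in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    clusters = []
    for pair, vec in contexts.items():
        for cluster in clusters:
            # join the first cluster containing a sufficiently similar pair
            if any(cosine(vec, contexts[p]) >= threshold for p in cluster):
                cluster.append(pair)
                break
        else:
            clusters.append([pair])
    return clusters
```

Pairs whose intervening words overlap (e.g. "agreed to buy" vs. "agreed to acquire") land in the same cluster, which can then be labeled with its frequent context words.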
Relation Discovery - Result

Major relation   Ratio   Common words (relative frequency)
President        17/23   President (1.0), president (0.415), ...
Senator          19/21   Sen. (1.0), Republican (0.214), ...
Prime Minister   15/16   Minister (1.0), minister (0.875), Prime (0.875), ...
Governor         15/16   Gov. (1.0), governor (0.458), Governor (0.3), ...
Secretary        6/7     Secretary (1.0), secretary (0.143), ...
Republican       5/6     Rep. (1.0), Republican (0.667), ...
Coach            5/5     coach (1.0), ...
M&A              10/11   buy (1.0), bid (0.382), ...
M&A              9/9     acquire (1.0), acquisition (0.583), buy (0.583), ...
Parent           7/7     parent (1.0), unit (0.476), own (0.143), ...
Relation Discovery - Evaluation and Error Analysis
• Evaluation result
  – Labeling accuracy: 10/10 = 100% (cluster level), 108/121 = 89% (NE-pair level)
  – (cf. recall: 140/(177+65) = 58% (all clusters))
• False alarms (mis-clustered NE pairs):
  – Common words from another cluster appeared in the 5-word context by accident (noise words)
  – e.g. "... Chechnya war may exhaust President Boris Yeltsin ..."
• Misses (undetected NE pairs):
  – Absence of common words in the 5-word contexts
  – Outer context might be helpful
  – e.g. "... Boris Yeltsin on end the fighting in Chechnya ..."
Another Paraphrase Discovery via Relation Discovery (Hasegawa et al. Gengo-05)
• Expressions found in the same relation cluster should contain paraphrases
• Two filters
  – Used by more than one pair of instances
  – Expressions contain a frequent term
• Example: A bought B / A has agreed to buy B / A, which is buying B / A's proposed acquisition of B / A's acquisition of B / A's agreement to buy B / A's purchase of B / A bid for B / A's takeover of B / A merger with B / A succeeded in buying B / B, which was acquired by A / B would become a subsidiary of A / B agreed to be bought by A
Yet Another Paraphrase Discovery Using Context and Keywords (Sekine IWP-05)
• Goal: eliminate the frequency threshold needed by the vector model in Hasegawa's method
• Algorithm
  – Find keywords in the context of each NE pair using TF-IDF
  – Cluster phrases based on shared NE instances (buy - acquire / buy - agree / buy - purchase)
Extended Named Entity (Sekine et al. LREC02; Sekine et al. LREC04)
• Named entities with 200 categories
• The definition (150 pages) has been released
• Automatic tagger
  – 130K dictionary entries
  – 2,000 rules
  – 70% accuracy
  – Improving...
Attributes for ENE (ver. 7.0.1), example for PERSON (20 attributes in total)

Attribute         Example of value                                                    Freq. (%)   ENE of value
Vocation          professional baseball player, economist, poet                      46 (100)    Vocation
Nationality       American, Chinese, Japanese                                        29 (63)     Country
Career            a professor at Yale University, The Princess of Wales              26 (57)     Vocation
Masterpiece       Guernica, Mona Lisa                                                25 (54)     Product, Facility
Graduate          M.A. in German at Cambridge, MK High School                        20 (44)     School
Hometown          Paris, Manchester, Shanghai                                        19 (41)     City
Native province   State of Illinois, Sichuan                                         18 (39)     Province
Previous stay     England, New York                                                  12 (26)     Location
Mentor            Andrea del Verrocchio, Michelangelo di Lodovico Buonarroti Simoni  10 (22)     Person
Death date        04,23,1704, 04/23/1704, unknown                                    10 (22)     Date
Era               Edo period, the 11th century                                       8 (17)      Era
Award             Academy Award, MVP, Nobel Prize                                    8 (17)      Award
Real name         Saint Nicholas                                                     8 (17)      Person
Another name      Santa, Father Christmas                                            8 (17)      Person
Title             Knight, an honorary degree at Yale                                 6 (13)      Title
Competition       World Series, 1955 piano competition in Paris                      6 (13)      Game
Place of birth    New York, Birmingham                                               5 (11)      Location
Father            John B. Kelly, Sr.                                                 5 (11)      Person
Cause of death    car accident, guillotine                                           5 (11)
On-demand Information Extraction - ODIE -
(Mid-size NSF project at NYU, 2003-08)
• Demo: http://blueberry.cs.nyu.edu:8080/odie
• Description of the task (topic): acquire, acquisition, merge, merger, buy, purchase
• Build a table in 1 minute
Docid           Company                                       Money          Date
nyt951113.0483  Chemicals Inc.,                               $400 million   Last year
nyt951002.0099  Tenneco Inc., Mobil Corp.                     $1.27 billion
nyt951010.0389  CoreStates Financial Corp., Meridian Bancorp  $3.1 billion
nyt950420.0056  Comdata Holdings Corp., Ceridian Corp.        $900 million   Last week
• nyt950831.0485 The move comes amid a flurry of acquisitions in the computer financial processing industry. Last week, check processor Ceridian Corp. agreed to buy Comdata Holdings Corp. for $900 million.
• nyt951113.0483 Last year, Terra Industries Inc., another large fertilizer maker, bought Agricultural Minerals & Chemicals Inc. for $400 million. ``Industry growth is primarily overseas,'' said Harvey Stober, an analyst with Donaldson, Lufkin & Jenrette in New York.
• nyt951002.0099 Tenneco Inc. said it agreed to buy Mobil Corp.'s plastics division for $1.27 billion as part of its strategy to expand into less cyclical, higher-growth businesses.
• nyt951010.0389 CoreStates Financial Corp. agreed to buy Meridian Bancorp for $3.1 billion in stock, creating a bank with a commanding role in Philadelphia and the industrial cities of eastern Pennsylvania.
ODIE - Evaluation Result
• Out of 20 topics, the created tables were judged very useful for 2 topics, useful for 12, and not useful for 6
• Correctness of the table fillers:

  Evaluation          Number of rows
  Correct             84
  Partially correct   4
  Incorrect           12
Research topics
Preemptive Information Extraction (Shinyama, Sekine NAACL06)
• We can even eliminate the query...
1. Find pairs (or triples) of entities that have similar relations in articles.
   e.g. the relation between "Katrina" and "New Orleans" in article A is similar to the relation between "Longwang" and "Taiwan" in article B.
2. Cluster relations based on their similarity.
3. A cluster of similar relations = a table.
Knowledge Discovery from Query Logs (work at Microsoft Research, Summer 06)
• Query logs contain a lot of knowledge
• Discover knowledge about ENEs...
• Query log (6 months, 1.5B queries), e.g.: bathroom+showers+pictures, miss+mundo, owen+mulligan+tyrone, news+press, free+wedding+planner, marc+arther+glen, preadolescente+follando, gokmenler+agricultural+machinery+co., erotika+com, academy+award+winner
• ENE dictionary (245K entries), e.g. AWARD: 100+greatest+britons, 100+worst+britons, aaass/orbis+books+prize+for+polish+studies, abel+prize, academy+award, academy+awards, acm+turing+award, agatha+award, agatha+awards, air+medal
Problems to be solved
• Problem 1: the observed contexts are very general; we want to see the important contexts for each category
• Problem 2: there is noise in the ENE dictionary; we want to get rid of it
• Problem 3: the list may not have enough coverage; we want to populate the ENE dictionary
• Problem 4: individual entities are verbose; we want to cluster them
Key Idea
• Find important contexts for each category
  – New formula (not TF-IDF, MI, log-likelihood ratio, ...):
      Score(context) = f_type(context) * log( g(context) / C )
      g(context) = f_type(context) / F_inst(context)
      C = f_type(context_top1000) / F_inst(context_top1000)
  – Results
    • ACADEMIC: #+syllabus, degrees+in+#, phd+in+#, masters+in+#, diploma+in+#, #+lecture+notes, masters+degree+in+#, courses+in+#
    • AWARD: #+winners, #+nominees, #+nominations, #+winner, #+award, who+won+#, winners+of+#, list+of+#+winners, winners+of+the+#
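The scoring formula above translates directly into code. A sketch, assuming two pre-computed tables (the names are illustrative): f_type, how often entities of the category occur with each context, and F_inst, the total occurrences of each context in the log:

```python
import math

def score_contexts(type_freq, inst_freq, top_n=1000):
    """Score(c) = f_type(c) * log( g(c) / C ),
    g(c) = f_type(c) / F_inst(c),
    C = the same ratio computed over the top-N contexts pooled together."""
    # normalizing constant C from the most frequent contexts
    top = sorted(type_freq, key=type_freq.get, reverse=True)[:top_n]
    C = sum(type_freq[c] for c in top) / sum(inst_freq[c] for c in top)
    return {c: type_freq[c] * math.log((type_freq[c] / inst_freq[c]) / C)
            for c in type_freq}
```

A category-specific context like "#+winners" gets a positive score, while a generic context like "free+#" (high instance count, low type ratio) is pushed below zero.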
Results
• Delete noise (Problem 2)
  – Entities that do not co-occur with important contexts are noise
  – BOOK: it, porno, space, night, we, working, candy, wheels, foundation, jazz, ghost, couples, the bible, giant, daddy, creation ...
• Find new entities (Problem 3)
  – Words that co-occur with important contexts are entities
  – AWARD: golden+globes, grammys, golden+globe, kentucky+derby, daytime+emmy, sag, sag+awards, american+idol, daytime+emmys
Web People Search - a SemEval 2007 task
• Input: first 100 results for a person-name search
• Output: clustering according to the actual people
• John Smith 1 (Captain)
  – Captain John Smith - www.apva.org
  – John Smith wikipedia - en.wikipedia...
  – ...
• John Smith 2 (Labour leader)
  – BBC: Labour leader John Smith - news...
  – John Smith wikipedia - en.wikipedia...
• John Smith 3 (IBM researcher)
• John Smith 4 (Film director)
• John Smith 5 (Shoe company)
• John Smith 6 (Writer)
• ...
WePS
• SemEval - formerly called "Senseval", running since 1998
  – 27 proposals / 18 final tasks
  – >100 teams, >125 systems, 110 papers
• WePS is the largest single task
  – 16 participants (38 people in ML)
  – http://nlp.uned.es/weps
• Data

  Training:                          Test:
  source     Av. entity  Av. doc.    source     Av. entity  Av. doc.
  Wikipedia  23.14       99.00       Wikipedia  56.50       99.30
  ECDL06     15.30       99.20       ACL06      31.00       98.40
  WEB03*     5.90        47.20       Census     50.30       99.10
  Total av.  10.76       71.20       Total av.  45.93       98.93
WePS (Result)

team                F (α=0.5)  purity  inv_purity  Presentation/Poster
CU_COMSEM           0.79       0.72    0.88        Tomorrow - PS2 - 11:15-12:30
COMBINED_BASELINE   0.78       0.64    1.00
IRST-BP             0.77       0.75    0.80        Tomorrow - PS2 - 11:15-12:30
PSNUS               0.77       0.73    0.82        Next presentation - 17:00-17:15
UVA                 0.69       0.81    0.60        Tomorrow - PS3 - 14:30-15:45
SHEF                0.66       0.60    0.82
FICO                0.67       0.53    0.90        Tomorrow - PS2 - 11:15-12:30
UNN                 0.66       0.60    0.73        Tomorrow - PS3 - 14:30-15:45
SCATTERED_BASELINE  0.64       1.00    0.47
AUG                 0.64       0.50    0.88        Tomorrow - PS2 - 11:15-12:30
SWAT-IV             0.62       0.55    0.71
UA-ZSA              0.61       0.58    0.64        Tomorrow - PS3 - 14:30-15:45
TITPI               0.60       0.45    0.89        Tomorrow - PS3 - 14:30-15:45
JHU1-13             0.58       0.45    0.82        Tomorrow - PS2 - 11:15-12:30
DFKI2               0.53       0.39    0.83        Tomorrow - PS2 - 11:15-12:30
WIT                 0.52       0.36    0.93        Tomorrow - PS3 - 14:30-15:45
UC3M_13             0.51       0.35    0.95        Tomorrow - PS3 - 14:30-15:45
UBC-AS              0.45       0.30    0.91
JOINED_BASELINE     0.45       0.29    1.00
WePS
• Systems
  – Basic strategies are similar: HTML analysis => pre-processing (POS, NE, ...) => feature selection (tokens, NEs, ...) => calculate similarities => make clusters => produce output
  – A summary is available
• Future
  – Second evaluation (fall 08 or spring 09)
  – Attribute extraction (DOB, vocation, ...)
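The shared recipe (features => similarities => clusters) can be reduced to a few lines. A toy version, assuming token-set Jaccard similarity, greedy single-link clustering, and an arbitrary 0.3 threshold in place of the richer features and clustering methods of the actual systems:

```python
def cluster_pages(pages, threshold=0.3):
    """Cluster search-result pages for one person name: token features,
    Jaccard similarity, greedy single-link clustering."""
    sets = [set(p.lower().split()) for p in pages]

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    clusters = []
    for i, s in enumerate(sets):
        for cluster in clusters:
            if any(jaccard(s, sets[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Pages about the same John Smith share topical vocabulary ("jamestown", "colony") and cluster together, while pages about a different John Smith fall into their own cluster.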
Human Relationship Extraction (Goto, NHK)
• Extract human relationships from movie/TV summaries
• Display them as a network graph
• Components: NE for the domain, attributes for NEs, clustering of the relationships
English Analyzer (OAK system)

Modules: Sentence Splitter, Tokenizer, POS tagger, Chunker, NE tagger, Dependency Analyzer, Parser, Function Tagger, Regularizer

Pipeline (input: text; successive outputs with their formats):
• Sentence (plain)
• Tokenized sentence (plain)
• POS-tagged sentence (plain, PTB/tag)
• Chunked sentence, low-level constituents (PTB/tag, PTB/tree, Collins)
• NE-tagged sentence: org, loc, person, time, ... (CONLL, MUC)
• Dependencies between chunks and constituents (PTB/tag, PTB/tree, CONLL)
• Parse tree (PTB/tree)
• Parse tree with function labels (PTB/tree)
• Regularized structure, predicate-argument (Triplet, Glarf, OAK)
Research topics
Extra slides
IR, Clustering and Multi-document Summarization (Nobata et al. Gengo03; Nobata et al. DUC03; Murakami et al. Gengo04)
• Large number of retrieved documents
• Automatic clustering by sub-topic
  – 200 documents -> 5 clusters
• Summarize each cluster (multi-document summarization)
• Demo: http://apple.cs.nyu.edu/eliot
Cross-Lingual Question Answering (Sekine DARPA03; Sekine TALIP04)
• Developed in a month during DARPA's Surprise Language experiment
• Type a question in English -> translate the question into Hindi -> find the answer in Hindi newspapers -> translate the answer into English
• Demo: http://apple.cs.nyu.edu/clqa-eh
CLQA Architecture (components and data flow)
• Question examiner: takes the question in English; outputs keywords in English and the type of the expected answer
• Bilingual dictionary: translates the keywords into Hindi
• CLIR: retrieves relevant documents from the Hindi newspaper corpus
• NE tagger and numeric tagger: produce the NE-tagged corpus and numeric knowledge
• Answer finder: returns the answer and its context in Hindi
• Translation (Hindi -> English): returns the answer and context in English
• User interface
Huge Text Data & Knowledge Acquisition (Yoshihira et al. Gengo04; Ando et al. LREC04; Sekine Gengo04)
• Collecting a huge corpus (untagged)
  – Web pages: Japanese 350GB, English 300GB
  – Newspaper: 38 years, ~10GB
  – Encyclopedia
• Tool
  – Suffix array over the huge data
• Applications
  – Semantic knowledge acquisition by lexico-syntactic patterns
  – Automatic acquisition of NE instances
  – IE pattern and paraphrase acquisition
  – Collecting spelling variations
  – KWIC system (http://kwic.jp)
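A suffix array is the data structure behind KWIC lookup over a huge corpus: sort all suffix start positions once, and every query becomes a binary search. A minimal sketch; the naive sort shown here is fine for a demo, while gigabyte-scale data needs a linear-time construction:

```python
def build_suffix_array(text):
    """All suffix start positions, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def kwic(text, sa, query, width=10):
    """Every occurrence of `query` with `width` characters of context,
    found by binary search over the suffix array."""
    def bound(upper):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + len(query)]
            if prefix < query or (upper and prefix == query):
                lo = mid + 1
            else:
                hi = mid
        return lo
    lo, hi = bound(False), bound(True)
    return [text[max(0, i - width): i + len(query) + width]
            for i in sorted(sa[lo:hi])]
```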
Survey on Multi-document Summarization (Sekine et al. DUC03)
• Purpose
  – Multi-document summarization & IE
• Analyzed the summaries/categories of 100 document sets
• Categories of document sets
  – 13 categories: (multi|single)-(person|org|loc|...), other
  – Consistent tagging by human annotators
  – Should be useful for choosing a summarization strategy
• Table-format summaries
  – 40-45%: a table is natural; 36-38%: a table is OK
  – Encouraging results for on-demand IE
Spelling Variation in Japanese (Masuyama et al. Gengo04)
• A serious problem in real-world systems
  – 30% of questions to the encyclopedia QA system
  – 40% of zero hits on dictionary look-up
• A system to find the closest entry
  – 190,000 entries
  – Edit distance after narrowing down candidates
• If the target is in the dictionary, it is found in the top 3 100% of the time
• Demo: http://languagecraft.jp
• Planning to construct a variation dictionary from a Web corpus
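The closest-entry lookup can be sketched with the standard Levenshtein DP; the first-character filter below stands in for the system's real candidate-narrowing step and is only an assumption:

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def closest_entries(query, dictionary, k=3):
    """Narrow candidates (here: by first character, an illustrative
    stand-in), then rank by edit distance and return the top k."""
    cands = [w for w in dictionary if w[:1] == query[:1]] or dictionary
    return sorted(cands, key=lambda w: edit_distance(query, w))[:k]
```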
Encyclopedia QA (Sekine Gengo03; Sekine et al. Gengo05)
• Users can ask questions in natural language and get answers from an encyclopedia
• The system runs as a live demo (http://www.japanknowledge.com)
• Answers are entry words or numerals in the explanations
• Answers are prepared in advance by IE, and a pattern-based question matcher pinpoints the answer
2nd Grade Language Test
• Objectives
  – Package NLP technologies in a form that can be easily inspected by ordinary people
  – Observe the problems of NLP technologies by lowering the level of the target material
  – http://languagecraft.net/dennou
• Results

  Section   # of Q  Known  Correct  Cor. in known (%)  Cor. in total (%)
  Kanji     76      74     69       93.2               90.8
  Word K.   245     140    117      83.6               47.8
  Reading   34      22     10       45.5               29.4
  Total     355     235    196      83.4               55.2
Research topics
My Research Interests
• "Things" are central
  – People, organizations, locations, event names, vocations, product names, ...
  – How they are related
  – How they act, are used, or are evaluated
• This is "knowledge"
  – How to create "knowledge"?
    • Extract it from huge corpora
    • Fuse it with existing knowledge (e.g. encyclopedias)
[Diagram, built up over several slides: an example knowledge graph linking people, organizations, sports teams, cities, and countries through relations such as "belongs to", "a part of", "located in", "located nearby", "largest city", "has a lunch", "work in / commute in", "favorite team", "awarded", "now visiting", "friendship countries", "saw it play", "pay money", "born in", "nationality", "a member of", "live nearby", "same name", "cheer!", "boo!".]
Knowledge Creation
• Not only by hand
  – Cyc, EDR
• Learn from corpora
  – Frequency, TF-IDF: pattern discovery, NE discovery
  – Distributional similarity: NE discovery, attributes for NEs, paraphrase discovery
  – Lexico-syntactic patterns: hypernym-hyponym discovery, IE patterns
  – Alignment: paraphrase discovery
Topics
• Names
  – Create categories (e.g. product names, event names)
  – Find entities (e.g. instances of laptop names)
• Relations
  – Find relations (e.g. pros/cons of products, sentiment, opinions)
• Categorizing relations
  – Paraphrase, textual entailment (e.g. the same kind of opinion)
• Knowledge
  – Link and structure the extracted knowledge
  – Human interface