Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction based on Innovation-Adoption Priors

Amparo Elizabeth Cano Basave (1), Francesco Osborne (2), Angelo Salatino (2)
(1) Aston University, United Kingdom; (2) KMi, The Open University, United Kingdom

EKAW 2016



Page 1: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Amparo Elizabeth Cano Basave (1), Francesco Osborne (2), Angelo Salatino (2)

(1) Aston University, United Kingdom; (2) KMi, The Open University, United Kingdom

EKAW 2016

Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction based on Innovation-Adoption Priors

Page 2: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors


Osborne, F., Motta, E. and Mulholland, P. Exploring scholarly data with Rexplore. International Semantic Web Conference 2013.

technologies.kmi.open.ac.uk/rexplore/

Page 3: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

The Computer Science Ontology (1)

Standard research area taxonomies, classifications, and ontologies such as ACM are not apt for the task:

• Not fine-grained enough.
  – E.g., only 2 topics are classified under Semantic Web.

• Static and manually defined, hence prone to become obsolete very quickly.

[Figure: ACM 2012]

Page 4: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

The Computer Science Ontology (2)

The Computer Science Ontology (CSO) was automatically created and updated by applying the Klink-2 algorithm.

Osborne, F. and Motta, E.: Klink-2: integrating multiple web sources to generate semantic topic networks. In ISWC 2015. (2015)

Page 5: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

The Computer Science Ontology (3)

• We automatically generated a version of CSO consisting of about 15,000 topics linked by about 70,000 semantic relationships.

• It includes very granular, low-level research areas and can be regularly updated by running Klink-2 on a new set of publications.

• We also have different versions of CSO, obtained by running Klink-2 on the set of documents up to a certain year.

CSO 2012, CSO 2013, CSO 2014, CSO 2015

[…]

Page 6: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

A shared conceptualization

“Ontologies are a formal, explicit specification of a shared conceptualization” (Studer et al., 1998)

“The conceptualization should express a shared view between several parties, a consensus rather than an individual view” (Guarino et al., 2009)

“Ontologies are us: inseparable from the context of the community in which they are created and used.” (Mika, 2005)

“Ontology Evolution is the timely adaptation of an ontology to the arisen changes and the consistent propagation of these changes to dependent artefacts.” (Stojanovic, 2004)

Page 7: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

But what if we cannot wait for shared consensus?

These ontologies reflect the past, and can only contain concepts that are already popular enough to be selected by experts or automatic methods.

Hence, they hardly support tasks that require the ability to describe emerging concepts, e.g.:

• Exploring the forefront of research;

• Trend detection;

• Horizon scanning;

• Producing smart analytics to inform business decisions.

Page 8: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Ontology Forecasting

Given an ontology at time t, a team of experts and/or a software system considers a number of relevant knowledge sources and updates the ontology by also including new concepts on which there will (probably) be a shared consensus at time t+1.

For example, a forecasted ontology of research topics in 2000 may already include a new topic associated with the dynamics preluding the “Semantic Web” (new collaborations between Knowledge Base Systems, AI and WWW).

[…]

t-n t-1 t t+1

Page 9: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Contributions: a first step towards ontology forecasting

1. We approach the novel task of ontology forecasting by predicting semantic concepts in the research domain.

2. We introduce metrics to analyse the linguistic and semantic progressiveness in scholarly data.

3. We propose Semantic Innovation Forecast (SIF), a novel weakly-supervised approach for the forecasting of emerging semantic concepts.

4. We evaluate our approach on a dataset of over 1 million documents in the Computer Science domain.

  – The proposed framework offers competitive boosts in mean average precision at ten for forecasts over 5 years.

Page 10: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Scopus (Computer Science): number of publications

[Figure: number of articles (y-axis, 0 to 250,000) per year (x-axis, 1995 to 2007).]

Page 11: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Scopus (Computer Science): vocabulary size

[Figure: vocabulary size (y-axis, 0 to 160,000) per year (x-axis, 1995 to 2007).]

Page 12: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Klink-2 Computer Science Ontology: number of classes

Page 13: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Linguistic Progressiveness

Language innovation in a corpus refers to the introduction of novel patterns of language.

We generate one language model per year using Katz back-off smoothing and analyse the differences between consecutive years using the perplexity metric.

[Figure: perplexity (y-axis, 0 to 1.4E+11) per year (x-axis, 1995 to 2007).]
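As a rough illustration of the year-over-year comparison, the sketch below trains a bigram model on one year's text and measures its perplexity on the following year's text. It is a simplified stand-in, not the authors' setup: add-one (Laplace) smoothing replaces Katz back-off to keep the example short, and the toy corpora and function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count bigrams and their left contexts for one year's token stream."""
    contexts = Counter(tokens[:-1])             # how often each word starts a bigram
    bigrams = Counter(zip(tokens, tokens[1:]))
    return contexts, bigrams

def perplexity(model, tokens, vocab_size):
    """Perplexity of `tokens` under an add-one smoothed bigram model.

    NOTE: add-one smoothing is an assumption made for brevity; the slides
    use Katz back-off smoothing instead.
    """
    contexts, bigrams = model
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = 0.0
    for prev, word in pairs:
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))

# Hypothetical toy "years": the second one introduces unseen word sequences.
year_t = "semantic web ontology semantic web services".split()
year_t1 = "semantic web linked data ontology".split()
vocab = set(year_t) | set(year_t1)

model_t = train_bigram(year_t)
pp_same = perplexity(model_t, year_t, len(vocab))   # model vs. its own year
pp_next = perplexity(model_t, year_t1, len(vocab))  # model vs. the next year
```

A higher perplexity on the following year indicates that the language has moved away from the patterns seen in the training year.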

Page 14: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Linguistic Progressiveness

We also perform a progressive analysis based on lexical innovation and lexical adoption.

A large number of new words appear each year, but only a few of them are adopted (i.e., still used in the following year).

[Figure: number of words (y-axis, 0 to 70,000) per year (x-axis, 1997 to 2007); series: # of new words per year, # of adopted words per year.]

Page 15: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Measure Linguistic Progressiveness

We introduce the linguistic progressiveness metric:

$LP_t = \frac{LA_t}{LI_t}$

that is, the ratio of adopted words (LA) to new words (LI) at year t.

[Figure: linguistic progressiveness (y-axis, 0 to 0.35) per year (x-axis, 1997 to 2007).]
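The metric can be sketched over per-year vocabularies as follows. The set definitions are assumptions made for illustration: a word counts as new at year t if it appears in no earlier year of the data, and as adopted at year t if it was new at t-1 and is still used at t. The toy vocabularies and function names are hypothetical.

```python
def lexical_innovation(vocab_by_year, t):
    """LI_t: words used at year t that appear in no earlier year (assumed definition)."""
    earlier = set().union(*(v for y, v in vocab_by_year.items() if y < t))
    return vocab_by_year[t] - earlier

def lexical_adoption(vocab_by_year, t):
    """LA_t: words that were new at t-1 and are still used at t (assumed definition)."""
    return lexical_innovation(vocab_by_year, t - 1) & vocab_by_year[t]

def linguistic_progressiveness(vocab_by_year, t):
    """LP_t = |LA_t| / |LI_t|, or 0.0 when no new words appear at t."""
    li = lexical_innovation(vocab_by_year, t)
    la = lexical_adoption(vocab_by_year, t)
    return len(la) / len(li) if li else 0.0

# Hypothetical toy vocabularies, one set of words per year.
vocab_by_year = {
    2000: {"ontology", "web"},
    2001: {"ontology", "web", "semantic", "grid"},
    2002: {"ontology", "web", "semantic", "folksonomy", "mashup"},
}

li_2002 = lexical_innovation(vocab_by_year, 2002)   # {"folksonomy", "mashup"}
la_2002 = lexical_adoption(vocab_by_year, 2002)     # {"semantic"}
lp_2002 = linguistic_progressiveness(vocab_by_year, 2002)
```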

Page 16: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Innovation-Adoption Priors

We assume that emerging topics will be associated with novel words; thus we compute priors at time t by considering innovative (LI) and adopted (LA) words.

A word prior is a probability distribution that expresses how relevant a word is to, in this case, being characteristic of innovative topics.

We build the prior matrix by assigning a weight to each term in this vocabulary:

  – 0.7 if $w \in LI_{t-2}$ and 0.9 if $w \in LA_{t-1}$, because our analysis shows that recently adopted words (LA) are more often associated with emerging topics than new words (LI).
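A minimal sketch of the weighting rule, assuming the lexicons are available as sets of words. Two details are not given on the slide and are assumptions here: the `default` weight for words in neither lexicon, and letting the 0.9 weight take precedence when a word is in both. The function name is hypothetical.

```python
def build_word_priors(vocabulary, li_t_minus_2, la_t_minus_1, default=0.5):
    """Assign a prior weight to every word in the vocabulary.

    0.9 for recently adopted words, 0.7 for recently introduced words.
    `default` is a hypothetical value for all remaining words.
    """
    priors = {}
    for word in vocabulary:
        if word in la_t_minus_1:        # adopted words get the higher weight
            priors[word] = 0.9
        elif word in li_t_minus_2:      # new but not (yet) adopted
            priors[word] = 0.7
        else:
            priors[word] = default
    return priors

# Hypothetical toy lexicons.
priors = build_word_priors(
    vocabulary={"web", "semantic", "mashup"},
    li_t_minus_2={"semantic", "mashup"},
    la_t_minus_1={"semantic"},
)
```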

Page 17: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Semantic Innovation Forecast (SIF) model

SIF is a generative probabilistic topic model that takes as input a set of documents at year t and a set of historical priors, and forecasts topic-word distributions representing new concepts in the ontology $O_{t+1}$.

Page 18: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Semantic Innovation Forecast (SIF) model

We use Collapsed Gibbs Sampling to infer the model parameters and topic assignments for a corpus at year t+1 given observed documents at year t.
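To make the inference step more concrete, here is a sketch of collapsed Gibbs sampling for plain LDA with a per-word prior. It is not the SIF model itself, whose generative process is not spelled out on the slides; using the word weights directly as Dirichlet parameters, along with `alpha`, `iters` and all names, is an assumption made for illustration.

```python
import random

def gibbs_lda(docs, vocab, num_topics, word_prior, alpha=0.1, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with an asymmetric word prior.

    docs: list of token lists (all tokens must be in `vocab`).
    word_prior: dict word -> positive weight, used as the Dirichlet
    parameter beta_w (an assumption, not the SIF formulation).
    Returns one word distribution (dict) per topic.
    """
    rng = random.Random(seed)
    beta_sum = sum(word_prior[w] for w in vocab)
    n_dk = [[0] * num_topics for _ in docs]                       # doc-topic counts
    n_kw = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]   # topic-word counts
    n_k = [0] * num_topics                                        # tokens per topic
    z = []                                                        # topic of each token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional distribution over topics for this token.
                weights = [
                    (n_dk[d][j] + alpha)
                    * (n_kw[j][w] + word_prior[w]) / (n_k[j] + beta_sum)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Posterior mean of each topic-word distribution.
    return [
        {w: (n_kw[k][w] + word_prior[w]) / (n_k[k] + beta_sum) for w in vocab}
        for k in range(num_topics)
    ]
```

Words with a larger prior weight receive more probability mass in every topic before any data is seen, which is one simple way of biasing topics towards innovative vocabulary.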

Page 19: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Evaluation

We perform this task by applying our framework on the Scopus dataset for Computer Science (>1M publications).

Each collection of documents in a year is randomly partitioned into three subsets: 20% is used to derive innovation priors, 40% is the training set, and 40% is the testing set.
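The 20/40/40 partition can be sketched as below. The function name, the fixed seed and the use of document identifiers are hypothetical; the slides do not say how the random split was implemented.

```python
import random

def partition_year(doc_ids, seed=0):
    """Randomly split one year's documents into priors (20%), train (40%), test (40%)."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)    # seeded so the split is reproducible
    n_priors = len(ids) * 2 // 10
    n_train = len(ids) * 4 // 10
    priors_set = ids[:n_priors]
    train_set = ids[n_priors:n_priors + n_train]
    test_set = ids[n_priors + n_train:]  # remainder, about 40%
    return priors_set, train_set, test_set

priors_set, train_set, test_set = partition_year(range(100))
```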

We train a SIF model on year t using innovation priors computed for the two previous years (t-1 and t-2), and we use the SIF model to forecast semantic concepts at year t+1.

We then compute the cosine similarity between the predicted semantic concepts for t+1 and the gold standard (GS) concepts for that year. We consider a concept correctly forecast if its similarity with a GS concept is higher than 0.5.
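The matching criterion can be sketched as follows, assuming that both predicted and gold standard concepts are represented as sparse word-weight vectors (dictionaries). That representation and the function names are assumptions; only the cosine measure and the 0.5 threshold come from the slide.

```python
import math

def cosine(p, q):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(weight * q.get(word, 0.0) for word, weight in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def is_correct_forecast(predicted, gold_concepts, threshold=0.5):
    """True if the predicted concept is similar enough to any GS concept."""
    return any(cosine(predicted, gold) > threshold for gold in gold_concepts)

# Hypothetical toy vectors.
predicted = {"semantic": 0.6, "web": 0.4}
gold = [{"semantic": 1.0, "web": 1.0}, {"grid": 1.0}]
matched = is_correct_forecast(predicted, gold)
```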

Page 20: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Evaluation: Baselines

We compare SIF against four baselines. For a year t, forecasting for year t+1:

1. LDA Topics (LDA): computes topics on the full training set. This setting makes no assumption over innovative/adopted lexicons.

2. LDA Innovative Topics (LDA-I): computes topics based on documents containing at least one word appearing in $LI_t$.

3. LDA Adopted Topics (LDA-A): computes topics based only on documents containing at least one word appearing in $LA_t$.

4. LDA Innovation/Adoption Topics (LDA-IA): computes topics based only on documents containing at least one word appearing in $LI_t$ or $LA_t$.
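The four baselines differ only in which training documents are passed to LDA. A sketch of that selection step, assuming documents are token lists and the lexicons are sets (function names are hypothetical; the LDA training itself is omitted):

```python
def docs_with_any(docs, lexicon):
    """Keep documents that contain at least one word from `lexicon` (a set)."""
    return [doc for doc in docs if lexicon.intersection(doc)]

def baseline_corpora(docs, li_t, la_t):
    """Training corpus for each baseline at year t."""
    return {
        "LDA": list(docs),                            # full training set
        "LDA-I": docs_with_any(docs, li_t),           # innovative words only
        "LDA-A": docs_with_any(docs, la_t),           # adopted words only
        "LDA-IA": docs_with_any(docs, li_t | la_t),   # either lexicon
    }

# Hypothetical toy corpus and lexicons.
docs = [["semantic", "web"], ["grid", "computing"], ["database", "index"]]
corpora = baseline_corpora(docs, li_t={"grid"}, la_t={"semantic"})
```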

Page 21: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Evaluation: Mean Average Precision@10

Year  SIF   LDA   LDA-A  LDA-I  LDA-IA
2000  0.70  0.12  0.48   0      0.41
2002  0.87  0     0.82   0.64   0.75
2004  0.91  0     0.58   0.57   0.63
2006  0.87  0.31  0.78   0.84   0.69
2008  0.99  0.40  0.68   0.57   0.70
AVG   0.87  0.17  0.67   0.52   0.64
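For reference, one common way to compute mean average precision at k over ranked forecasts is sketched below. The slides do not state which variant was used or what a single ranked list corresponds to, so the normalisation (by the number of correct items in the top k) and the function names are assumptions.

```python
def average_precision_at_k(relevance, k=10):
    """AP@k for a ranked list of binary flags (1 = correctly forecast concept)."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            total += hits / rank     # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision_at_k(runs, k=10):
    """Mean of AP@k over several ranked lists."""
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs)

# Hypothetical rankings: correct at ranks 1 and 3; nothing correct.
ap = average_precision_at_k([1, 0, 1])
map_score = mean_average_precision_at_k([[1, 0, 1], [0, 0]])
```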

Page 22: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Conclusion

It is possible to reliably forecast emerging semantic concepts if the ontology is associated with a large collection of documents.

The next challenge is to forecast a new version of an ontology, that is, to produce an ontology that includes all concepts and relationships that will (probably) be included in the new version.

Page 23: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Future work

• Integration of explicit and latent semantics;

• Including graph-structure information in the model;

• Understanding how research topics are created and forecasting topic trends.

Salatino, A.A., Osborne, F., Motta, E. (2016) How are topics born? Understanding the research dynamics preceding the emergence of new areas. PeerJ Preprints

Page 24: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

Francesco Osborne, Angelo Salatino, Amparo Cano Basave

Cano-Basave, A. E., Osborne, F., Salatino, A.A. (2016) Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction based on Innovation-Adoption Priors. EKAW 2016, Bologna, Italy

Email: [email protected]: FraOsborne
Site: people.kmi.open.ac.uk/francesco

Page 25: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors