Physics Chemistry Engineering Biology Disease Medicine Brain Health Social Computer Patents Visualizing PLoS Data DHUG 2014 February 11, 2014 Kevin Boyack, Bob Kasenchak, Margie Hlava

PLOS Visualization Project

DESCRIPTION

Presented at the 10th annual Data Harmony Users Group meeting on Tuesday, February 11, 2014 by Kevin Boyack of SciTech Strategies. Shows how a comprehensive map of the scientific literature was used to visualize the PLOS thesaurus. The resulting visualization becomes a new visual template that can be used to 1) examine the thesaurus structure, content, and level of detail; and 2) show coverage and trends for various entities such as journals, institutions, and even individual authors.

Page 1: PLOS Visualization Project

Visualizing PLoS Data

DHUG 2014, February 11, 2014

Kevin Boyack, Bob Kasenchak, Margie Hlava

Page 2: PLOS Visualization Project

AGENDA

Background
Data
Science mapping
Mapping thesaurus directly
Mapping thesaurus indirectly
» SciTech S&T map
» Thesaurus map
Overlays

Page 3: PLOS Visualization Project

BACKGROUND

To answer questions posed by PLoS using PLoS index (thesaurus) terms, a map of the thesaurus space is desirable
» Where is the coverage thin, where is it dense?
» What areas are core and which are emerging trends? Which fields are growing quickly?
» Which fields are interrelated?
» Which fields have low activity but are related to very active fields?

Some questions require more than PLoS data
» How does the coverage in PLoS journals relate to coverage in other databases?

A map of the thesaurus space can function as a template upon which various distributions can be overlaid

Some questions are more amenable to tables than maps

Page 4: PLOS Visualization Project

DATA

Thesaurus
» 10,551 unique terms in 15,164 locations
» 11 top level terms
» 296 second level terms in 305 locations

PLoS article data parsed / indexed by Access Innovations
» Article level data through mid-2013; 84,989 records
» 72,796 records with index (thesaurus) terms; 171,181 terms

Scopus data
» SciTech Strategies licenses use of Scopus data from Elsevier
» 1996-2011; 25M records

Page 5: PLOS Visualization Project

THESAURUS STRUCTURE

[Figure: thesaurus structure. Branch labels: Biology, CS, Earth, Env, Eng, Medicine, People, Physical, Methods, SS]

Page 6: PLOS Visualization Project

SCIENCE MAPPING

A map is a visual representation of the relationships within a
» Classification system (e.g., journal categories, taxonomy)
» Document corpus

Science maps can be (and have been) created using
» Documents, Journals, Authors, Terms, Taxonomies

30-40 year tradition of science mapping
» Well-established methodologies
» Current computing power and data availability enable large scale mapping and analysis

Maps used for communication, strategy, planning, evaluation …
» Understanding of structure, relative location
» Basis for metrics

Page 7: PLOS Visualization Project

GENERAL PROCESS

Select data source and objects (e.g., docs) of interest
» Scopus / WoS
» PubMed
» Patents
» Your own data

Calculate similarity between objects (doc-doc)
» Citations (links)
» Words (title/abstr)

Create a visual layout (map) of the objects / clusters
» Pajek / Gephi

Page 8: PLOS Visualization Project

MAPPING PLOS TERMS DIRECTLY

Mapped second level terms
» All terms rolled up to the second level term at the top of the branch

Cosine similarity between terms based on co-occurrence in articles

Map created using Fruchterman-Reingold mapping routine in Pajek
» Based on only the top 3 links per term – strongest linkages

Each node (term) colored using top level terms
» Biology and life sciences (light green)
» Medicine and health sciences (salmon)
» Physical sciences (purple)
» Research and analysis methods (yellow)
» Social sciences (light orange)
» …
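
The co-occurrence cosine and the "top 3 links per term" filter on this slide can be sketched in a few lines of Python. This is an illustration only, not the Pajek workflow used for the project: the function name `cosine_links`, the input format (one collection of already rolled-up second level terms per article), and the toy data are all assumptions.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_links(articles, top_n=3):
    """Return the top_n strongest cosine links for every term.

    articles: iterable of term collections, one per article (hypothetical
    input format; terms are assumed to be rolled up to second level already).
    """
    term_counts = Counter()   # number of articles containing each term
    pair_counts = Counter()   # number of articles containing both terms
    for terms in articles:
        unique = sorted(set(terms))
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    links = {term: [] for term in term_counts}
    for (a, b), n_ab in pair_counts.items():
        # Cosine between two binary term-by-article vectors.
        sim = n_ab / sqrt(term_counts[a] * term_counts[b])
        links[a].append((b, sim))
        links[b].append((a, sim))

    # Keep only the strongest links per term (ties broken alphabetically).
    return {
        term: sorted(pairs, key=lambda p: (-p[1], p[0]))[:top_n]
        for term, pairs in links.items()
    }


# Toy data, invented for illustration.
toy_articles = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B"}]
toy_links = cosine_links(toy_articles, top_n=3)
```

The filtered link list is what a layout routine such as Fruchterman-Reingold would take as its edge list. Keeping only a few links per term is what lets the strongest relationships shape the layout.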

Page 9: PLOS Visualization Project

MAPPING PLOS TERMS

Page 10: PLOS Visualization Project

MAPPING PLOS TERMS DIRECTLY

Resulting map is difficult to use
» Using second level terms leaves out needed detail
» Term counts span several orders of magnitude – hard to map well
» Terms within the same branch of the thesaurus don’t cluster well

Resulting map doesn’t account well for context (areas where PLoS has low coverage)

Need a map that does a better job of providing context, and that better matches our current picture of science

So … we decided to index Scopus data using the PLoS rule-base and map based on that more inclusive context

Page 11: PLOS Visualization Project

MAPPING PLOS TERMS INDIRECTLY

Use existing map of all of science

Triangulate term positions

Page 12: PLOS Visualization Project

SCITECH S&T MAP

20 million articles (1996-2011)
3 million patents (1996-2011)

220,000 document clusters
Cluster contents using citation
Cluster positions using text

The visual is the final step (last 10%); the full process is needed for robust analysis

Template on which other info can be overlaid; documents (and related info) tied to map positions

We are the only ones in the world who do contextual analysis at this scale

Page 13: PLOS Visualization Project

TRIANGULATION

Use full S&T map as basis

Based on Scopus 2007-2011

Locate each thesaurus term as the average position of its documents

The resulting map contains 11,934 terms; full coverage; interpretable

Terms with multiple colors appear multiple times

The term map becomes the basemap
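
The triangulation step (placing each term at the average position of its documents) can be sketched as below. This is a minimal illustration under stated assumptions: each document is assumed to carry an (x, y) coordinate from the S&T map, and the names `triangulate`, `doc_positions`, and `doc_terms`, along with the toy data, are hypothetical rather than taken from the project.

```python
from collections import defaultdict


def triangulate(doc_positions, doc_terms):
    """Place each term at the mean (x, y) of the documents indexed with it.

    doc_positions: {doc_id: (x, y)} map coordinates (assumed input format).
    doc_terms:     {doc_id: iterable of thesaurus terms} (assumed input format).
    Returns {term: (mean_x, mean_y, n_documents)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for doc_id, terms in doc_terms.items():
        if doc_id not in doc_positions:
            continue  # documents without a map position cannot contribute
        x, y = doc_positions[doc_id]
        for term in set(terms):
            acc = sums[term]
            acc[0] += x
            acc[1] += y
            acc[2] += 1
    return {t: (sx / n, sy / n, n) for t, (sx, sy, n) in sums.items()}


# Toy data, invented for illustration; "d4" has no map position.
toy_positions = {"d1": (0.0, 0.0), "d2": (2.0, 4.0), "d3": (4.0, 2.0)}
toy_terms = {
    "d1": ["Cognition"],
    "d2": ["Cognition", "Language"],
    "d3": ["Language"],
    "d4": ["Language"],
}
term_map = triangulate(toy_positions, toy_terms)
```

The document count returned with each position is the quantity a basemap could use for circle size.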

Page 14: PLOS Visualization Project

THESAURUS BASEMAP

The term map becomes the basemap

Circle size – # documents
Circle color – top level term

Many small circles – detailed thesaurus and rule base that differentiates

Few large circles – broad terms and rule base that doesn’t differentiate

Location/size of terms facilitates questions
- Is term well located in the thesaurus?
- Is the rule base acting as desired?
- Should term be broken up?

Page 15: PLOS Visualization Project

NOT JUST PRETTY PICTURES

All visualizations are based on tabular data
Pictures help form hypotheses
Detailed analysis provides answers
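
As an illustration of the tabular data behind an overlay, the sketch below counts documents per thesaurus term for a chosen journal and year. The record layout (`journal`, `year`, `terms` keys), the function name `overlay_table`, and the toy records are assumptions made for this example, not the project's actual data model.

```python
from collections import Counter


def overlay_table(records, journal=None, year=None):
    """Return (term, n_documents, share_of_selected_documents) rows.

    records: iterable of dicts with 'journal', 'year' and 'terms' keys
    (hypothetical layout). A filter left as None is not applied.
    """
    counts = Counter()
    n_docs = 0
    for rec in records:
        if journal is not None and rec["journal"] != journal:
            continue
        if year is not None and rec["year"] != year:
            continue
        n_docs += 1
        counts.update(set(rec["terms"]))
    rows = [(term, n, n / n_docs) for term, n in counts.items()]
    # Largest counts first; ties broken alphabetically.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Toy records, invented for illustration.
toy_records = [
    {"journal": "PLoS One", "year": 2012, "terms": ["Cognition", "Language"]},
    {"journal": "PLoS One", "year": 2012, "terms": ["Cognition"]},
    {"journal": "PLoS Biology", "year": 2012, "terms": ["Cognition"]},
    {"journal": "PLoS One", "year": 2011, "terms": ["Language"]},
]
plos_one_2012 = overlay_table(toy_records, journal="PLoS One", year=2012)
```

Each row can be joined to a term's basemap position to draw the overlay, or read directly as a table when a question is better answered that way.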

Page 16: PLOS Visualization Project

ANALYSES

[Figure: thesaurus basemap overlay]

» Biology and life sciences
» Computer and information sciences
» Earth sciences
» Ecology and environmental sciences
» Engineering and technology
» Medical and health sciences
» Physical sciences
» Research and analysis methods
» People and places
» Social sciences
» Science policy

» Circuit models
» Chromatin immunoprecipitation
» Evolutionary modeling
» Electron transport chain
» Intelligence
» Language
» Cognition
» Decision making

Pages 17-22: PLOS Visualization Project

ANALYSES

[Figures: thesaurus basemap overlays; each slide repeats the page 16 list of 11 top level terms]

Page 23: PLOS Visualization Project

ANALYSES

[Figure: thesaurus basemap overlay]

» Biology and life sciences
» Computer and information sciences
» Earth sciences
» Ecology and environmental sciences
» Engineering and technology
» Medical and health sciences
» Physical sciences
» People and places
» Social sciences
» Science policy

Page 24: PLOS Visualization Project

ANALYSES

[Figure: thesaurus basemap overlay]

» Biology and life sciences
» Computer and information sciences
» Earth sciences
» Ecology and environmental sciences
» Engineering and technology
» Medical and health sciences
» Physical sciences
» Research and analysis methods
» People and places

Page 25: PLOS Visualization Project

ANALYSES

[Figure: thesaurus basemap overlay]

» PLoS – All journals and years
» PLoS – All journals (2008)
» PLoS – All journals (2009)
» PLoS – All journals (2010)
» PLoS – All journals (2011)
» PLoS – All journals (2012)
» PLoS – All journals (2013)

» PLoS Biology
» PLoS Computational Biology
» PLoS Genetics
» PLoS Medicine
» PLoS NTD
» PLoS Pathogens
» PLoS One

Pages 26-38: PLOS Visualization Project

ANALYSES

[Figures: thesaurus basemap overlays; each slide repeats the page 25 lists of PLoS years and journals]

Page 39: PLOS Visualization Project

ANALYSES

[Figure: thesaurus basemap overlay]

» PLoS One – All years
» PLoS One (2008)
» PLoS One (2009)
» PLoS One (2010)
» PLoS One (2011)
» PLoS One (2012)
» PLoS One (2013)

Pages 40-44: PLOS Visualization Project

ANALYSES

[Figures: thesaurus basemap overlays; each slide repeats the page 39 list of PLoS One years]

Page 45: PLOS Visualization Project

ANALYSES

[Figure: thesaurus basemap overlay]

» PLoS One (2007-2011)
» PLoS One (2008)
» PLoS One (2009)
» PLoS One (2010)
» PLoS One (2011)
» PLoS One (2012)
» PLoS One (2013)

Page 46: PLOS Visualization Project

ANALYSES

[Figure: thesaurus basemap overlay]

2007-2011

» PLoS One
» PNAS
» JACS
» J Mat Chem
» Phys Rev Lett
» J Nanosci Nanotechn
» LNCS

Pages 47-52: PLOS Visualization Project

ANALYSES

[Figures: thesaurus basemap overlays; each slide repeats the page 46 list of journals for 2007-2011]

Page 53: PLOS Visualization Project

SUMMARY

Mapping the PLoS thesaurus directly (using co-occurrence) gave maps that were only partially useful.

Indirect mapping of the PLoS thesaurus using a comprehensive map of the scientific literature (based on Scopus data) was very successful.

The resulting visualization of the PLoS thesaurus became a stand-alone basemap, a template that can be used to:
» Examine the thesaurus structure, content, and level of detail.
» Show coverage and trends for various entities such as journals, institutions, authors, etc.

This thesaurus mapping effort helped visualize which areas of the PLoS thesaurus were covered well, or not well enough, as well as which fields were emerging or well-established.

All overlays are based on tabular data, so detailed analysis is possible.

Page 54: PLOS Visualization Project

QUESTIONS

Thank you for your attention!