These are Cyndy Parr's presentations at the EOL Global Partner Summit, starting with an overview of the meeting, and including an overview of how we set up content partnerships, and how we calculate and use page richness scores.
Cynthia Parr, Species Pages Group
Global Content Summit, 17-19 Jan 2011
http://www.eol.org
• All species known to science
• Freely accessible: open access, open source
• Available from a single portal in a common format
• Quality
• Constantly growing
• Aimed at multiple audiences
EOL Global Partners
China
Australia
Dutch
South Africa
Costa Rica
Mexico
Pan-Arab
India
Colombia
Peru
GBIF
ViBRANT
BHL-Global BHL
Aims of global partners
Global access to knowledge about life on Earth
To increase awareness and understanding of living nature through an Encyclopedia of Life that gathers, generates and shares knowledge in an open, freely accessible and trusted digital resource
Work together towards this vision and mission, sharing expertise and knowledge as appropriate
Expand the global pool of knowledge about biodiversity and improve access to it
Aims of this workshop
• Gather content experts from Global Partners
• Become familiar with each other’s work
• Learn how core EOL works and provide feedback on it
• Form the Species Pages Working Group
    Team at Smithsonian (SPG)
    Representatives from global partners
• Draft individual plans that complement each other towards a common goal
• Remind ourselves WHY we want to do this
What is content?
Biological information
    Names and hierarchies
    Descriptive text
    Literature
    Multimedia
    Maps
    Links to more information
… what about comments, collection annotations?
Overview of agenda
Day 1: Introductions
Day 2: Sharing
Day 3: Planning
Acknowledgements
• Funding from:
    David M. Rubenstein gift
    John D. and Catherine T. MacArthur Foundation
    Alfred P. Sloan Foundation
    Smithsonian Institution
    Marine Biological Laboratory
    Harvard University and other funders and donors
• All our content partners and global partners
• Volunteer curators and individual contributors via Flickr, Wikimedia, and members of EOL
• All of you for coming
• Claire Badgley
Overview of Content Partnering
Databases
Journals
LifeDesks & Scratchpads
Public contributions
EOL is a content curation community
Curate
Comment
Rate, Collect
eol.org
Aggregate
API
Third party apps
Quality control, prioritization
http://eol.org/content_partners
http://eol.org/info/content_partner_collections
Low hanging fruit
Photo credit: Stanislas PERRIN
Partner trajectory
[Figure: number of partners (0 to 150) by quarter, Y1Q3 to Y4Q3]
Long Tail in databases contributing to EOL
[Figure: number of taxa for which content is contributed to EOL (0 to 600,000), with partners in order of # taxa contributed to EOL on the horizontal axis]
… viewed on log scale
[Figure: the same data with the vertical axis on a log scale, 1 to 1,000,000]
Content strategy
Highlights
Priorities
Richness score
Processes
Goals
http://eol.org/info/partners
Content Partner process overview
Partner creates an EOL member account
Adds a content partner
We communicate with them
They (or we) upload a resource file or set a URL where one can be found
They set a harvest frequency
EOL harvests at that frequency
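The last two steps, a partner-chosen harvest frequency and EOL harvesting at that frequency, can be illustrated with a minimal Python sketch. The `Resource` class, its field names, the example URL, and the `harvest_due` function are hypothetical names introduced here for illustration; this is not EOL's actual harvesting code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Resource:
    """A partner resource registered for harvesting (hypothetical model)."""
    url: str                                  # where the resource file can be found
    frequency: timedelta                      # harvest frequency set by the partner
    last_harvest: Optional[datetime] = None   # None until the first harvest


def harvest_due(resource: Resource, now: datetime) -> bool:
    """Return True if the resource was never harvested or its interval has elapsed."""
    if resource.last_harvest is None:
        return True
    return now - resource.last_harvest >= resource.frequency


# Usage: a weekly resource last harvested three days ago is not yet due.
now = datetime(2011, 1, 17)
weekly = Resource("http://example.org/eol_resource.xml", timedelta(days=7),
                  last_harvest=now - timedelta(days=3))
print(harvest_due(weekly, now))
```

Keeping the frequency on the resource itself mirrors the slide's point that the partner, not EOL, controls how often a harvest happens.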
Current methods of data transfer
EOL resource document (XML) (usually they do the work)
Spreadsheet upload (either can do the work)
Connector (we do the work)
    Scrape web site or PDF
    Use web services
    Work from a copy of DB
Darwin Core Archive (classifications, soon)
See http://eol.org/info/cp_resource_checklist
How EOL gets content (n=141 partners)
[Figure: bar chart of number of partners (0 to 70) by transfer method: XML resource doc, Connector, LD/eLD/Scratchpad, Spreadsheet. Legend: CSV, web service, HTML, DB, LD/eLD/Scratchpad]
Example partner
• Pensoft has a process to generate EOL-compliant XML for new species
• Also sends images to Morphbank, specimens to GBIF
• They registered the URL at EOL
• Our script checks for changes once a day
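The slide does not say how the daily script decides that a registered resource has changed. One common approach, shown here purely as an assumption, is to compare a hash of the downloaded file with the hash stored at the previous check. The function names below are hypothetical.

```python
import hashlib
from typing import Optional


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of a resource file's bytes."""
    return hashlib.sha256(content).hexdigest()


def has_changed(content: bytes, previous_fingerprint: Optional[str]) -> bool:
    """True if no earlier fingerprint exists or the content now hashes differently."""
    return previous_fingerprint is None or fingerprint(content) != previous_fingerprint


# Usage: store the fingerprint after each check and compare on the next one.
yesterday = fingerprint(b"<taxon>Aus bus</taxon>")
print(has_changed(b"<taxon>Aus bus</taxon>", yesterday))   # same bytes as before
print(has_changed(b"<taxon>Aus cus</taxon>", yesterday))   # different bytes
```

A hash comparison only detects that the file differs; it says nothing about which taxa or objects changed, so a real harvest would still need to parse the resource.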
EOL Schema Sources
Content type | Standards used
Taxa | Darwin Core Archive
Attribution & licensing | Dublin & Darwin Core
Text objects & links | Species Profile Model (and now +)
Multimedia | Dublin (+ Audubon Core)
EOL Table of Contents | TDWG Species Profile Model
Physical Description › Morphology | #Morphology
Physical Description › Size | #Size
Ecology › Habitat | #Habitat
Ecology › Associations | #Associations
Life History & Behavior › Life Expectancy | #LifeExpectancy
Evolution and Systematics › Functional Adaptations | #Evolution
Conservation › Conservation Status | #ConservationStatus
Molecular Biology and Genetics › Genetics | #Genetics
Molecular Biology and Genetics › Genome | #MolecularBiology
Molecular Biology and Genetics › Molecular Biology | #MolecularBiology
Nucleotide Sequences | #MolecularBiology
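The pairing of EOL Table of Contents entries with Species Profile Model terms can be held as a simple lookup table. This Python sketch copies the pairs from the slide; the dictionary name and the `spm_term` helper are hypothetical, and only the `#` fragments shown on the slide are used rather than full term URIs.

```python
from typing import Optional

# EOL Table of Contents entry -> TDWG Species Profile Model term fragment,
# copied from the slide. Several entries share a single SPM term.
TOC_TO_SPM = {
    "Physical Description › Morphology": "#Morphology",
    "Physical Description › Size": "#Size",
    "Ecology › Habitat": "#Habitat",
    "Ecology › Associations": "#Associations",
    "Life History & Behavior › Life Expectancy": "#LifeExpectancy",
    "Evolution and Systematics › Functional Adaptations": "#Evolution",
    "Conservation › Conservation Status": "#ConservationStatus",
    "Molecular Biology and Genetics › Genetics": "#Genetics",
    "Molecular Biology and Genetics › Genome": "#MolecularBiology",
    "Molecular Biology and Genetics › Molecular Biology": "#MolecularBiology",
    "Nucleotide Sequences": "#MolecularBiology",
}


def spm_term(toc_entry: str) -> Optional[str]:
    """Return the SPM term fragment for a Table of Contents entry, or None if unmapped."""
    return TOC_TO_SPM.get(toc_entry)


print(spm_term("Ecology › Habitat"))
```

Because the mapping is many-to-one, it cannot be inverted without losing information: three Table of Contents entries map to `#MolecularBiology`.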
Example biological content
EOL v2
Plinian Core
DwC description
SPM infoitem
using Darwin Core Archive flat files as transport mechanism
EOL v3?
Relations
Numeric values
Controlled vocabulary
Partners
Can delete or replace any of their objects
Control how often we harvest, and can force a harvest
Get an automatically updating collection
Can request that we use their classification for browsing
Can change the logo and description of their project
Receive comments and curator actions immediately
Receive monthly reminders that they can get traffic statistics
Get many links back to their original web resources
Partners cannot
Publish the very first time
Decide if they are pre-vetted
Roll back a harvest
Change the objects of any other partners
Change classifications from any other partners
Richness scores
http://eol.org/pages/704102
Taxon page richness algorithm
Richness = a (Breadth) + b (Depth) + c (Diversity)
Breadth (a = 60%): Images, topics of text objects, references, maps, videos, sounds, conservation status
Depth (b = 30%): # words per text object, # words total
Diversity (c = 10%): Sources (partners)
Score range 0 – 100, Threshold 40
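The weighted sum above can be sketched in Python. The weights (60, 30, 10) and the threshold (40) come from the slide. Everything else is an assumption made for illustration: the slide does not say how Breadth, Depth, and Diversity are derived from the underlying counts, so this sketch takes each component as an already normalized value between 0 and 1, and it treats a page as "rich" when its score is at or above the threshold. The function names are hypothetical.

```python
# Percentage weights from the slide; they sum to 100, giving a 0-100 score.
WEIGHT_BREADTH = 60
WEIGHT_DEPTH = 30
WEIGHT_DIVERSITY = 10
RICH_THRESHOLD = 40  # threshold from the slide


def richness(breadth: float, depth: float, diversity: float) -> float:
    """Combine three component scores (assumed normalized to [0, 1]) into 0-100."""
    for name, value in (("breadth", breadth), ("depth", depth),
                        ("diversity", diversity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (WEIGHT_BREADTH * breadth
            + WEIGHT_DEPTH * depth
            + WEIGHT_DIVERSITY * diversity)


def is_rich(score: float) -> bool:
    """Assumption: a page counts as rich at or above the threshold."""
    return score >= RICH_THRESHOLD


# Usage: a page with broad content, little text, and a single source.
score = richness(breadth=0.5, depth=0.25, diversity=0.0)
print(score, is_rich(score))
```

With these weights a page can pass the threshold on breadth alone (60 × 0.67 ≈ 40) but not on depth or diversity alone, since those cap at 30 and 10 respectively.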
Summary of EOL page richness
Overall
    950,000 have content
    2 % are rich
    ~22 % have only links to literature
Hot List
    30 % of 75K are rich
    Average richness = ~30
Red Hot List
    56 % of 3K are rich
    Average richness = 43
How richness is used
Choose images for home page “March of Life”
Allows sorting in collections
    Weird life example
Helps provide best search and API results
Any other ideas? Could we be matchmakers for pages needing enrichment and users?
http://synthesis.eol.org/media/treemap
Strategies for improving richness
Crowd-sourcing
    Collections
    Communities
    Mobile apps
Leveraging
    Enabling platforms
    Enabling journals
    Data mining BHL etc.
The page richness index
Helps fill gaps with existing knowledge
Helps prioritize funding and training so that it has maximum impact on closing true gaps
Will be available via API
Computing and storing richness index on EOL is a step towards storing and serving computable data