SpeciesBank Dreams and Realities Rainer Froese IfM-GEOMAR March 2005

SpeciesBank Dreams and Realities

Rainer Froese, IfM-GEOMAR

rfroese@ifm-geomar.de, March 2005

The SpeciesBank Dream

“... a computer interface to the Internet able to find, combine and present data in a way that would be meaningful and useful to the person who issued a query about a species.”

Reality Check: Background

Two-thirds of all major software projects fail (IHT 25.1.05)

SpeciesBanks, what are they not?

• Regional or global checklists
• Purely distributed systems
• Google: mix of good and garbage
• Three year projects
• Amateur products
• Specialist products
• Committee products
• Community products

Reality Check: Users

• Taxonomists?
• ‘Decision makers’?
• Stakeholders?
• Nobody?
• Depends on usefulness:
• Mostly interested public & students
• Few specialists

FishBase Web Users

[Figure: pie chart of FishBase Web users by category: Individuals, Private sector, Universities, Governments, NGOs, Museums, Int. Research Centres. Based on 2122 entries in the FishBase Guestbook, June 2004.]

FishBase Users by Continent

[Figure: pie chart of FishBase users by continent: North America, Europe, Asia, South America, Africa, Australia, Oceania. Based on 2122 entries in the FishBase Guestbook, June 2004.]

Reality Check: User Needs

• Politically very important
• Boring at best (motherhood statements)
• Typically misleading
  – Most users don't know what they need
• Scientific approach: analyze actual usage of what is available

FishBase Usage Compared with Internet Usage by Country, in 2001

[Figure: log-log scatter plot, 2001. X axis: Internet Users (% of all Users); Y axis: FishBase Sessions (% of all Sessions); both axes from 0.0001 to 100. Labelled countries: USA, Qatar, Fr Polynesia, Brunei, Japan, China, South Korea, Russia, Taiwan, India, Lebanon, Venezuela.]

[Figure: log-log scatter plot, July 2004. X axis: Internet usage by country (%); Y axis: FishBase usage by country (%); both axes from 0.0001 to 100. Labelled countries: USA, China, Greece, Luxembourg, Saudi Arabia, Peru, Pakistan, Ghana, Belarus, Cuba, South Korea, Russia, Brazil, Australia, Japan, Germany, PNG, India, Kenya.]

FishBase Web Usage by Topic May-July 2003

[Figure: pie chart of FishBase Web usage by topic: Common Names, Scientific Names, Direct Links, Species Summaries, Photos, Fish Quiz, Specialist Topics. Based on hits by directory on CGNET server.]

[Figure: pie chart of FishBase usage by topic: Common names, Scientific names, Direct links, Species summaries, Photos, Fish Quiz, Specialist topics. Based on AW hits by directory on Kiel server.]
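The topic shares in these charts are described as based on hits by directory. A minimal sketch of such a tally is given here, assuming the server writes access logs in Common Log Format. The directory names, sample log lines, and the function names `top_directory`, `hits_by_directory`, and `percent_shares` are hypothetical and serve only as illustration; they are not taken from the FishBase servers.

```python
from collections import Counter


def top_directory(path):
    """Return the first path segment, or '/' for files in the root directory."""
    parts = [p for p in path.split("?")[0].split("/") if p]
    # A single segment is a file in the root, not a directory.
    return parts[0] if len(parts) > 1 else "/"


def hits_by_directory(log_lines):
    """Count hits per top-level directory from Common Log Format lines."""
    counts = Counter()
    for line in log_lines:
        try:
            # The request is the first quoted field, e.g. "GET /dir/page HTTP/1.0".
            request = line.split('"')[1]
            path = request.split()[1]
        except IndexError:
            continue  # skip malformed lines
        counts[top_directory(path)] += 1
    return counts


def percent_shares(counts):
    """Convert raw hit counts into percent of all counted hits."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical sample lines; directory names are invented for illustration.
sample_log = [
    '10.0.0.1 - - [01/Jul/2004:00:00:01 +0000] "GET /ComNames/search.cfm HTTP/1.0" 200 512',
    '10.0.0.2 - - [01/Jul/2004:00:00:02 +0000] "GET /ComNames/list.cfm?lang=en HTTP/1.0" 200 734',
    '10.0.0.3 - - [01/Jul/2004:00:00:03 +0000] "GET /Photos/thumb.cfm HTTP/1.0" 200 2048',
    '10.0.0.4 - - [01/Jul/2004:00:00:04 +0000] "GET /index.htm HTTP/1.0" 200 100',
    "malformed line without a request field",
]

if __name__ == "__main__":
    shares = percent_shares(hits_by_directory(sample_log))
    for name, share in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{name}: {share:.1f}%")
```

Hits count every requested file, so a tally like this tends to overstate image-heavy directories compared with page views or sessions.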

FishBase Usage by Topic July 2004

Frequency of FishBase Usage by Topic

[Figure: bar chart. Axis: Percent of total page views (July 2004), scale 0 to 60. Topics as listed on the chart: Genetics, Diseases, FishQuiz, Reproduction, Maps, FB book, Physiology, Population dynamics, Trophic ecology, Identification, LarvalBase, Scientific Names, Museum collections, Eschmeyer, Glossary, References, Country information, Photos, Common Names, Species Summaries.]

About 10,000 visitors per month

What Determines Usage

• Quality and accuracy?
• Recognition of scientists behind database?
• MoUs?
• Beautiful interface, fancy tools?
• Content: common names, photos, summaries
• Simplicity of interface (e.g. Google)
• Number of clicks needed; response time

FishBase Usage over Time

[Figure: chart of FishBase usage over time. X axis: Years (August), A 98 to A 05. Y axes: Hits (millions), 0 to 16, and User sessions (thousands), 0 to 900.]

What Determines Usefulness?

• Actual use
• What is not used is useless
• How about yourself (the custodian)?

[Figure: log-log plot. X axis: Species per Class (n), 1 to 100000; Y axis: Strategies per Class (n), 1 to 100. Labelled Classes: Myx, Ceph, Holo, Elasmo, Sarco, Actino.]

Number of strategies used by phylogenetic Classes plotted over number of recent species in the Class, with linear regression line forced through the origin; slope = 0.37; r² = 0.9754.
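The caption reports a regression line forced through the origin. A minimal sketch of that kind of fit follows. It assumes the fit was done on log10-transformed counts, which the logarithmic axes suggest but the slide does not state. The function names and the input values are hypothetical; the data are synthetic and are not the values plotted on the slide.

```python
import math


def slope_through_origin(xs, ys):
    """Least-squares slope b for the no-intercept model y = b * x."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero; slope is undefined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def loglog_slope(species_counts, strategy_counts):
    """Fit log10(strategies) = b * log10(species) and return b.

    Assumption: the regression is done in log10 space, so the 'origin'
    corresponds to one species and one strategy.
    """
    log_x = [math.log10(n) for n in species_counts]
    log_y = [math.log10(n) for n in strategy_counts]
    return slope_through_origin(log_x, log_y)


# Synthetic illustration only: generated from an exact power law,
# NOT the Class data shown on the slide.
species = [10, 1000, 100000]
strategies = [n ** 0.37 for n in species]

if __name__ == "__main__":
    print(round(loglog_slope(species, strategies), 2))
```

Under this reading, a slope b in log space corresponds to a power law in which strategies grow as species raised to the power b.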

[Figure: log-log plot. X axis: Species per Class (n), 1 to 100000; Y axis: Orders per Class (n), 1 to 100. Labelled Classes: Actino, Elasmo, Sarco.]

Orders per Class plotted over Species per Class. Sarcopterygii, Elasmobranchii and Actinopterygii fall nearly on a hypothetical straight line through the origin; slope = 0.37.

[Figure: log-log plot. X axis: Species per Class (n), 1 to 1000000; Y axis: Orders per Class (n), 1 to 100. Legend: Animalia, Plantae, Fungi, Protozoa, Max Orders. Labelled Classes: Actinopterygii, Insecta.]

Orders per Class plotted over species per Class for four Kingdoms and 415,000 species; the dotted line indicates the maximum number of Orders per species in a Class; slope = 0.37.

What is the Best Quality Assurance?

• Scientific degree of encoders?
• Double-encoding?
• Hierarchy of checking?
• Usage by custodians!
• Usage by others!

Speed of Data Flow

What determines speed of data flow?
• Bandwidth?
• Trust!

How to Prioritize Data Entry

What approach is best when prioritizing data entry?

• User need analysis?
• Importance and quality of data?
• Opportunism! Enter what is ready for entry.

Enemies

Who are your most dangerous enemies?
• Critics?
• Jealous colleagues?
• Unconvinced donors?
• Institutions!

Data Encoders

Who are the best data encoders?
• Students?
• Long-time staff?
• Experts?
• Women!

Members of the FishBase Team in 1998

Back to Dreaming

Building the AllFish Species Portal
1. Form Consortium of respective SpeciesBank Custodians and Institutions
2. Agree on Concept, Standards and Protocols
3. Use FishBase Interface and Servers
4. Have small AllFish Encoder and Programmer team
5. Find modest funding from different donors
6. Have AllFish up-and-running within one year

Don’t Dream It

Be It
