Da molin databases_ecn_2012


Biodiversity Data vs. the Web 2.0

OR

How I learned to stop worrying and love the “systems”

Ana Dal Molin, J. B. Woolley

Texas A&M University

Source: Opte.org, Jan 2005

[ Why this talk ]

• Data providers
• Aggregators
• Tools
• etc.

“growth in bioinformatics data exceeded Moore’s Law, the well-known observation that the number of transistors on a chip doubles every 18 months.” (Butte, 2001, TRENDS in Biotechnology 19(5))

• Johnson, N. 2007. Annual Review of Entomology
• http://www.ala.org.au/about-the-atlas/downloadable-tools/tools-review/
• iDigBio

47*

[ what do I use? ]

• Museums often have already decided on a model/database system

• Individual researchers, on the other hand, may not have decided yet, which raises questions:
– Content management systems (CMS)?
– Which output?
– Stability?
– Best practices?

‘systems’ available
• First Generation: desktop-based (MS Access, FileMaker)
• Second Generation: desktop-based with web output
• Third Generation: content management systems (PHP, Ruby, MySQL, etc.)

Data Accessibility

Your data on the ‘net

• Reach
• Model

GBIF species distribution data coverage (2010)

[ ? ]

Diagram labels: Metadata; Data; Metadata repository; Name Index; Occurrence Index; Yellow Pages; Regional Atlas; Annotation Tools; Biosecurity Portal; Analysis Tools; Products

LaSalle, 2008. Atlas of Living Australia, ICE2008 presentation

[ where do I stand? ]

• Taxonomy as a two-natured science
• Shifts in media format

Web 1.0 -> Web 3.0
1.0: Static HTML, e-mail, forums, chat
2.0: Dynamic HTML, wikis, blogging, commenting, social networking
3.0: …

*You and your work are not invisible before publication*

• Web 3.0:
– “Social”
– Tags
– Cloud computing
– Ubiquitous connectivity
– Open technologies, open data formats (and open identity too)
– Publishing in languages specifically designed for data (databases, markup)
– Semantic web
– Marketing

http://www.tdwg.org

• What the user wants
• What you have to deal with

*

*not done!

Think it through

Books: Gutenberg, Project Gutenberg, WorldCat, HathiTrust

The way we collect information is different
The way we accumulate information is different
The way we understand information is different

… or not

Jan/2012: 33% USA, 20% Brazil, 26% Europe (Germany, Sweden, Spain, Greece, UK)

1.0 2.0

• Web 3.0
1. People lie
2. People are lazy
3. People are stupid
4. Mission: Impossible – know thyself
5. Schemas aren’t neutral
6. Metrics influence results
7. There’s more than one way to describe something

C. Doctorow, Metacrap, 2001

Issues
• “Unification”* is not going to happen: curators and researchers will always have their own (although often largely overlapping) sets of crucial information fields, which can be cross-linked
• These days, it is imperative that databases communicate with each other
• ‘Unitary taxonomy’ is also not possible, and any big database needs to be able to display conflicting ideas

* Thomas, C. “Biodiversity databases spread, prompting unification call”, Science v. 325 (2009)

** http://hymao.org
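As a rough illustration of the cross-linking point above, the sketch below maps two differently named local schemas onto shared Darwin Core terms (the TDWG standard) and groups records by catalog number while keeping each source's value visible. The local field names, source labels, and specimen values are all made up for the example; only the Darwin Core term names on the right-hand side of the mappings are real. It is a minimal sketch, not a description of how any particular aggregator works.

```python
# Hypothetical local field names (left) mapped to real Darwin Core terms (right).
CURATOR_TO_DWC = {
    "cat_no": "catalogNumber",
    "species": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}
RESEARCHER_TO_DWC = {
    "specimen_id": "catalogNumber",
    "taxon": "scientificName",
    "collector": "recordedBy",
}


def to_dwc(record, mapping):
    """Rename the fields that have a shared equivalent; drop the rest."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def cross_link(records_by_source, key="catalogNumber"):
    """Group records from several sources by a shared key.

    Values are stored per source instead of being overwritten, so
    conflicting opinions (e.g. two identifications) stay visible.
    """
    linked = {}
    for source, records in records_by_source.items():
        for rec in records:
            entry = linked.setdefault(rec[key], {})
            for term, value in rec.items():
                if term != key:
                    entry.setdefault(term, {})[source] = value
    return linked


def conflicts(entry):
    """Return only the terms on which the sources disagree."""
    return {t: v for t, v in entry.items() if len(set(v.values())) > 1}


# Made-up records: the same specimen as seen by two databases.
museum = [to_dwc({"cat_no": "X-001", "species": "Examplus primus",
                  "lat": 30.6, "lon": -96.3}, CURATOR_TO_DWC)]
thesis = [to_dwc({"specimen_id": "X-001", "taxon": "Examplus sp.",
                  "collector": "A. Collector"}, RESEARCHER_TO_DWC)]

linked = cross_link({"museum": museum, "thesis": thesis})
print(conflicts(linked["X-001"]))
```

Neither database has to adopt the other's schema: each keeps its own fields and only publishes a mapping, and the disagreement over the identification is displayed rather than resolved silently.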

Data ephemerality

• Local vs. Web data

?!

Source: Wikipedia, “Science 2.0”

Data ephemerality
• Digital data preservation: Internet Archive, IIPC
• Library of Congress discussions and recommendations
– Disclosure, Adoption, Transparency, External dependency, Technical protection

• http://www.digitalpreservation.gov/formats

User perspective
“Incomplete” sites
Dynamic information

Selective information?

Why I am not a luddite:

Online databases are a taxonomic product and marketing for your work

Online biodiversity databases complement your work

But it’s up to you to be able to make the user understand that your work is more than that

The user of online databases is probably not the same as the person who will get your paper

summing up
• Choose the system based on the reports you want/need to deliver
… or work with a journal/team that can help you
• Make sure the system is flexible enough in your hands
• Decide who will do the maintenance of your data
– How big is your team?
– Fluidity (positive and negative)
• Think about stability and backup strategies
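One possible piece of a backup strategy, sketched here under assumptions, is checking that a backup copy still matches the original by comparing file checksums. The function names (`sha256_of`, `manifest`, `damaged_or_missing`) and the folder layout are hypothetical; only Python standard-library calls are used.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Checksum one file, reading it in chunks so large exports fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(folder):
    """Map each file's path (relative to the folder) to its checksum."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def damaged_or_missing(original, backup):
    """List files from the original that are absent or differ in the backup."""
    source, copy = manifest(original), manifest(backup)
    return sorted(name for name in source if copy.get(name) != source[name])
```

Calling `damaged_or_missing("exports", "/mnt/backup/exports")` (both paths are placeholders) returns an empty list when every exported file is intact in the backup; anything it does list needs to be copied again.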

Thanks!!
