The BioCASe Technology

Jörg Holetschek
Botanic Garden & Botanical Museum Berlin-Dahlem
Dept. of Biodiversity Informatics and Laboratories
Königin-Luise-Straße 6-8, 14195 Berlin

BioCASe Workshop, Melbourne, Feb 4-5th 2010


Agenda

1. The BioCASe Architecture: An Overview

2. The BioCASe Provider Software: Features, Requirements, Installation, Configuration

3. The ABCD and HISPID data standards: Intentions, Structure, Elements, Use

4. Preparing the database for BioCASe/ABCD

5. Setting Up Datasources: Database Connection, Table Setup, Mapping Process; Testing, Data Backups

6. Other Issues

Workshop Wiki: http://hiscom.chah.org.au/wiki/BioCASe_Workshop


1. BioCASe Technology:

Motivation, Idea and Architecture


Herbaria, Drawings

© J. Holstein et al.


Preserved Specimens

© J. Holstein et al.


Living Collections

© J. Holstein et al.


Culture collections

© J. Holstein et al.


Primary Biodiversity Data

Biological Collection Access Service

Documentation of the occurrence of one species

at a given location at a certain point in time

= Primary Biodiversity Data Record


Data sources worldwide

- Index Herbariorum: 3,293 herbaria, 400 million herbarium sheets
- 50,000-100,000 natural history collections, 1.5-2 billion specimens
- With observations added, occurrence records 3+ billion (10b?)

Over 75% of biodiversity information is stored in developed countries.

An estimated 75% of all species are found in the developing world.

Source: BARTHLOTT et al. 1999


Accessibility

Stage 0: Only in the real world (paper catalogues, just stacks); only meta information available on the web

Stage 1: Online catalogue

Stage 2: Digitization of specimens


Biodiversity Data

Stage 3: Networking the databases


Architectural Overview

1. Protocols/Data Standards

2. Wrapper Software

3. Applications: Data Portal, Data Quality Checker, Data Mining


BioCASe Design Principles

No central database
- Data remain in the existing DB systems
- Data provider gets full credit
- Full control over published data by collection holder

Partial publication possible
- Collection holder can withhold information from publication (e.g. locality data for endangered species) or exclude records (e.g. until research results are published)

Wrapper principle
- Data remain in original collection management system
- No changes in workflow for curator/local users


2: The BioCASe Provider Software

Protocols/Data Standards

Wrapper: BioCASe Provider Software

Applications: Data Portal, Data Quality Checker, Data Mining


BioCASe Provider Software (Wrapper)

Software package that "wraps" around the collection database and equips it with a BioCASe protocol compliant interface:

1. Accepts requests from the network (e.g. "Marmota marmota?")

2. Translates queries to the collection database:
   SELECT * FROM specimen WHERE ScientificName LIKE 'Marmota marmota%'

3. Transforms results into ABCD documents and sends them back
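The translation in step 2 can be sketched in a few lines of Python. This is only an illustration, not the BPS code: the function name `like_filter_to_sql` is hypothetical, the `*` wildcard is taken from the search request shown later in the workshop, and the `?` placeholder style is an assumption that depends on the database module in use.

```python
def like_filter_to_sql(table, column, pattern):
    """Translate a '*' wildcard pattern into a parameterised SQL LIKE query.

    Returns the SQL text and the parameter tuple, so the search value is
    passed to the database separately instead of being pasted into the SQL.
    """
    # Escape characters that already have a meaning inside SQL LIKE patterns.
    escaped = (pattern.replace("\\", "\\\\")
                      .replace("%", "\\%")
                      .replace("_", "\\_"))
    # The protocol-style wildcard '*' becomes the SQL wildcard '%'.
    sql_pattern = escaped.replace("*", "%")
    sql = "SELECT * FROM {} WHERE {} LIKE ? ESCAPE '\\'".format(table, column)
    return sql, (sql_pattern,)


sql, params = like_filter_to_sql("specimen", "ScientificName", "Marmota marmota*")
print(sql)
print(params)
```

Keeping the value in a parameter rather than in the SQL string avoids quoting problems with names that contain apostrophes.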


BioCASe Provider Software (Wrapper)

Compatible with several protocols (BioCASe, DiGIR) and data schemas (ABCD, ABCD-EFG, ABCD-DNA, DarwinCore)

Works with all SQL-compliant databases (Access, MySQL, Postgres, SQL Server, Oracle, ...)

Currently 84 production installations serving 1,315 collections

Platform independent

Support available!


BioCASe Providers Worldwide

84 production installations serving 1,315 collections


Requirements

- SQL compliant database (with existing Python connectivity module)

- Webserver (preferably Apache), allowing the execution of Python scripts

- Possibility to install additional Python packages


Steps

1. Installing Apache

2. Installing Python

3. Downloading BPS (from repository/archive)

4. Installing BPS

5. Creating the link Apache/BPS

6. Testing BPS, Setup of additional packages

7. Changing directory permissions


1. Installing Apache

http://httpd.apache.org/download


2. Installing Python

http://www.python.org/download/


3. Downloading BPS

Archive: http://www.biocase.org/products/provider_software/

Subversion repository

Trunk version: http://ww2.biocase.org/svn/bps2/trunk

Defined version: http://ww2.biocase.org/svn/bps2/tags/release_2.5.2

Linux:

svn co <url> <path>

Windows: Tortoise client


4. Installing the BPS

Setup.py

No files are copied, only adapted!


5. Linking BPS with Apache

httpd.conf


6. Testing BPS, Installing Additional Packages

http://localhost/biocase → Utilities → Library Test
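What such a library test checks can be approximated outside the BPS with a short script. The sketch below is written for a current Python 3 interpreter and only asks whether a database connectivity module can be found; the shortlist of modules is an assumption for illustration, not the list the BPS itself tests.

```python
import importlib.util

# Hypothetical shortlist of connectivity modules; the modules the BPS
# library test actually looks for may differ.
DB_MODULES = {
    "MySQL": "MySQLdb",
    "PostgreSQL": "psycopg2",
    "ODBC (Access, SQL Server)": "pyodbc",
}


def check_modules(modules):
    """Return {label: True/False} depending on whether each module can be found."""
    return {label: importlib.util.find_spec(name) is not None
            for label, name in modules.items()}


for label, found in check_modules(DB_MODULES).items():
    print("{:28s} {}".format(label, "installed" if found else "missing"))
```

A module reported as missing has to be installed for the Python interpreter that the webserver runs, which is not necessarily the one on the command line.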


6a: mysqldb

http://sourceforge.net/projects/mysql-python/


7. Write permissions

... /bps2/configuration


Changing the Password

... /bps/configuration.ini


3: ABCD, HISPID

Protocols/Data Standards

Wrapper Software

Applications: Data Portal, Data Quality Checker, Data Mining


ABCD Data Schema

Access to Biological Collection Data:

Data schema for all types of primary biodiversity data (living/preserved/observational, botanical/zoological/bacterial/viral, marine/terrestrial)

XML (eXtensible Markup Language) based → can be consumed by humans and machines

Highly complex, hierarchical, currently 1,055 data elements → almost every data item will fit in

Extendable (plug-in slot for additional information)

standard (currently version 2.06)


ABCD: Structure

Namespace: http://www.tdwg.org/schemas/abcd/2.06


ABCD Metadata: Technical/Content Contact


ABCD Metadata: Intellectual Property Rights


ABCD Metadata: Representation, Owner, ...


ABCD: Triple ID, Record Basis
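To make the triple ID concrete, the following sketch builds a bare unit with Python's standard `xml.etree.ElementTree`. It assumes the triple consists of the ABCD elements `SourceInstitutionID`, `SourceID` and `UnitID`, followed by `RecordBasis`; the values are made up, and the result is deliberately incomplete (no metadata), so it is not a valid ABCD document.

```python
import xml.etree.ElementTree as ET

ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"  # namespace from the Structure slide


def q(tag):
    """Qualify a tag name with the ABCD namespace."""
    return "{%s}%s" % (ABCD_NS, tag)


def build_unit(institution, source, unit_id, record_basis):
    """Build DataSets/DataSet/Units/Unit carrying only triple ID and record basis."""
    datasets = ET.Element(q("DataSets"))
    dataset = ET.SubElement(datasets, q("DataSet"))
    units = ET.SubElement(dataset, q("Units"))
    unit = ET.SubElement(units, q("Unit"))
    ET.SubElement(unit, q("SourceInstitutionID")).text = institution
    ET.SubElement(unit, q("SourceID")).text = source
    ET.SubElement(unit, q("UnitID")).text = unit_id
    ET.SubElement(unit, q("RecordBasis")).text = record_basis
    return datasets


ET.register_namespace("abcd", ABCD_NS)
# Illustrative values only; they do not refer to a real collection.
doc = build_unit("EXAMPLE-INST", "ExampleHerbarium", "3476", "PreservedSpecimen")
print(ET.tostring(doc, encoding="unicode"))
```

The three identifiers together are what makes a unit addressable across providers, which is why they appear before any descriptive content.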


ABCD: Identification (multiple)


ABCD: Gathering Event


ABCD: Multimedia


ABCD: Unit Associations


ABCD: Specialised Portions

Specimen Unit: Acquisition, Accession, Preparation, Duplicate Distribution, Type Status

Herbarium Unit: Loan Information

Botanical Garden Unit: Location in Garden, Hardiness, Lineage, Cultivation, Planting Date

Other Specialised Subtrees for
- Observations
- Culture Collections
- Mycological Units
- Zoological Units
- Paleontological Units
- Plant Genetic Resources


ABCD: UnitExtension

Own namespace for the extension: http://www.chah.org.au/schemas/hispid/5

Other extensions:
- Extension for Geosciences (ABCD-EFG)
- DNA Bank Network (ABCD-DNA)


HISPID

HISPID Gathering
- Coordinates DMS
- PersonCollector
- Substrate/ParentRock
- SoilType, Vegetation

HISPID Unit
- LifeForm, Phenology
- NonComputerisedDataFlag
- DonorTyp, ProvenanceType

HISPID Identification
- HigherTaxon: Additional ranks


BioCASe Protocol

Biological Collection Access Service Protocol:

Manages data exchange between data providers (collections) and applications (data portals)

Vehicle for transporting requests (data portal → collection) and responses (ABCD documents; collection database → data portal)

XML based


BioCASe Protocol: Capabilities request


BioCASe Protocol: Inventory Request


BioCASe Protocol: Search Request


4. Preparing the database for BioCASe


4. Reasons for not publishing the live DB

1. Publishing the live DB is not desired → create snapshots for publication

2. DBMS not accessible for the BPS → export into another DBMS

3. Performance considerations (too highly normalized) → partial, controlled denormalization

4. Repeatable elements kept in columns, not in separate rows → move repeatable elements to separate records


Each repeatable element needs its own primary key!

Repeatable elements kept in columns:

specimen_id  ...  class             order        family
3476         ...  Conjugatophyceae  Desmidiales  Desmidiaceae
3477         ...  Conjugatophyceae  Desmidiales  Desmidiaceae
3478         ...  Conjugatophyceae  Desmidiales  Closteriaceae

Repeatable elements moved to separate records:

specimen_id  ...
3476         ...
3477         ...
3478         ...

sp_id  ht_entry  ht_rank  ht_name
3476   456765    class    Conjugatophyceae
3476   456766    order    Desmidiales
3476   456767    family   Desmidiaceae
3477   456768    class    Conjugatophyceae
3477   456769    order    Desmidiales
3477   456770    family   Desmidiaceae
3478   456771    class    Conjugatophyceae
3478   456772    order    Desmidiales
3478   456773    family   Closteriaceae
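The move from columns to separate records shown in the tables above can be sketched in plain Python. The function name `unpivot_higher_taxa` is hypothetical; in practice this step would more likely be done inside the database, for example with a view.

```python
RANK_COLUMNS = ("class", "order", "family")


def unpivot_higher_taxa(specimens, first_entry_id=1):
    """Turn one row per specimen into one row per (specimen, rank).

    Every output row gets its own primary key (ht_entry), as required for
    repeatable elements; empty rank columns produce no row.
    """
    rows = []
    entry_id = first_entry_id
    for specimen in specimens:
        for rank in RANK_COLUMNS:
            name = specimen.get(rank)
            if name is None:
                continue
            rows.append({"sp_id": specimen["specimen_id"],
                         "ht_entry": entry_id,
                         "ht_rank": rank,
                         "ht_name": name})
            entry_id += 1
    return rows


specimens = [
    {"specimen_id": 3476, "class": "Conjugatophyceae",
     "order": "Desmidiales", "family": "Desmidiaceae"},
    {"specimen_id": 3478, "class": "Conjugatophyceae",
     "order": "Desmidiales", "family": "Closteriaceae"},
]
rows = unpivot_higher_taxa(specimens, first_entry_id=456765)
for row in rows:
    print(row)
```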


Example View

CREATE VIEW [dbo].[vwHigherTaxa]
AS
SELECT 'k_' + [EDIT_ATBI_RecordID] AS id, [EDIT_ATBI_RecordID] AS unit_id, [kingdom] AS name, 'kingdom' AS rank
FROM unit_data
WHERE [kingdom] IS NOT NULL
UNION
SELECT 'p_' + [EDIT_ATBI_RecordID], [EDIT_ATBI_RecordID], [phylum], 'phylum'
FROM unit_data
WHERE [phylum] IS NOT NULL
UNION
SELECT 'c_' + [EDIT_ATBI_RecordID], [EDIT_ATBI_RecordID], [class], 'class'
FROM unit_data
WHERE [class] IS NOT NULL
UNION
...


Commonly used repeatable elements

- Identification
- HigherTaxon
- GatheringSite/NamedArea
- Metadata/Scope/GeoecologicalTerms
- Metadata/Scope/TaxonomicTerms
- MultimediaObjects
- MeasurementsOrFacts
- ...


Controlled Denormalization

insert into [dbo].[abcd_Object]
SELECT
    dbo.CollectionObject.CollectionObjectID,
    ISNULL(dbo.CatalogSeries.SeriesName, '') + '-'
        + ISNULL(CAST(dbo.CollectionObjectCatalog.SubNumber AS nvarchar(20)), '') + '-'
        + ISNULL(CAST(dbo.CollectionObjectCatalog.CatalogNumber AS nvarchar(20)), ''),
    dbo.f_getParentID(dbo.CollectionObject.CollectionObjectID),
    dbo.f_getCollectingEventID(dbo.CollectionObject.CollectionObjectID),
    dbo.f_getFieldNumber(dbo.CollectionObject.CollectionObjectID),
    cast(dbo.CollectionObjectCatalog.CatalogNumber as int),
    dbo.CollectionObject.PreparationMethod,
    case when Sex = '<No Data>' then NULL else Sex end,
    case when Stage = '<No Data>' then NULL else Stage end,
    case when dbo.CollectionObject.Text1 is null then '' else 'Barcode: ' + dbo.CollectionObject.Text1 + '; ' end
        + case when dbo.Accession.Number is null then '' else 'Specimen Location: ' + dbo.Accession.Number end
        + case when DerivedFrom.Remarks is null then '' else ' <br> ' + cast(DerivedFrom.Remarks as nvarchar(2000)) end
FROM dbo.BiologicalObjectAttributes
    RIGHT OUTER JOIN dbo.CollectionObject
        ON dbo.BiologicalObjectAttributes.BiologicalObjectAttributesID = dbo.f_getParentID(dbo.CollectionObject.CollectionObjectID)
    LEFT OUTER JOIN dbo.CollectionObjectCatalog
        LEFT OUTER JOIN dbo.CatalogSeries
            ON dbo.CollectionObjectCatalog.CatalogSeriesID = dbo.CatalogSeries.CatalogSeriesID
        ON dbo.CollectionObject.CollectionObjectID = dbo.CollectionObjectCatalog.CollectionObjectCatalogID
    LEFT JOIN dbo.Accession on Accession.AccessionID = CollectionObjectCatalog.AccessionID
    LEFT JOIN dbo.CollectionObject AS DerivedFrom ON CollectionObject.DerivedFromID = DerivedFrom.collectionObjectID
WHERE (dbo.f_hasChildObjects(dbo.CollectionObject.CollectionObjectID) = 0) AND ...


How Do I See Something Is Wrong?

Errors in ABCD documents:
- Several datasets (one for each unit)
- Several units for one specimen record

Reasons:
- Repeatable elements not in separate tables (no separate PK → several units will be created)
- Several records in DB for non-repeatable elements (several ABCD objects are necessary to create a valid document)
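The second symptom, several units for one specimen record, is easy to check once the UnitIDs of a response have been collected. A minimal sketch, assuming the IDs are already available as a list of strings (extracting them from the ABCD document is not shown, and the function name is hypothetical):

```python
from collections import Counter


def duplicated_unit_ids(unit_ids):
    """Return {unit_id: count} for every UnitID that occurs more than once."""
    counts = Counter(unit_ids)
    return {unit_id: n for unit_id, n in counts.items() if n > 1}


# Illustrative input: specimen 3476 shows up as three units.
print(duplicated_unit_ids(["3476", "3476", "3476", "3477", "3478"]))
```

A UnitID that appears once per repeatable entry (here three times, as for three higher taxon ranks) hints that a repeatable element is mapped from a table without its own primary key.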


5. Setting Up a BioCASe Data Source: Database Connection, Table Setup, Schema Mapping


BPS Datasource

URL for a BioCASe protocol compliant webservice:
http://ww3.bgbm.org/biocase/pywrapper.cgi?dsa=AlgenEngels

<?xml version='1.0' encoding='UTF-8'?>
<request xmlns='http://www.biocase.org/schemas/protocol/1.3'>
  <header>
    <type>search</type>
  </header>
  <search>
    <requestFormat>http://www.tdwg.org/schemas/abcd/2.06</requestFormat>
    <responseFormat start='0' limit='10'>http://www.tdwg.org/schemas/abcd/2.06</responseFormat>
    <filter>
      <like path='/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString'>A*</like>
    </filter>
    <count>false</count>
  </search>
</request>
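A request like the one above can also be generated programmatically. The sketch below rebuilds the same structure with Python's standard `xml.etree.ElementTree`; the function name `build_search_request` is hypothetical, and how the document is transmitted to the webservice URL is deliberately left out, since the slide does not show it.

```python
import xml.etree.ElementTree as ET

PROTOCOL_NS = "http://www.biocase.org/schemas/protocol/1.3"
ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"
NAME_PATH = ("/DataSets/DataSet/Units/Unit/Identifications/Identification/"
             "Result/TaxonIdentified/ScientificName/FullScientificNameString")


def build_search_request(pattern, start=0, limit=10):
    """Build a search request with a single 'like' filter on the scientific name."""
    def q(tag):
        return "{%s}%s" % (PROTOCOL_NS, tag)

    request = ET.Element(q("request"))
    header = ET.SubElement(request, q("header"))
    ET.SubElement(header, q("type")).text = "search"
    search = ET.SubElement(request, q("search"))
    ET.SubElement(search, q("requestFormat")).text = ABCD_NS
    response_format = ET.SubElement(search, q("responseFormat"),
                                    start=str(start), limit=str(limit))
    response_format.text = ABCD_NS
    filter_el = ET.SubElement(search, q("filter"))
    like = ET.SubElement(filter_el, q("like"), path=NAME_PATH)
    like.text = pattern
    ET.SubElement(search, q("count")).text = "false"
    return ET.tostring(request, encoding="unicode")


# Serialise the protocol namespace as the default namespace (no prefix).
ET.register_namespace("", PROTOCOL_NS)
xml_text = build_search_request("A*")
print(xml_text)
```

The `start` and `limit` attributes page through the result, so a client can fetch large result sets in several requests.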


BPS QueryTools

Tool for sending Scan, Search and Capabilities Requests to a datasource

Choose datasource „Debug“


Steps for Setting Up a Datasource

1. Create a new datasource

2. Configure datasource:
   1. Database connection
   2. Table setup
   3. Create new empty mapping
   4. Edit mapping:
      1. Choose root table
      2. Edit mandatory ABCD elements (red)
      3. Save configuration, test datasource (QueryTools)
      4. Add additional ABCD elements, occasional testing

3. Test datasource


Datasource Loglevel

The lower the loglevel, the more information is logged (10=debug, 20=info, 30=warning, 40=error).
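The numbers can be checked against Python's standard `logging` module, whose numeric levels use the same 10/20/30/40 scale. That the BPS loglevel setting maps directly onto these levels is an assumption here, and the logger name is hypothetical.

```python
import logging

# Names that Python's standard logging module assigns to the numeric levels.
for level in (10, 20, 30, 40):
    print(level, logging.getLevelName(level))

logger = logging.getLogger("example_datasource")  # hypothetical logger name
logger.setLevel(30)  # only warnings and errors pass

print(logger.isEnabledFor(logging.INFO))
print(logger.isEnabledFor(logging.ERROR))
```

Under that assumption, a low level is useful while a mapping is being set up, and a higher one keeps the log files small in production.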


How The BPS performs requests

1. Get an ID list of records matching the filter

2. Load all details for the matching IDs → joining of ALL tables, beginning with the root table (table with UnitID, one record per unit)
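The two steps can be illustrated with an in-memory SQLite database. This is a sketch of the general pattern only, not the SQL the BPS generates; the table layout and the toy rows are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Toy schema: a root table (one record per unit) and one repeatable-element table.
con.executescript("""
    CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE higher_taxa (ht_entry INTEGER PRIMARY KEY, sp_id INTEGER,
                              ht_rank TEXT, ht_name TEXT);
    INSERT INTO specimen VALUES (3476, 'Closterium sp.'), (3477, 'Cosmarium sp.');
    INSERT INTO higher_taxa VALUES
        (1, 3476, 'family', 'Closteriaceae'),
        (2, 3476, 'order', 'Desmidiales'),
        (3, 3477, 'family', 'Desmidiaceae');
""")


def search(con, pattern, limit=10):
    # Step 1: get the ID list of root-table records matching the filter.
    ids = [row[0] for row in con.execute(
        "SELECT specimen_id FROM specimen WHERE full_name LIKE ? "
        "ORDER BY specimen_id LIMIT ?", (pattern, limit))]
    if not ids:
        return []
    # Step 2: load all details for those IDs, joining from the root table.
    placeholders = ",".join("?" for _ in ids)
    return con.execute(
        "SELECT s.specimen_id, s.full_name, h.ht_rank, h.ht_name "
        "FROM specimen s LEFT JOIN higher_taxa h ON h.sp_id = s.specimen_id "
        "WHERE s.specimen_id IN (%s) "
        "ORDER BY s.specimen_id, h.ht_entry" % placeholders, ids).fetchall()


rows = search(con, "Clo%")
for row in rows:
    print(row)
```

Applying the limit to the ID list in step 1 keeps paging correct, because the join in step 2 multiplies rows for every repeatable element.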


Datasources folder

... /bps/configuration/datasources/<dsname>

querytool_prefs.xml: Just what its name says.

xxxx.pick: Temporary files; should be deleted if BPS behaves strangely.

cmf_xxxxxx.xml: Concept mapping; one for each supported schema.

provider_setup_file.xml: Database connection, table setup, supported schemas.

Regular backup of configuration folder is highly recommended!
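A regular backup can be as simple as zipping the configuration folder with a timestamp. The sketch below uses only the Python standard library; the function name, the archive naming scheme and the paths in the usage comment are placeholders, not BPS conventions.

```python
import shutil
import time
from pathlib import Path


def backup_configuration(config_dir, backup_dir):
    """Zip the whole configuration folder into backup_dir; return the archive path."""
    config_dir = Path(config_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = backup_dir / ("bps-configuration-" + stamp)
    # make_archive appends the '.zip' extension itself.
    return shutil.make_archive(str(base_name), "zip", root_dir=str(config_dir))


# Example call with placeholder paths; adjust to the local installation:
# backup_configuration("/path/to/bps/configuration", "/path/to/backups")
```

Run from a scheduler, this keeps the mappings and the provider setup files recoverable after a failed edit.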


Metadata tables

If metadata differ for each or some of the records → several records in metadata table, linked to unit by foreign key

If metadata are the same for all records → possible to hold data in one record; no reference key is needed (static table)


Applications

1. Protocols/Data Standards

2. Wrapper Software

3. Applications: Data Portal, Data Quality Checker, Data Mining


Local QueryTool


Distributed Search vs. Harvesting/Caching

GeoCASe Distributed Search: http://search.biocase.org/geocase


GBIF Registration


GBIF Data Portal


BioCASe European data portal


EDIT Specimen Explorer


Data Mining: Itineraries Project

Goal: Detect itinerary patterns in geo-referenced primary data presumably collected during a collecting event.

1st step:

Try to validate itineraries from well-documented expeditions (literature) against geo-referenced primary biodiversity records with dates/collecting information

2nd step:

Try to find itineraries for collecting events with missing expedition diaries


Data Mining: Ecological Niche Modelling


Jörg Holetschek

Botanischer Garten & Botanisches Museum
Abteilung Biodiversitätsinformatik & Labors

Königin-Luise-Straße 6-8
14195 Berlin-Dahlem

[email protected]
Tel. +49 30 838 50150

0448 831 980

www.bgbm.org/biodivinf

www.biocase.org
search.biocase.org
search.biocase.de

http://hiscom.chah.org.au/wiki/BioCASe_Workshop