Increasing Usability of Biodiversity Databases through Semantic Enrichment
Klaus Riede
Zoologisches Forschungsinstitut & Museum Alexander Koenig (ZFMK)
Adenauerallee 150-164, 53113 Bonn, Germany




Semantic Enrichment: Some examples

Huge biodiversity databases already exist.

They cover distinct groups of organisms:

FishBase, Orthoptera Species File

OR

Distinct themes:

Threat: IUCN Red List Database (www.redlist.org)

Migration: Global Register of Migratory Species (www.groms.de)

Why do we need semantic enrichment?


Semantic Enrichment: Some examples

Try to search for:

Number of "Extinct Tropical Timber Trees"

Database: IUCN Red List Database (www.redlist.org)

Query: Tropical tree

Problem: plants are not classified according to life-form

Plant families such as TAXODIACEAE comprise trees (e.g. Taiwania cryptomerioides - VULNERABLE)

CUPRESSACEAE contain shrubs (Actinostrobus) AND trees (Thuja spp.)


Semantic Enrichment: Searching for Red-Listed Trees

To search the IUCN Red List Database (www.redlist.org) for "Threatened" trees, you have to know plant taxonomy:

Searching the Order CONIFERALES (containing Taxodiaceae trees):
- 16 Critically Endangered
- 43 Endangered
- 93 Vulnerable

...but some of those are shrubs (Cupressaceae: Actinostrobus)

Threatened Cupressaceae:
- 2 Critically Endangered (e.g. Thuja sutchuenensis)
- 15 Endangered (e.g. Juniperus cedrus)
- 25 Vulnerable (e.g. Cupressus gigantea)


Semantic enrichment is necessary to search for "Trees"

http://www.botanik.uni-bonn.de/conifers/index.htm
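The idea of enriching a threat database with life-form information can be sketched as follows. This is only an illustration, not how the IUCN Red List is organised: the table names `red_list` and `life_form` are hypothetical, SQLite stands in for the real database, and the few rows are taken from the examples on these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE red_list (taxon TEXT PRIMARY KEY, family TEXT, status TEXT);
-- Hypothetical enrichment table: one life-form label per taxon.
CREATE TABLE life_form (taxon TEXT PRIMARY KEY, form TEXT);
""")

# Threat status as quoted on the slides.
conn.executemany("INSERT INTO red_list VALUES (?, ?, ?)", [
    ("Taiwania cryptomerioides", "Taxodiaceae", "Vulnerable"),
    ("Thuja sutchuenensis", "Cupressaceae", "Critically Endangered"),
])

# Life-form annotation as stated on the slides (trees vs. shrubs).
conn.executemany("INSERT INTO life_form VALUES (?, ?)", [
    ("Taiwania cryptomerioides", "tree"),
    ("Thuja sutchuenensis", "tree"),
    ("Actinostrobus", "shrub"),
])


def threatened_by_life_form(form):
    """Return (taxon, status) for red-listed taxa with the given life form."""
    rows = conn.execute(
        "SELECT r.taxon, r.status FROM red_list AS r "
        "JOIN life_form AS l ON l.taxon = r.taxon "
        "WHERE l.form = ? ORDER BY r.taxon",
        (form,),
    )
    return rows.fetchall()


trees = threatened_by_life_form("tree")
print(trees)
```

With the extra table in place, a search for "tree" no longer requires knowing which families or genera contain trees.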


Two Worlds: Relational databases and complex data sets

Relational databases:
- Digital Orthoptera Specimen Access
- SYSTAX
- GROMS (Global Register of Migratory Species)

Complex data sets:
- Sounds, pictures
- Gene sequences (links)
- Geographic coordinates
- Maps (GIS data: shapes)


Example #1: Data-mining for Knowledge Gaps

The "Global Register of Migratory Species" database contains literature citations on migration.

Knowledge gaps were detected by searching for text strings such as: poor*, little known, unknown

www.groms.de


The relational organisation of the GROMS database allows application of SQL queries for text-mining:

Table              Fields                                                      Entries
References Table   ID, Author, Title, etc.                                     5,500
Joint Table        Lit_ID, Species_ID, Text: [... migration ... unknown ...]   8,500
Species Table      ID, Taxon name, Migration, Red List status, etc.            4,355

Relations: References Table 1:many Joint Table many:1 Species Table

A many:many relation connects references and species names.


SQL statement: Searching for non-passerine birds with poorly known migration behaviour:

Apus caffer, White-rumped swift, intercontinental
Migratory in northernmost and southernmost parts of range. Spanish population present early May to Aug-Oct, some recorded into early Dec, with autumn migration through Straits of Gibraltar mid-Aug to mid-Oct; S African population present Aug-May, mainly absent from S Cape and much reduced farther N within S breeding range Jun-Jul. Poorly understood wet-season movements into Sahel may be feature of N sub-Saharan populations. Otherwise resident. Migrates in flocks of up to 100. S African migrants may be transequatorial. Some degree of altitudinal migration in Natal. First record from Arabia 1982, and seen at least once subsequently in Tihamah coastal plains, Saudi Arabia, in Mar 1989. Vagrant to Norway (May, Jun) and Finland (Nov).

Chaetura vauxi, Vaux's swift, intercontinental
Nominate race a migrant, present in far N of range May to mid-Sept, exceptionally late Mar on coast. Migrates through S California mid-Apr to early May, with weaker autumn passage peaking early Sept, though continuing to early Oct, migrants leaving the state by mid-Oct. Recorded SE Farallon Is, 42 km W of San Francisco, in similar numbers over 22 years, in spring 813 in early-late May, and in autumn 803 early Sept to late Oct. Recorded E to Louisiana and Florida. Passage through NW Mexico Apr-May and mid-Sept to Oct; nominate race present C Mexico to W Honduras, mid-Sept to May. Incidence of wintering in California increasing, small flocks occurring mainly in S, though wintering as far as NW California not unknown.

    SELECT Tab_Arten.Latein, Tab_Arten.Englisch, Tab_Arten.Migration,
           Jointab_Art_Lit.Lit_Bezug, Tab_Literatur.Autor_Name, Tab_Literatur.Autor_Vorname,
           Tab_Literatur.Coautoren_Namen, Tab_Literatur.Jahr, Tab_Arten.animalgroup,
           Tab_Arten.Familie
    FROM Tab_Literatur RIGHT JOIN
         (Tab_Arten INNER JOIN Jointab_Art_Lit ON Tab_Arten.ID = Jointab_Art_Lit.ID_Art)
         ON Tab_Literatur.ID = Jointab_Art_Lit.ID_Lit
    WHERE (((Jointab_Art_Lit.Theme)=7) AND ((Tab_Arten.Animal_Class)=2) AND ((Jointab_Art_Lit.Lit_Bezug) Like "*unknown*"))
       OR (((Jointab_Art_Lit.Theme)=7) AND ((Tab_Arten.Animal_Class)=2) AND ((Jointab_Art_Lit.Lit_Bezug) Like "*perhaps*"))
       OR (((Jointab_Art_Lit.Theme)=7) AND ((Tab_Arten.Animal_Class)=2) AND ((Jointab_Art_Lit.Lit_Bezug) Like "*little*"))
       OR (((Jointab_Art_Lit.Theme)=7) AND ((Tab_Arten.Animal_Class)=2) AND ((Jointab_Art_Lit.Lit_Bezug) Like "*poor*"))
       OR (((Jointab_Art_Lit.Theme)=7) AND ((Tab_Arten.Animal_Class)=2) AND ((Jointab_Art_Lit.Lit_Bezug) Like "*possib*"))
    ORDER BY Tab_Arten.animalgroup, Tab_Arten.Familie;
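The same keyword search can be tried out on a toy database. The sketch below is not the GROMS database itself: it uses SQLite instead of Access (so the wildcard is `%` rather than `*`), keeps only the columns needed for the filter, and assumes that `Theme = 7` and `Animal_Class = 2` are the codes used in the query above without knowing their exact meaning. The first two text snippets are shortened from the slides; the third row is a made-up control record.

```python
import sqlite3

KEYWORDS = ["unknown", "perhaps", "little", "poor", "possib"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tab_Arten (ID INTEGER PRIMARY KEY, Latein TEXT, Animal_Class INTEGER);
CREATE TABLE Jointab_Art_Lit (ID_Art INTEGER, ID_Lit INTEGER, Theme INTEGER, Lit_Bezug TEXT);
""")
conn.executemany("INSERT INTO Tab_Arten VALUES (?, ?, ?)", [
    (1, "Apus caffer", 2),
    (2, "Caprimulgus climacurus", 2),
    (3, "Hypothetical species", 2),  # made-up control row, not from GROMS
])
conn.executemany("INSERT INTO Jointab_Art_Lit VALUES (?, ?, ?, ?)", [
    (1, 10, 7, "Poorly understood wet-season movements into Sahel."),
    (2, 10, 7, "Poorly known. Nominate race migratory and partially sedentary."),
    (3, 11, 7, "Resident throughout range."),
])


def knowledge_gaps(connection, keywords, theme=7, animal_class=2):
    """Species whose literature excerpt contains any of the keywords."""
    like_clause = " OR ".join("j.Lit_Bezug LIKE ?" for _ in keywords)
    sql = (
        "SELECT DISTINCT a.Latein FROM Tab_Arten AS a "
        "JOIN Jointab_Art_Lit AS j ON a.ID = j.ID_Art "
        f"WHERE j.Theme = ? AND a.Animal_Class = ? AND ({like_clause}) "
        "ORDER BY a.Latein"
    )
    params = [theme, animal_class] + [f"%{word}%" for word in keywords]
    return [row[0] for row in connection.execute(sql, params)]


gaps = knowledge_gaps(conn, KEYWORDS)
print(gaps)
```

Keyword hits still need manual review: the Chaetura vauxi record matches "unknown" only through the phrase "not unknown", which does not indicate a knowledge gap.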


Result: 349 birds with insufficiently known migration behaviour

Apus caffer (White-rumped swift)

Migratory in northernmost and southernmost parts of range. Spanish population present early May to Aug-Oct, some recorded into early Dec, with autumn migration through Straits of Gibraltar mid-Aug to mid-Oct; S African population present Aug-May, mainly absent from S Cape and much reduced farther N within S breeding range Jun-Jul. Poorly understood wet-season movements into Sahel may be feature of N sub-Saharan populations. Otherwise resident. Migrates in flocks of up to 100. S African migrants may be transequatorial. Some degree of altitudinal migration in Natal. First record from Arabia 1982, and seen at least once subsequently in Tihamah coastal plains, Saudi Arabia, in Mar 1989. Vagrant to Norway (May, Jun) and Finland (Nov).

Caprimulgus climacurus (Long-tailed nightjar)

Poorly known. Nominate race migratory and partially sedentary, some populations moving S after breeding season. Race sclateri possibly sedentary and partially migratory. Race nigricans probably sedentary. Outside breeding season, range also includes S Ivory Coast, SW Nigeria, S Cameroon, Equatorial Guinea, Gabon, SE Congo (lower Congo river valley), NE Angola (one record Luaco), SE Sudan, SW Ethiopia, W Kenya (sporadic in Turkana and Pokot region) and E Uganda.

Mainly based on "Handbook of the Birds of the World" (del Hoyo et al. 1992-2003)

www.groms.de


Example #2: Automatic Annotation of Sound Parameters

All 5,000 sound files in the Orthoptera Song Repository of the DORSA project were annotated automatically with sound parameters.

The sound parameters were added to the SysTax database, which stores specimen data from various museum databases, including herbaria.

The annotated SysTax Oracle database is now searchable for sound parameters, such as carrier frequency and pulse rate.


Orthoptera type material in German museums.

Deutsche Orthopteren Sammlungen (German Orthoptera Collections) - www.dorsa.de


Checking, identification and verification:
- Information on type material
- Locating "historical" types
- Designation of lectotypes


Taxonomic database (OSF: Orthoptera Species File, USA)

Specimens (German museums, sound archives) (www.dorsa.de)

Mutual links


Extraction of sound parameters using MATLAB software

Pulse rate

Carrier frequency

In cooperation with: Dept. of Neuroinformatics, Ulm
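What "pulse rate" and "carrier frequency" mean as extractable parameters can be illustrated on a synthetic signal. The sketch below is not the MATLAB procedure used in the project, which the slides do not describe; it swaps in two simple stand-in techniques (an amplitude gate to find pulses, and interpolated zero crossings to estimate the carrier) and all function names and signal settings are assumptions chosen for illustration.

```python
import math


def synth_song(carrier_hz, period_s, pulse_s, duration_s, rate):
    """Synthetic pulsed tone: a sine carrier switched on for pulse_s every period_s."""
    period_n = round(period_s * rate)
    pulse_n = round(pulse_s * rate)
    total_n = round(duration_s * rate)
    return [
        math.sin(2 * math.pi * carrier_hz * i / rate) if i % period_n < pulse_n else 0.0
        for i in range(total_n)
    ]


def find_pulses(samples, rate, window_s=0.001, threshold=0.1):
    """Return (start, end) sample indices of pulses using a coarse amplitude gate."""
    win = max(1, round(window_s * rate))
    pulses, start = [], None
    for pos in range(0, len(samples), win):
        loud = max(abs(x) for x in samples[pos:pos + win]) > threshold
        if loud and start is None:
            start = pos
        elif not loud and start is not None:
            pulses.append((start, pos))
            start = None
    if start is not None:
        pulses.append((start, len(samples)))
    return pulses


def carrier_frequency(samples, rate):
    """Estimate frequency from linearly interpolated upward zero crossings."""
    times = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0 <= b:
            times.append((i - 1 + a / (a - b)) / rate)
    if len(times) < 2:
        raise ValueError("need at least two zero crossings")
    return (len(times) - 1) / (times[-1] - times[0])


RATE = 48000  # samples per second (assumed)
song = synth_song(7000.0, 0.015, 0.010, 0.15, RATE)

pulses = find_pulses(song, RATE)
starts = [s / RATE for s, _ in pulses]
pulse_dist = (starts[-1] - starts[0]) / (len(starts) - 1)    # seconds between onsets
pulse_len = sum(e - s for s, e in pulses) / len(pulses) / RATE
first_start, first_end = pulses[0]
carrier = carrier_frequency(song[first_start:first_end], RATE)

print(f"pulse distance {pulse_dist * 1000:.1f} ms, pulse rate {1 / pulse_dist:.1f} Hz")
print(f"pulse length {pulse_len * 1000:.1f} ms, carrier {carrier:.0f} Hz")
```

Real field recordings contain noise and overlapping singers, so a production pipeline would need filtering and more robust envelope detection than this gate.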


Enriched sound file table: pulse distance, length, frequency etc. were added to the SYSTAX table

PULSEDIST  PULSEDISTS  PULSELENGT  PULSELENGT  DUTY_CYCLE  FREQUENCY   FREQUENCYS  FILENAME
12.5 ms    419.8       10.5 ms     2.6         0.84        7590.44 Hz  66.60       n3/tr/trigsp01/s6n023.wav
19.3 ms    94.8        7.0 ms      2.2         0.36        7593.24 Hz  63.79       n1/tr/trigsp01/s6n023.wav
18.1 ms    453.3       10.6 ms     4.7         0.59        7357.79 Hz  521.05      n3/tr/trigsp01/s6n027.wav
18.1 ms    290.4       10.3 ms     5.2         0.57        7302.54 Hz  610.23      n1/tr/trigsp01/s6n027.wav
18.1 ms    983.1       10.4 ms     3.2         0.57        7684.25 Hz  114.84      n1/tr/trigsp01/s6n027f.wav
13.2 ms    203.2       6.6 ms      3.3         0.50        7104.88 Hz  76.85       n3/tr/trigsp01/s6n029.wav
13.6 ms    79.1        12.0 ms     3.5         0.88        7128.05 Hz  78.00       n1/tr/trigsp01/s6n029.wav
11.5 ms    458.2       6.1 ms      3.1         0.53        7702.09 Hz  380.65      n3/tr/trigsp01/s6n031.wav
11.6 ms    113.8       5.7 ms      2.6         0.49        7806.72 Hz  78.67       n1/tr/trigsp01/s6n031f.wav
22.9 ms    130.4       8.4 ms      2.6         0.37        6867.13 Hz  77.54       n3/tr/trigsp01/s6n034.wav
22.9 ms    171.9       8.4 ms      2.6         0.37        6855.36 Hz  90.63       n1/tr/trigsp01/s6n034.wav
22.9 ms    126.7       8.4 ms      2.6         0.37        6855.70 Hz  102.59      n1/tr/trigsp01/s6n034f.wav
14.4 ms    114.3       11.7 ms     2.6         0.81        6672.19 Hz  53.27       n3/tr/trigsp01/s6n041.wav
14.6 ms    209.3       11.7 ms     3.2         0.80        6641.43 Hz  60.03       n1/tr/trigsp01/s6n041.wav
14.6 ms    209.3       11.7 ms     3.2         0.80        6643.62 Hz  70.45       n1/tr/trigsp01/s6n041f.wav
39.4 ms    1988.8      19.5 ms     7.3         0.49        6165.52 Hz  124.48      n3/tr/trigsp01/s6n044.wav
13.0 ms    100.7       11.0 ms     2.0         0.85        7207.58 Hz  41.06       n1/tr/trigsp01/s6n044f.wav
13.2 ms    295.5       11.0 ms     4.0         0.83        6965.09 Hz  1129.63     n3/tr/trigsp01/s6n049.wav
13.0 ms    100.0       8.5 ms      2.4         0.65        7205.56 Hz  41.95       n1/tr/trigsp01/s6n049.wav
11.5 ms    506.9       9.8 ms      2.9         0.85        7528.76 Hz  64.86       n1/tr/trigsp01/s7n008.wav
11.5 ms    82.3        10.0 ms     2.7         0.87        7545.55 Hz  51.77       n3/tr/trigsp01/s7n008.wav
11.5 ms    506.9       9.8 ms      2.9         0.85        7527.69 Hz  63.26       n1/tr/trigsp01/s7n008f.wav
13.2 ms    148.8       11.0 ms     2.7         0.83        7322.22 Hz  66.56       n3/tr/trigsp01/s7n026.wav
13.5 ms    1586.9      11.2 ms     3.7         0.83        7330.96 Hz  58.02       n1/tr/trigsp01/s7n026.wav
13.5 ms    1581.0      11.2 ms     3.7         0.83        7332.97 Hz  44.75       n1/tr/trigsp01/s7n026f.wav
17.8 ms    174.6       11.0 ms     2.9         0.62        7464.91 Hz  61.32       n3/tr/trigsp01/s7n031.wav
17.7 ms    123.3       10.4 ms     3.3         0.59        7467.31 Hz  54.61       n1/tr/trigsp01/s7n031.wav
17.7 ms    123.3       10.4 ms     3.3         0.59        7462.75 Hz  55.57       n1/tr/trigsp01/s7n031f.wav
13.7 ms    172.4       7.3 ms      2.6         0.53        7325.90 Hz  41.79       n3/tr/trigsp01/s7n053.wav
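Once such parameters sit in a table, recordings can be retrieved by their acoustic properties with plain SQL. The following sketch assumes a hypothetical table name `sound_params`, uses SQLite rather than the Oracle-based SysTax, and loads four rows copied from the table above with the units stripped. It also checks an observation that the slides do not state explicitly: in these rows DUTY_CYCLE appears to equal pulse length divided by pulse distance.

```python
import sqlite3

# (PULSEDIST ms, PULSELENGT ms, DUTY_CYCLE, FREQUENCY Hz, FILENAME), copied from the table.
ROWS = [
    (12.5, 10.5, 0.84, 7590.44, "n3/tr/trigsp01/s6n023.wav"),
    (19.3, 7.0, 0.36, 7593.24, "n1/tr/trigsp01/s6n023.wav"),
    (22.9, 8.4, 0.37, 6867.13, "n3/tr/trigsp01/s6n034.wav"),
    (39.4, 19.5, 0.49, 6165.52, "n3/tr/trigsp01/s6n044.wav"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sound_params ("
    "PULSEDIST REAL, PULSELENGT REAL, DUTY_CYCLE REAL, FREQUENCY REAL, FILENAME TEXT)"
)
conn.executemany("INSERT INTO sound_params VALUES (?, ?, ?, ?, ?)", ROWS)


def find_songs(min_hz, max_hz, max_pulse_dist_ms):
    """Recordings in a carrier-frequency band with at most the given pulse distance."""
    sql = (
        "SELECT FILENAME, FREQUENCY, 1000.0 / PULSEDIST AS pulse_rate_hz "
        "FROM sound_params "
        "WHERE FREQUENCY BETWEEN ? AND ? AND PULSEDIST <= ? "
        "ORDER BY FILENAME"
    )
    return conn.execute(sql, (min_hz, max_hz, max_pulse_dist_ms)).fetchall()


hits = find_songs(7000, 8000, 15.0)

# Consistency check: duty cycle vs. pulse length / pulse distance (two-decimal rounding).
duty_ok = all(abs(length / dist - duty) <= 0.005 for dist, length, duty, _, _ in ROWS)

print(hits)
print("duty cycle consistent:", duty_ok)
```

Deriving pulse rate as 1000 / PULSEDIST treats the pulse distance as the pulse period in milliseconds, which is an assumption about how the column was defined.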


Automated bioacoustic classification of ethospecies allows rapid assessment

Mapping with microphones makes it possible to answer important research questions, such as:

- species ranges / endemism
- species abundance
- species turnover
- community patterns
- activity patterns
- vulnerability to habitat degradation
- extinction rates


Example #3: Enriching databases with geographic information

- Adding lat-lon coordinates by Geo-referencing

- GIS Analysis of complex geometries (shapes) by intersection with other GIS-layers and subsequent update


Georeferencing is necessary to update place names with lat-lon data



Geographic coordinates were added to place names, using Times Atlas or gazetteers (Getty, Alexandria Project)

Label                        Country     long_dec   lat_dec
Ahwaz                        Iran        48.66667   31.16667
Ainazi-Svetupe River         Latvia      24.3       57.78333
Ainovy Islands               Russia      31.58333   69.83334
Akaki Region                 Ethiopia    38.83333   8.833333
Akh-Chala Plavni Novogolov   Azerbaijan  48.66667   39.5
Akhna Dam                    Cyprus      33.8       35.03333
Akhtarski and Sladki Limans  Russia      38         46
Akrotiri Salt Lake           Cyprus      32.93333   34.51667
Aksehir Gölü                 Turkey      31.4       38.55
Akureyri                     Iceland     -18.08333  65.66666
Akyatan Gölü                 Turkey      35.31667   36.58333
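Georeferencing by gazetteer lookup can be sketched as a simple join between place names and a coordinate list. The `GAZETTEER` dictionary below is a hypothetical stand-in for a real gazetteer, filled with three rows from the table above, and the function names are made up for illustration. The helper `dms_to_decimal` reflects an assumption that the atlas values were read as degrees and minutes and then converted to decimal degrees.

```python
# Hypothetical mini-gazetteer: (label, country) -> (long_dec, lat_dec), values from the table.
GAZETTEER = {
    ("Ahwaz", "Iran"): (48.66667, 31.16667),
    ("Akhna Dam", "Cyprus"): (33.8, 35.03333),
    ("Akureyri", "Iceland"): (-18.08333, 65.66666),
}


def dms_to_decimal(degrees, minutes=0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to decimal degrees; S and W become negative."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def georeference(records, gazetteer):
    """Add long_dec/lat_dec to each record; unmatched places get None."""
    enriched = []
    for rec in records:
        lon, lat = gazetteer.get((rec["label"], rec["country"]), (None, None))
        enriched.append({**rec, "long_dec": lon, "lat_dec": lat})
    return enriched


places = [
    {"label": "Akureyri", "country": "Iceland"},
    {"label": "Unlisted place", "country": "Nowhere"},  # made-up record with no match
]
result = georeference(places, GAZETTEER)

for row in result:
    print(row)
print(round(dms_to_decimal(18, 5, hemisphere="W"), 5))
```

Records that come back with `None` are the ones that still need manual lookup, for example because of spelling variants in the place name.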


Mapping requires specimen data enriched with geographic coordinates

The DORSA mapserver is available at www.dorsa.de


Countries of origin of the type material in German museums


Example #3: Enriching databases with geographic information based on GIS calculation of range territories

Distribution maps (shapes) are available at www.groms.de


Import of intersection results:
- 1,000 mapped species
- 2,522 administrative units
- 340,000 combinations (dbf attribute table: province - species)
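How an intersection produces a province-species table can be sketched with a deliberately simplified geometry. The example below uses axis-aligned bounding boxes instead of real range polygons, which is a stand-in for the GIS intersection and not the method used for GROMS; all species names, unit names and coordinates are invented toy values.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in decimal degrees (toy stand-in for a GIS shape)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def intersects(a, b):
    """True if the two boxes share some interior area."""
    return (a.min_lon < b.max_lon and b.min_lon < a.max_lon
            and a.min_lat < b.max_lat and b.min_lat < a.max_lat)


# Invented toy data: two species ranges and three administrative units.
RANGES = {
    "species_A": Box(0, 0, 10, 10),
    "species_B": Box(20, 20, 30, 30),
}
UNITS = {
    "unit_1": Box(5, 5, 15, 15),
    "unit_2": Box(25, 0, 35, 25),
    "unit_3": Box(40, 40, 50, 50),
}


def intersection_table(ranges, units):
    """All (unit, species) combinations whose shapes overlap."""
    return sorted(
        (unit, species)
        for species, range_box in ranges.items()
        for unit, unit_box in units.items()
        if intersects(range_box, unit_box)
    )


def species_in(unit, table):
    """Species recorded for one administrative unit in the imported table."""
    return [species for u, species in table if u == unit]


table = intersection_table(RANGES, UNITS)
print(table)
print(species_in("unit_2", table))
```

Only overlapping pairs are stored, which is why the number of combinations in the imported table is far smaller than the product of species and administrative units.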

Queensland search results:


Summary: Semantic enrichment of relational databases is possible by automatic annotation:

- Link the relational database to an external data set (sounds, GIS)
- Run the annotation program (e.g. GIS intersection)
- Obtain a table with annotation results
- Import the result table to produce the enriched relational database

Enrichment allows SQL retrieval of complex data parameters