Semantic search within Earth Observation products databases based on automatic tagging of image...


DESCRIPTION

Since 1972 and the launch of Landsat 1, the first civilian Earth Observation satellite, millions of images have been acquired all over the Earth by a constantly growing fleet of increasingly sophisticated satellites. Generally, searching within this huge amount of Earth Observation (EO) images is limited to the description of the acquisition conditions stored in the related metadata files, i.e. Where (footprint), When (time of acquisition) and How (viewing angles, instrument, etc.). Thus the larger community of end users misses the What filter, i.e. a way to filter searches in terms of image content. RESTo [1] uses the iTag [2] footprint-based tagging system to enhance image metadata and aims to provide a way to express semantic queries on image content in terms of land use. We investigated the performance of RESTo against a database of 12 million simulated Sentinel-2 granules, representative of the forthcoming French national mirror site of Sentinel products (PEPS).


Jérôme Gasperi

2014 Conference on Big Data from Space Frascati - Italy - November 12th, 2014

SEMANTIC SEARCH WITHIN EARTH OBSERVATION PRODUCTS DATABASES BASED ON AUTOMATIC TAGGING OF IMAGE CONTENT

Big Data? The data deluge

The search paradigm

iTag: an EO tagging library

resto: an EO product search engine

What’s next? Conclusion and perspectives

Brett Ryder - http://www.economist.com/node/15579717

The data deluge

The Earth Observation product search paradigm relies on the acquisition parameters stored in the metadata

When, Where, How

What? i.e. image content

Sven Sachsalber | http://www.palaisdetokyo.com/fr/events/sven-sachsalber

Automatic tagging of Earth Observation products

iTag

What we got: orthorectified image

What we need: characterized image

This is urban

This is water

This is forest

iTag provides semantic enhancement of Earth Observation data

It uses the footprint stored in the metadata to enrich that metadata with exogenous data

i.e. no image processing!

Out-of-the-box tagging sources: continents, countries, regions, states, cities, land cover, rivers, population count
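The footprint-based principle can be illustrated with a toy sketch. It is not iTag's implementation: rectangles stand in for real footprints and reference layers, the function names `overlapArea` and `tagFootprint` are hypothetical, the rectangles of one class are assumed not to overlap each other, and all numbers are made up. The point is only that tags are derived from the footprint and reference data, never from pixels.

```php
<?php
// Toy illustration only: rectangles stand in for real footprints and
// reference layers; names and numbers below are made up.

/** Area of the overlap between two rectangles given as [minX, minY, maxX, maxY]. */
function overlapArea(array $a, array $b): float
{
    $width  = min($a[2], $b[2]) - max($a[0], $b[0]);
    $height = min($a[3], $b[3]) - max($a[1], $b[1]);
    return ($width > 0 && $height > 0) ? (float) ($width * $height) : 0.0;
}

/** Share (in %) of the footprint covered by each class of the reference layer. */
function tagFootprint(array $footprint, array $referenceLayer): array
{
    $footprintArea = ($footprint[2] - $footprint[0]) * ($footprint[3] - $footprint[1]);
    $tags = array();
    foreach ($referenceLayer as $class => $rectangles) {
        $covered = 0.0;
        foreach ($rectangles as $rectangle) {
            $covered += overlapArea($footprint, $rectangle);
        }
        if ($covered > 0) {
            $tags[$class] = round(100 * $covered / $footprintArea, 1);
        }
    }
    arsort($tags); // dominant class first
    return $tags;
}

// Image footprint taken from the metadata (no pixel is read).
$footprint = array(0, 0, 10, 10);

// Exogenous reference data: land cover classes as lists of rectangles.
$landCover = array(
    'urban'  => array(array(0, 0, 4, 10)),
    'water'  => array(array(4, 0, 10, 5), array(20, 20, 30, 30)),
    'forest' => array(array(4, 5, 10, 10)),
    'desert' => array(array(50, 50, 60, 60)),
);

$tags = tagFootprint($footprint, $landCover);
print_r($tags);
```

In this toy case the footprint is tagged as urban, water and forest, and the desert class is dropped because it does not intersect the footprint.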

# Polygon around Moscow
$moscow = 'POLYGON((37.1351 55.9655,38.1006 55.9640,38.0525 55.4969,37.0926 55.5171,37.1351 55.9655))';

# Initialize iTag
$iTag = new iTag();

# Tag polygon for land cover
$result = $iTag->tag($moscow, array(
    'landcover' => true
));

Tag footprint around Moscow

http://goo.gl/6AkU4y

github.com/jjrom/itag

resto: toward an Earth Observation product search engine

Search, visualize and download Earth Observation data

Architecture

[Architecture diagram: resto 2.0 with iTag 2.0. Components: Query Analyzer, Gazetteer, Administration, REST web services, Abstract Database Access Layer, PostgreSQL driver. Users: Search, Visualize, Download. Admin: POST and DELETE on data.]

[Database diagram: database "resto" with schemas "resto", "usersmanagement", schema_collection1, schema_collection2, …etc…; built on PostGIS, hstore and table inheritance. Ingest: POST. Search: GET.]

During the ingestion process, resources are automatically tagged by the iTag library

Why tag images first?

Search images over Russia

Bounding box!!
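The bounding box problem can be shown with a toy sketch. The L-shaped "country" and the two image centres below are made-up coordinates, and the helper names are hypothetical. A search restricted to the country's bounding box also matches an image that lies outside the country, which is why a tag computed once at ingestion is a more reliable filter.

```php
<?php
// Toy illustration: an L-shaped "country" whose bounding box contains
// a large area that is not part of the country. Coordinates are made up.

/** Bounding box [minX, minY, maxX, maxY] of a polygon given as a list of [x, y] points. */
function boundingBox(array $polygon): array
{
    $xs = array_column($polygon, 0);
    $ys = array_column($polygon, 1);
    return array(min($xs), min($ys), max($xs), max($ys));
}

function inBoundingBox(array $point, array $box): bool
{
    return $point[0] >= $box[0] && $point[0] <= $box[2]
        && $point[1] >= $box[1] && $point[1] <= $box[3];
}

/** Ray-casting point-in-polygon test (even-odd rule). */
function inPolygon(array $point, array $polygon): bool
{
    $inside = false;
    $n = count($polygon);
    for ($i = 0, $j = $n - 1; $i < $n; $j = $i++) {
        list($xi, $yi) = $polygon[$i];
        list($xj, $yj) = $polygon[$j];
        // The edge is only considered when it straddles the point's y value,
        // so the division below never divides by zero.
        $crosses = (($yi > $point[1]) !== ($yj > $point[1]))
            && ($point[0] < ($xj - $xi) * ($point[1] - $yi) / ($yj - $yi) + $xi);
        if ($crosses) {
            $inside = !$inside;
        }
    }
    return $inside;
}

$country = array(array(0, 0), array(10, 0), array(10, 2), array(2, 2), array(2, 10), array(0, 10));
$box = boundingBox($country);

$imageCentres = array(
    'inside country'    => array(1, 5),
    'neighbouring area' => array(6, 6),
);

foreach ($imageCentres as $label => $centre) {
    printf(
        "%s: bounding box %s, polygon %s\n",
        $label,
        inBoundingBox($centre, $box) ? 'match' : 'no match',
        inPolygon($centre, $country) ? 'match' : 'no match'
    );
}
```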

Search

resto provides semantic search capabilities

It uses a Query Analyzer to translate a natural language query into a set of EO OpenSearch parameters

<with> "keyword"
<without> "keyword"

"quantity" <lesser> (than) "numeric" "unit"
"quantity" <greater> (than) "numeric" "unit"
"quantity" <equal> (to) "numeric" "unit"
<lesser> (than) "numeric" "unit" (of) "quantity"
<greater> (than) "numeric" "unit" (of) "quantity"
<equal> (to) "numeric" "unit" (of) "quantity"
"quantity" <between> "numeric" <and> "numeric" ("unit")
<between> "numeric" <and> "numeric" "unit" (of) "quantity"

<today>
<yesterday>
<before> "date"
<after> "date"
<between> "date" <and> "date"
"numeric" "(year|day|month)" <ago>
<last> "(year|day|month)"
<last> "numeric" "(year|day|month)"
"numeric" <last> "(year|day|month)"
"(year|day|month)" <last>
<since> "numeric" "(year|day|month)"
<since> "month" "year"
<since> "date"
<since> "numeric" <last> "(year|day|month)"
<since> <last> "numeric" "(year|day|month)"
<since> <last> "(year|day|month)"
<since> "(year|day|month)" <last>

The query string analysis algorithm is based on simple recognition of words and patterns

Example

« Images of urban area in Russia acquired in last year with less than 5 % of cloud cover »

keyword: urban area
location: Russia
date: last year
acquisition parameter: less than 5 % of cloud cover
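A minimal sketch of this kind of word and pattern recognition, applied to the example query, might look as follows. It is not resto's Query Analyzer: only a few patterns are handled, the word lists passed in are assumptions (a plain list stands in for the gazetteer), and the output keys are illustrative rather than actual EO OpenSearch parameter names.

```php
<?php
// Simplified sketch: only a few of the listed patterns are handled, and the
// word lists and output keys are assumptions made for illustration.

function containsWord(string $text, string $word): bool
{
    return preg_match('/\b' . preg_quote($word, '/') . '\b/', $text) === 1;
}

function analyzeQuery(string $query, array $keywords, array $locations, array $quantities): array
{
    $text = strtolower($query);
    $params = array();

    // Keywords: any known keyword found in the text.
    foreach ($keywords as $keyword) {
        if (containsWord($text, $keyword)) {
            $params['keywords'][] = $keyword;
        }
    }

    // Location names would come from a gazetteer; a plain list stands in for it.
    foreach ($locations as $location) {
        if (containsWord($text, $location)) {
            $params['location'] = $location;
            break;
        }
    }

    // <last> "(year|day|month)"
    if (preg_match('/\blast (year|month|day)\b/', $text, $m)) {
        $params['date'] = array('last' => $m[1]);
    }

    // <lesser|greater> (than) "numeric" "unit" (of) "quantity", with % as the only unit
    $names = implode('|', array_map(function ($q) {
        return preg_quote($q, '/');
    }, $quantities));
    $pattern = '/\b(less|lesser|greater) (?:than )?(\d+(?:\.\d+)?) ?(%) (?:of )?(' . $names . ')\b/';
    if (preg_match($pattern, $text, $m)) {
        $params['quantity'] = array(
            'name'     => $m[4],
            'operator' => $m[1] === 'greater' ? '>' : '<',
            'value'    => (float) $m[2],
            'unit'     => $m[3],
        );
    }

    return $params;
}

$query = 'Images of urban area in Russia acquired in last year with less than 5 % of cloud cover';
$params = analyzeQuery(
    $query,
    array('urban', 'water', 'forest'),
    array('russia', 'france'),
    array('cloud cover')
);
print_r($params);
```

Each recognized fragment becomes one structured parameter, which a real analyzer would then map to search parameters.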

1. Search parameters are derived from the natural language query

2. Each search result has a « human-readable URL » that can be indexed by web crawlers (e.g. Google robots)

3. Keywords on resources are links to search requests: they can be indexed by web crawlers…and so on

http://goo.gl/BCZ3z4

As of version 2.0, resto supports faceted search

http://dinosaurs.wikia.com/wiki/Coelurosauria

Facets
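As a rough illustration of what faceted search returns, the sketch below counts how many results carry each value of a given field, so that a user can narrow a search by clicking a value. The result records, the field names and the `computeFacets` function are hypothetical and are not taken from resto.

```php
<?php
// Toy data and hypothetical field names; this only illustrates facet counting.

/** For each facet field, count how many results carry each value. */
function computeFacets(array $results, array $facetFields): array
{
    $facets = array();
    foreach ($facetFields as $field) {
        $values = array();
        foreach ($results as $result) {
            // A field may hold a single value or a list of values.
            foreach ((array) ($result[$field] ?? array()) as $value) {
                $values[] = $value;
            }
        }
        $counts = array_count_values($values);
        arsort($counts); // most frequent value first
        $facets[$field] = $counts;
    }
    return $facets;
}

$results = array(
    array('country' => 'Russia', 'landcover' => array('urban', 'forest')),
    array('country' => 'Russia', 'landcover' => array('forest')),
    array('country' => 'France', 'landcover' => array('urban', 'water')),
);

$facets = computeFacets($results, array('country', 'landcover'));
print_r($facets);
```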

Performance: iTag / resto

SPOT database: 1 000 000 products

New products retrieved every 3 hours from the ADS catalog

SEARCH: 0.2 s (time period of 1 month within a 10 x 10 km² box)

INGEST: 0.5 s (per product, for an ingestion of ~5000 products)

Orders of magnitude computed on a dual-core 2.6 GHz machine | 4 GB RAM | 500 TB HDD

What’s next? Conclusion and perspectives

Need for « fresh » tagging reference databases (e.g. GLC2000 replacement)

Enhance metadata with Twitter trending hashtags

e.g. add tags #mh370, #plane, #malaysianairline to resources acquired between March 8th, 2014 and April 14th, 2014 in the south of the Indian Ocean
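The hashtag idea could be sketched as a simple filter on acquisition date and location. Everything structural here is an assumption: the resource layout, the `addTrendTags` function and the rough longitude/latitude box are made up for illustration. Only the hashtags and the two dates come from the slide.

```php
<?php
// Hypothetical sketch: the resource layout and the bounding box are made up;
// only the hashtags and the dates come from the slide.

/** Add the trend's hashtags to resources acquired within its period and area. */
function addTrendTags(array $resources, array $trend): array
{
    $start = new DateTimeImmutable($trend['start']);
    $end = new DateTimeImmutable($trend['end']);
    list($minLon, $minLat, $maxLon, $maxLat) = $trend['box'];

    foreach ($resources as $id => $resource) {
        $acquired = new DateTimeImmutable($resource['date']);
        list($lon, $lat) = $resource['centre'];
        $inPeriod = $acquired >= $start && $acquired <= $end;
        $inArea = $lon >= $minLon && $lon <= $maxLon && $lat >= $minLat && $lat <= $maxLat;
        if ($inPeriod && $inArea) {
            $merged = array_merge($resource['tags'], $trend['hashtags']);
            $resources[$id]['tags'] = array_values(array_unique($merged));
        }
    }
    return $resources;
}

$trend = array(
    'hashtags' => array('#mh370', '#plane', '#malaysianairline'),
    'start'    => '2014-03-08',
    'end'      => '2014-04-14',
    'box'      => array(70, -50, 120, -10), // assumed rough box, not from the source
);

$resources = array(
    'A' => array('date' => '2014-03-20', 'centre' => array(95, -30), 'tags' => array()),
    'B' => array('date' => '2014-05-01', 'centre' => array(95, -30), 'tags' => array()),
    'C' => array('date' => '2014-03-20', 'centre' => array(2, 48), 'tags' => array('urban')),
);

$tagged = addTrendTags($resources, $trend);
print_r($tagged);
```

Only resource A falls inside both the period and the area, so it is the only one that receives the hashtags.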

« Linked data is the right way to do Semantic Web » (Tim Berners-Lee)

Update iTag JSON model to follow JSON-LD format

{
  "@context": "http://json-ld.org/contexts/person.jsonld",
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}