Loupe model - Use Cases and Requirements

LOUPE’S MODEL

USE CASES AND REQUIREMENTS

Nandana Mihindukulasooriya, María Poveda Villalón, Raúl García Castro

Ontology Engineering Group. Departamento de Inteligencia Artificial.

Facultad de Informática, Universidad Politécnica de Madrid.

Campus de Montegancedo s/n.

28660 Boadilla del Monte. Madrid. Spain

{nandana, mpoveda, rgarcia}@fi.upm.es

Introduction to Loupe

Loupe - Overview

Explore the vocabularies used and the abstract triple patterns in 5+ billion triples, including all DBpedia datasets, Wikidata, LinkedBrainz, and Bio2RDF.

Loupe helps to understand data, uncover patterns, formulate queries, and detect quality issues.

Current limitations: no RDF data, no public API

Loupe - Google Analytics

• Users from 86 countries

• Spain (23.76%), US (16.69%), Germany (10.64%), UK (9.14%), Italy (4.51%)

Next Steps

Louping the LOD Cloud

Loupe – LOD Laundromat integration

• LOD Laundromat

• 32 billion triples from 650K documents

• cleaned for syntax errors and duplicates

• coverage of smaller documents

• Collaboration with VU University Amsterdam

• Indexing all data from LOD Laundromat

Use Cases

What can we do with data indexed in Loupe?

Dataset descriptions

• Bridge between publishers and consumers

• A dataset description expresses metadata about RDF datasets (e.g., DCAT, VoID)

• statistics, vocabularies, structural metadata

• A dataset profile is a set of dataset characteristics that allow

• To describe a dataset in the best possible way

• To separate it maximally from other datasets

• Can be used for dataset recommendation
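
As an illustration of the kind of statistics a dataset description can carry, the sketch below builds a small VoID description in Turtle from an in-memory list of triples. The VoID terms (void:triples, void:classes, void:properties, void:vocabulary) come from the VoID vocabulary; the dataset IRI, the sample triples, and the function name are made up for this example and are not part of Loupe.

```python
def void_description(dataset_iri, triples, vocabularies):
    """Build a small VoID description (Turtle) from (s, p, o) tuples."""
    # Classes are the objects of rdf:type triples; properties are all predicates.
    classes = {o for (_, p, o) in triples if p == "rdf:type"}
    properties = {p for (_, p, _) in triples}
    statements = [
        "a void:Dataset",
        f"void:triples {len(triples)}",
        f"void:classes {len(classes)}",
        f"void:properties {len(properties)}",
    ]
    statements += [f"void:vocabulary <{v}>" for v in sorted(vocabularies)]
    body = " ;\n    ".join(statements)
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n\n"
        f"<{dataset_iri}>\n    {body} .\n"
    )


# Made-up sample data, for illustration only.
SAMPLE = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "foaf:Person"),
]

description = void_description(
    "http://example.org/dataset/demo",  # hypothetical dataset IRI
    SAMPLE,
    ["http://xmlns.com/foaf/0.1/"],
)
print(description)
```

Note that void:properties here counts every distinct predicate, including rdf:type; a real profiler may choose to count it differently.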

Dataset Statistics

UC::ex1 - Compare dataset statistics (I)

DBpedia (2015-04) datasets

Size (in # of triples)

UC::ex1 - Compare dataset statistics (II)

# of Classes Used

DBpedia (2015-04) datasets

UC::ex2 - Monitor evolution of a dataset

Vocabulary Usage - Classes

[Figure: Loupe class views for the esDBpedia dataset. Panels: Classes; Properties; # of classes per vocabulary; Common instances; dbo:Place class.]

UC::ex3 - Dataset summary generation

Auto-generated dataset schema

Visual descriptions

[Figure: schema diagram of the OpenAIRE dataset. Classes shown: foaf:Person, openaire:result, foaf:Organization. Properties shown: foaf:firstName, foaf:lastName, openaire:isAuthorOf, openaire:hasAuthor, foaf:member, dcterms:dateAccepted, openaire:resultType, dcterms:language, openaire:legalPerson, openaire:enterprise, openaire:sme. Literal types shown: xsd:String, xsd:boolean.]

UC::ex4 - Automatic Dataset Classification

• Generic vs domain-specific datasets

• size

• number of vocabularies

• number of classes

• number of properties

• Detection of the domain using the vocabularies used

• High-level domains (e.g., cross domain, life sciences, publications, government, geographic)
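
One way to picture domain detection from vocabulary usage is a weighted vote: each vocabulary namespace is mapped to a high-level domain, and the domain backed by the most triples wins. The sketch below assumes exactly that. The namespace-to-domain table, the function names, and the thresholds in is_generic are hypothetical choices made for this example; the slides do not say how Loupe would classify datasets.

```python
from collections import Counter

# Hypothetical mapping from vocabulary namespace to a high-level domain.
VOCAB_DOMAIN = {
    "http://purl.org/ontology/bibo/": "publications",
    "http://www.w3.org/2003/01/geo/wgs84_pos#": "geographic",
}


def detect_domain(vocab_usage):
    """vocab_usage maps a namespace to the number of triples that use it."""
    votes = Counter()
    for namespace, triple_count in vocab_usage.items():
        domain = VOCAB_DOMAIN.get(namespace)
        if domain is not None:
            votes[domain] += triple_count
    if not votes:
        return None  # no known vocabulary, so no guess
    return votes.most_common(1)[0][0]


def is_generic(n_vocabularies, n_classes, min_vocabularies=20, min_classes=200):
    """Crude generic-vs-domain-specific test; both thresholds are arbitrary."""
    return n_vocabularies >= min_vocabularies and n_classes >= min_classes


usage = {
    "http://purl.org/ontology/bibo/": 900,
    "http://www.w3.org/2003/01/geo/wgs84_pos#": 100,
}
print(detect_domain(usage))
print(is_generic(n_vocabularies=3, n_classes=12))
```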

Property Information

E.g., dbo:placeOfBirth property - Analysis of objects

<?subject, dbo:placeOfBirth, ?object>

UC::ex5 - Quality Report Generation

• Violations

• Object / datatype property violations

• Domain / range constraint violations

• Disjoint class violations

• Outlier detection

• Detection of antipatterns

• Data repair guidelines
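
A range constraint check from the list above can be sketched as follows. The example treats a missing or different type on the object as a violation, which is a closed-world reading and stricter than RDFS semantics; that choice, the sample triples, and the assumption that dbo:birthPlace expects a dbo:Place object are all made for illustration.

```python
RDF_TYPE = "rdf:type"


def range_violations(triples, prop, expected_class):
    """Return triples using `prop` whose object is not typed as `expected_class`."""
    # Collect the declared types of every typed resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return [
        (s, p, o)
        for (s, p, o) in triples
        if p == prop and expected_class not in types.get(o, set())
    ]


# Made-up sample data: the second birthPlace points at a person.
SAMPLE = [
    ("ex:madrid", RDF_TYPE, "dbo:Place"),
    ("ex:picasso", RDF_TYPE, "dbo:Person"),
    ("ex:alice", "dbo:birthPlace", "ex:madrid"),
    ("ex:bob", "dbo:birthPlace", "ex:picasso"),
]

violations = range_violations(SAMPLE, "dbo:birthPlace", "dbo:Place")
print(violations)
```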

UC::ex6 - Data validation with RDF Shapes

[Figure: workflow diagram. Stages, in the order listed on the slide: Pattern Extraction, Domain Expert Review, RDF Shape Generation, Data Validation, Data Repair. An additional element is labeled SHACL Shapes.]

Multilingual String Counts

3Cixty Dataset

[Figure panels: String count by language; Language-tagged string count by property]

UC::ex6 - Dataset Discovery / Search

• Simple

• I want to find dataset(s) that

• contain information about persons with some concrete information

• E.g., “give me datasets that have more than 500 instances of foaf:Person that have the dbo:birthPlace property”

• Advanced

• I want to find dataset(s) that

• can answer a given SPARQL query

• contain data that fit a given W3C RDF data shape
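
The simple discovery case can be sketched as a lookup over precomputed per-dataset counts. The profile structure below (a dictionary keyed by class and property pairs), the dataset names, and the numbers are all invented for this example; they only show how the "more than 500 instances of foaf:Person with dbo:birthPlace" question could be answered once such counts exist.

```python
def find_datasets(profiles, cls, prop, min_instances):
    """Names of datasets with more than `min_instances` instances of `cls` having `prop`."""
    return sorted(
        name
        for name, counts in profiles.items()
        if counts.get((cls, prop), 0) > min_instances
    )


# Made-up profiles: (class, property) -> number of instances.
PROFILES = {
    "dataset-a": {("foaf:Person", "dbo:birthPlace"): 1200},
    "dataset-b": {("foaf:Person", "dbo:birthPlace"): 40},
    "dataset-c": {("foaf:Person", "foaf:name"): 9000},
}

matches = find_datasets(PROFILES, "foaf:Person", "dbo:birthPlace", 500)
print(matches)
```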

UC::ex7 - Dataset ranking

• Ranking metrics

• Size

• number of triples (of a given pattern)

• number of instances of a given class

• Richness

• the avg number of properties per instance

• General vs domain-specific dataset

• # of classes, # of properties, # of triples

• Provenance information
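
The richness metric can be sketched as the average number of distinct properties per subject. Treating every subject as an "instance" is an assumption of this example, as are the sample datasets and function names; the slides do not define the metric more precisely.

```python
from collections import defaultdict


def richness(triples):
    """Average number of distinct properties per subject (0.0 for no triples)."""
    props_by_subject = defaultdict(set)
    for s, p, _ in triples:
        props_by_subject[s].add(p)
    if not props_by_subject:
        return 0.0
    total = sum(len(props) for props in props_by_subject.values())
    return total / len(props_by_subject)


def rank_by_richness(datasets):
    """Dataset names ordered from richest to poorest."""
    return sorted(datasets, key=lambda name: richness(datasets[name]), reverse=True)


# Made-up sample datasets.
DATASETS = {
    "sparse": [("ex:a", "foaf:name", "A"), ("ex:b", "foaf:name", "B")],
    "rich": [
        ("ex:a", "foaf:name", "A"),
        ("ex:a", "dbo:birthPlace", "ex:madrid"),
        ("ex:b", "foaf:name", "B"),
    ],
}

print(richness(DATASETS["rich"]))
print(rank_by_richness(DATASETS))
```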

Ontology development UC

• Reuse ontology elements used in datasets

• Look for patterns

• Ontology reuse reports

• Ontology monitoring

• Why are some classes or properties not used?

• Aren’t they relevant?

• Are other classes used for the same purpose?

• Ontology comparison reports

We want YOU to tell us your use cases!

Loupe Model

Model

http://ont-loupe.linkeddata.es/def/core#

Datasets and named graphs

Metadata from DCAT

Classes and properties

How many instances of a given class are there?

< x, a, C > (C is fixed; the instances x are counted)
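
The < x, a, C > count can be sketched in plain Python over an in-memory list of (subject, predicate, object) tuples. The sample triples and the function name are made up for illustration; this is not Loupe's implementation, which the slides do not show.

```python
RDF_TYPE = "rdf:type"


def count_instances(triples, cls):
    """Number of distinct subjects x with a triple < x, rdf:type, cls >."""
    return len({s for (s, p, o) in triples if p == RDF_TYPE and o == cls})


# Made-up sample data.
SAMPLE = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),  # duplicate triple, counted once
    ("ex:madrid", RDF_TYPE, "dbo:Place"),
]

print(count_instances(SAMPLE, "foaf:Person"))
```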

How many instances of a given class that have a given property are there?

< x, a, C >

< x, P, o >

(C and P are fixed; the instances x are counted)
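
Combining the two patterns < x, a, C > and < x, P, o > amounts to intersecting two sets of subjects. A minimal sketch, with made-up sample triples and a hypothetical function name:

```python
RDF_TYPE = "rdf:type"


def count_instances_with_property(triples, cls, prop):
    """Number of distinct x with both < x, rdf:type, cls > and < x, prop, o >."""
    instances = {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}
    subjects_with_prop = {s for (s, p, o) in triples if p == prop}
    return len(instances & subjects_with_prop)


# Made-up sample data: only ex:alice is a person with a birth place.
SAMPLE = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "dbo:birthPlace", "ex:madrid"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:acme", RDF_TYPE, "foaf:Organization"),
    ("ex:acme", "dbo:birthPlace", "ex:madrid"),
]

print(count_instances_with_property(SAMPLE, "foaf:Person", "dbo:birthPlace"))
```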

How many triples that have a given property are there?

< s, P, o > (P is fixed; the triples are counted)
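
Counting triples per property is a single pass over the data. The sketch below computes the count for every property at once with a Counter; the sample triples are invented for the example.

```python
from collections import Counter


def property_counts(triples):
    """Map each property P to the number of < s, P, o > triples."""
    return Counter(p for (_, p, _) in triples)


# Made-up sample data.
SAMPLE = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "dbo:birthPlace", "ex:madrid"),
]

counts = property_counts(SAMPLE)
print(counts["foaf:name"])
```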

Triple patterns

How many triples that have a given subject class, property and object class are there?

< s, P, o >

< s, a, C1 >

< o, a, C2 >

Count: the number of < s, P, o > triples matching the pattern
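
An abstract triple pattern (C1, P, C2) can be counted by first indexing the types of every resource and then tallying each non-type triple once per combination of subject class and object class. This is a sketch under that reading of the slide; the sample data and names are made up.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"


def triple_pattern_counts(triples):
    """Map (subject class, property, object class) to its number of triples."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue  # type triples define the classes, they are not patterns
        for c1 in types.get(s, ()):
            for c2 in types.get(o, ()):
                counts[(c1, p, c2)] += 1
    return counts


# Made-up sample data; the literal "Bob" has no class, so foaf:name is skipped.
SAMPLE = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:madrid", RDF_TYPE, "dbo:Place"),
    ("ex:alice", "dbo:birthPlace", "ex:madrid"),
    ("ex:bob", "dbo:birthPlace", "ex:madrid"),
    ("ex:bob", "foaf:name", "Bob"),
]

patterns = triple_pattern_counts(SAMPLE)
print(patterns[("foaf:Person", "dbo:birthPlace", "dbo:Place")])
```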

Languages

How many strings tagged with a given language are there?

< x, b, “”@lang > (lang is fixed; the strings are counted)

How many triples tagged with a given language are there?

< s, b, “”@lang > (lang is fixed; the triples are counted)
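
The two language counts can be sketched together. This example assumes that "strings" means distinct literal values per language and "triples" means every triple whose object carries that language tag; the slides do not spell out the difference. Language-tagged literals are modeled as (value, lang) tuples, which is a convention of this sketch only.

```python
from collections import Counter, defaultdict


def language_counts(triples):
    """Return (distinct strings per language, triples per language)."""
    triple_counts = Counter()
    strings = defaultdict(set)
    for _, _, o in triples:
        if isinstance(o, tuple):  # (value, lang) marks a language-tagged literal
            value, lang = o
            triple_counts[lang] += 1
            strings[lang].add(value)
    string_counts = {lang: len(values) for lang, values in strings.items()}
    return string_counts, triple_counts


# Made-up sample data: "Madrid"@es appears in two triples.
SAMPLE = [
    ("ex:madrid", "rdfs:label", ("Madrid", "es")),
    ("ex:madrid_city", "rdfs:label", ("Madrid", "es")),
    ("ex:madrid", "rdfs:label", ("Madrid", "en")),
    ("ex:madrid", "rdf:type", "dbo:Place"),
]

string_counts, triple_counts = language_counts(SAMPLE)
print(string_counts, dict(triple_counts))
```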

Vocabularies

Classes and properties declared in namespaces.
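
Grouping terms by namespace can be approximated by cutting each IRI after its last "#" or "/". This heuristic is an assumption of the sketch and will not match every vocabulary's real namespace; the sample class list is made up.

```python
from collections import Counter


def namespace_of(iri):
    """Namespace part of an IRI: everything up to the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def terms_per_vocabulary(terms):
    """Number of distinct terms per namespace."""
    return Counter(namespace_of(term) for term in set(terms))


# Made-up sample: classes used in some dataset.
CLASSES = [
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/Organization",
    "http://dbpedia.org/ontology/Place",
]

per_vocabulary = terms_per_vocabulary(CLASSES)
print(per_vocabulary)
```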

Questions?

Backup Slides

Data Catalog Vocabulary (DCAT)

https://www.w3.org/TR/vocab-dcat/

Vocabulary of Interlinked Datasets (VoID)

https://www.w3.org/TR/void/