Transforming your Graph Analytics with GraphDB (Vladimir Alexiev)


GraphDB Fundamentals
Ontotext Webinar, Aug 11, 2016

Presentation Outline


• Welcome

• RDF and RDFS Overviews

• SPARQL Overview

• Ontology Overview

• Ontology Modeling

• GraphDB™ Installation

• Performance Tuning and Scalability

• GraphDB™ Workbench and Sesame

• Loading Data

• Rule Sets and Reasoning Strategies

• Extensions


What is RDF?

The Resource Description Framework (RDF) is a graph data model that
• Formally describes the semantics, or meaning, of information
• Represents metadata, i.e., data about data

The RDF data model consists of triples
• That represent links (or edges) in an RDF graph
• Where the structure of each triple is Subject, Predicate, Object

Example triples (Subject Predicate Object):

br:Fred br:hasSpouse br:Wilma .
br:Fred br:hasAge 25 .

'br:' refers to the namespace 'http://bedrock/', so that 'br:Fred' expands to <http://bedrock/Fred>, a Uniform Resource Identifier (URI).

An Example of an RDF Model

[Figure: an example RDF graph for the Flintstones domain. Nodes include FredFlintstone, WilmaFlintstone, PebblesFlintstone, Bamm-BammRubble, RoxyRubble, Chip, BarneyRubble, BettyRubble, PearlSlaghoople, Rock Quarry, Bedrock, CobblestoneCounty, and PrehistoricAmerica; the edges use the properties hasSpouse, hasChild, worksFor, livesIn, partOf, and locatedIn.]

What is RDFS?

RDF Schema (RDFS)
• Adds
  − Concepts such as Resource, Literal, Class, and Datatype
  − Relationships such as subClassOf, subPropertyOf, domain, and range
• Provides the means to define
  − Classes and properties
  − Hierarchies of classes and properties
• Includes "entailment rules", i.e., axioms to infer new triples from existing ones

Applying RDFS To Infer New Triples

Asserted triples:

br:hasSpouse a rdf:Property ;
    rdfs:domain br:Human ;
    rdfs:range  br:Human .
br:Human a rdfs:Class ;
    rdfs:subClassOf br:Mammal .
br:Fred br:hasSpouse br:Wilma .

Inferred triples:

br:Fred a br:Human .
br:Wilma a br:Human .
br:Fred a br:Mammal .
br:Wilma a br:Mammal .
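With RDFS reasoning enabled, these inferred class memberships are returned by ordinary queries; a minimal sketch (SPARQL itself is introduced in the next section):

PREFIX br: <http://bedrock/>
SELECT ?individual WHERE { ?individual a br:Mammal }

This returns both br:Fred and br:Wilma, even though neither rdf:type triple was asserted explicitly.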

RDF and RDFS Overviews

Questions?


What is SPARQL?

SPARQL is a SQL-like query language for RDF graph data, with the following query and update types:

• SELECT returns tabular results

• CONSTRUCT creates a new RDF graph based on query results

• ASK returns ‘yes’ if the query has a solution, otherwise ‘no’

• DESCRIBE returns RDF graph data about a resource; useful when the query client does not know the structure of the RDF data in the data source

• INSERT inserts triples into a graph

• DELETE deletes triples from a graph.
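For instance, a CONSTRUCT query can reshape matched data into new triples; a minimal sketch using the bedrock namespace from the examples that follow (br:hasGrandChild is a hypothetical property introduced only for this illustration):

PREFIX br: <http://bedrock/>
CONSTRUCT { ?grandParent br:hasGrandChild ?grandChild }
WHERE {
  ?grandParent br:hasChild ?child .
  ?child br:hasChild ?grandChild .
}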


Using SPARQL to Insert Triples

To create an RDF graph, perform these steps:
• Define prefixes to URIs with the PREFIX keyword.
• Use INSERT DATA to signify you want to insert statements. Write the subject-predicate-object statements (triples).
• Execute this query.

PREFIX br: <http://bedrock/>
INSERT DATA {
  br:fred br:hasSpouse br:wilma .
  br:fred br:hasChild br:pebbles .
  br:wilma br:hasChild br:pebbles .
  br:pebbles br:hasSpouse br:bamm-bamm ;
             br:hasChild br:roxy, br:chip .
}

[Figure: the resulting graph: :fred and :wilma are linked by :hasSpouse and both have :hasChild edges to :pebbles; :pebbles is linked to :bamm-bamm by :hasSpouse and has :hasChild edges to :roxy and :chip.]

Using SPARQL to Select Triples

To access the RDF graph you just created, perform these steps:
• Define prefixes to URIs with the PREFIX keyword.
• Use SELECT to signify you want to select certain information, and WHERE to signify your conditions, restrictions, and filters.
• Execute this query.

PREFIX br: <http://bedrock/>
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }

Results (excerpt):

Subject      Predicate    Object
br:fred      br:hasChild  br:pebbles
br:pebbles   br:hasChild  br:roxy
br:pebbles   br:hasChild  br:chip
br:wilma     br:hasChild  br:pebbles

Using SPARQL to Find Fred's Grandchildren

To find Fred's grandchildren, first find out whether Fred has any grandchildren:
• Define prefixes to URIs with the PREFIX keyword.
• Use ASK to discover whether Fred has a grandchild, and WHERE to signify your conditions.

PREFIX br: <http://bedrock/>
ASK
WHERE {
  br:fred br:hasChild ?child .
  ?child br:hasChild ?grandChild .
}

Result: YES

Using SPARQL to Find Fred's Grandchildren

Now that we know he has at least one grandchild, perform these steps to find the grandchild(ren):
• Define prefixes to URIs with the PREFIX keyword.
• Use SELECT to signify you want to select a grandchild, and WHERE to signify your conditions.

PREFIX br: <http://bedrock/>
SELECT ?grandChild
WHERE {
  br:fred br:hasChild ?child .
  ?child br:hasChild ?grandChild .
}

Results:
1. br:roxy
2. br:chip

SPARQL Overview

Questions?



What is an Ontology?

An ontology is a formal specification that provides sharable and reusable knowledge representation.

Examples of ontologies include:

• Taxonomies

• Vocabularies

• Thesauri

• Topic Maps

• Logical Models


What is in an Ontology?

An ontology specification includes descriptions of
• Concepts and properties in a domain
• Relationships between concepts
• Constraints on how the relationships can be used
• Individuals as members of concepts

The Benefits of an Ontology

Ontologies provide:
• A common understanding of information
• Explicit domain assumptions

These provisions are valuable because ontologies:
• Support data integration for analytics
• Apply domain knowledge to data
• Support interoperation of applications
• Enable model-driven applications
• Reduce the time and cost of application development
• Improve data quality, i.e., metadata and provenance

OWL Overview

The Web Ontology Language (OWL) adds more powerful ontology modelling means to RDF/RDFS
• Providing
  − Consistency checks: Are there logical inconsistencies?
  − Satisfiability checks: Are there classes that cannot have instances?
  − Classification: What is the type of an instance?
• Adding identity equivalence and identity difference
  − Such as sameAs, differentFrom, equivalentClass, equivalentProperty
• Offering more expressive class definitions, such as
  − Class intersection, union, complement, disjointness
  − Cardinality restrictions
• Offering more expressive property definitions, such as
  − Object and datatype properties
  − Transitive, functional, symmetric, inverse properties
  − Value restrictions
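A minimal Turtle sketch of a few of these constructs, extending the bedrock vocabulary from the earlier slides (br:Person, br:hasAncestor, and the specific axioms are hypothetical illustrations, not part of the webinar examples):

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix br:   <http://bedrock/> .

br:Person owl:equivalentClass br:Human .    # class equivalence
br:hasSpouse a owl:SymmetricProperty .      # if Fred hasSpouse Wilma, then Wilma hasSpouse Fred
br:hasAncestor a owl:TransitiveProperty .   # an ancestor of an ancestor is an ancestor
br:Human rdfs:subClassOf [                  # cardinality restriction: at most one spouse
    a owl:Restriction ;
    owl:onProperty br:hasSpouse ;
    owl:maxCardinality "1"^^xsd:nonNegativeInteger
] .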

Ontology Overview

Questions?



"Ontology Development 101" by Noy & McGuinness (2001) is a popular, practical seven-step methodology for developing an ontology.

• Step 1: Identify the domain and scope

• Step 2: Consider re-using existing ontologies

• Step 3: Enumerate important terms

• Step 4: Define the classes and class hierarchy

• Step 5: Define the properties of classes

• Step 6: Define property facets

• Step 7: Create instances


Step 1: Identify the Domain and Scope

To help identify the domain and scope of the ontology, answer these questions:
• What is the domain of the ontology?
• What is the purpose of the ontology?
• Who are the users and maintainers?
• What questions will the ontology answer?

Some say the last question is the most important (the "competency questions" approach).

Step 2: Consider Re-using Existing Ontologies

Ontologies are re-usable and extensible, and there are a number of existing ontologies that you might consider:
• Your existing ontology
• Widely used ontologies
  − such as Dublin Core, FOAF, SKOS, Geo (WGS84)
• Upper-level ontologies
  − such as Cyc, UMBEL, DOLCE, SUMO, PROTON
• Linked Open Data
• Specialized domain ontologies

Step 3: Enumerate Important Terms

Terminology is useful for domain modeling. Start collecting terminology based on interviews and domain documentation.

Step 4: Define the Classes and Class Hierarchy

To help define the classes and class hierarchy, determine which type of modeling to use. Three types of modeling are:
• Top-down modeling
  − Use it when the general domain concepts are known
• Bottom-up modeling
  − Use it when there is a great variety of concepts and no clear overarching general concepts at the outset
• Hybrid modeling
  − Use it when you need both top-down and bottom-up modeling, which is often the case

Step 5: Define the Properties of Classes

Define the properties of classes, such as:
• Intrinsic properties
  − For example, color, mass, density
• Extrinsic properties
  − For example, name, location
• Parts
• Relationships to other individuals

Step 6: Define Property Facets

Define property facets, such as:
• Property type
  − Is it symmetric? Is it transitive? Is it a datatype or an object property?
• Cardinality
  − Is the property optional or essential? Is the property a one-to-many relationship?
• Domain
  − From which classes does this property point?
• Range
  − To which classes does this property point?

Step 7: Create Instances

Create instances of classes
• For example, :Fred a :Human .

Creating instances
• Tests the domain ontology
• May expose modeling issues
  − which can be addressed by iterative refinement
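A minimal Turtle sketch of instance creation, reusing the bedrock namespace from the earlier slides (the br:livesIn assertion is purely illustrative):

@prefix br: <http://bedrock/> .

br:Fred a br:Human ;
    br:hasSpouse br:Wilma ;
    br:livesIn br:Bedrock .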

Ontology Modeling

Questions?



GraphDB™ Editions

• GraphDB™ Free

• GraphDB™ Standard

• GraphDB™ Cloud

• GraphDB™ as-a-Service (S4)

• GraphDB™ Enterprise


GraphDB™ Free Edition Installation Overview

Download: http://info.ontotext.com/graphdb-free-graphdb

To install GraphDB™ Free Edition, perform these steps:
• With the new GraphDB 7 on Windows: run the installer and it starts automatically.
• Otherwise: unzip, then start the GraphDB and Workbench interfaces in the embedded Tomcat server by executing the startup script located in the root directory:

  startup.bat (Windows)
  ./startup.sh (Linux/Unix/Mac OS)

The message below appears in your terminal and the GraphDB Workbench opens at http://localhost:8080/.

  INFO: Starting ProtocolHandler ["http-bio-8080"]
  Opening web app in default browser

GraphDB™ Free Edition Workbench: New Repository (http://localhost:8080)

Create a new repository by:
• Launching the GraphDB™ Workbench
• Selecting "Admin"
• Selecting "Locations and Repositories"
• Configuring the new repository

GraphDB™ Workbench: Execute Queries (http://localhost:8080)

Test the repository by
• Selecting "SPARQL"
• Inserting data (1) and then running a query (2)
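A quick smoke test for the new repository, as a minimal sketch (the inserted triple is arbitrary). First run the update, then the query:

PREFIX br: <http://bedrock/>
INSERT DATA { br:fred br:hasSpouse br:wilma . }

SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }

The count should be at least 1; with a reasoning rule set enabled it may be higher, since inferred statements can be included.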

GraphDB™ Installation

Questions?



Performance Tuning: Memory

With regard to performance tuning:
• Memory is the most important factor
  − More memory results in better performance
• Configure the heap space as follows:
  − Set the max heap space to ~90% of free memory (the -Xmx JVM parameter)
  − Use entity-index-size to set the entity index size
  − Cache memory indices (statements, predicates, and FTS)

Performance Tuning: Memory

[Figure: how the total available Java heap, set with the JVM option -Xmx<size>, is divided]

• Java runtime overhead: typically 12-15% of the total heap
• Entities: depends on entity-index-size
• Cache memory (cache-memory):
  − POS/PSO, PCSO, PCOS indices: tuple-index-memory
  − Predicate lists (SP/OP): predicate-memory
  − Full-text search: fts-memory
• GraphDB application heap: the remaining memory, used by GraphDB and the application; some of this space is used for caching the RDF Rank, geo-spatial, and Lucene indices (if enabled)
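For concreteness, a hypothetical sketch of how these settings might be expressed. The owlim: configuration namespace, the exact parameter spellings, and every value below are assumptions for illustration only; use the configuration spreadsheet (see the Performance Tuning: Spreadsheet slide below) or the GraphDB documentation for authoritative settings.

# JVM heap, roughly 90% of the machine's free memory:
#   -Xmx8g

# Repository configuration fragment (Turtle):
@prefix owlim: <http://www.ontotext.com/trree/owlim#> .
[] owlim:entity-index-size  "200000000" ;
   owlim:cache-memory       "4g" ;
   owlim:tuple-index-memory "3g" ;
   owlim:predicate-memory   "512m" ;
   owlim:fts-memory         "512m" .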

Performance Tuning: Load

Each dataset has its own "geometry." Technicians must gain experience with each dataset in order to refine the loading process. Here are some tips:

• Load performance
  − Set cache-memory to 50% of the max heap
  − Disable optional indices
  − Load data in chunks of 1 million statements
  − Use fast transaction mode
• Use the new LoadRDF parallel bulk loader (video, docs)
• Normal operations after load
  − Set cache-memory to 38% of the max heap
  − Re-enable optional indices
  − Enable safe transaction mode
  − Experiment with cache-memory, tuple-index-memory, predicate-memory, and fts-memory

Performance Tuning: Spreadsheet

To help achieve the optimal configuration, GraphDB™ provides a spreadsheet that estimates memory and index configuration values.

The spreadsheet
• Generates command-line parameters and .ttl configuration based on your input
• Is located in the ./doc directory of your distribution, named graphdb-se-configurator.xls

Scalability: GraphDB™ Enterprise

GraphDB™ Enterprise edition provides scalability:
• Replication / high-availability cluster
• Improved concurrent querying and scalability
• Resilience for failover

Performance Tuning and Scalability

Questions?



GraphDB™ Workbench and Sesame

GraphDB™ Workbench is a web-based administration tool. It is similar to Sesame Workbench, but
• Has more features
• Is more intuitive and easier to use

GraphDB™ Workbench functions include
• Managing GraphDB™ repositories
• Loading and exporting data
• Monitoring query execution
• Developing and executing queries, including auto-complete and charting of results
• Managing connectors and users

GraphDB™ Workbench

On the following slide is an example of the GraphDB™ Workbench screen.
• Access the GraphDB™ Workbench from a browser.
• The splash page provides a summary of the installed GraphDB™ Workbench.
• The Workbench has a menu bar and a number of convenient pull-down menus organized under "Data", "SPARQL", "Admin", and the currently selected repository.

Access GraphDB™ Workbench at http://localhost:8080/graphdb-workbench-se/

Create New Repository

Create a new repository by selecting
• The Admin menu
• Locations and Repositories
• Create Repository

Execute Queries With GraphDB™ Workbench

Selecting the SPARQL menu opens the SPARQL query editor, which
• Allows you to render your query results as Table, Pivot Table, or Google Analytic Charts

GraphDB™ Workbench Query Editor


Query Monitoring: Abort Query


GraphDB™ Workbench and Sesame

Questions?



Loading Data

Loading data may be accomplished by using
• GraphDB™ Workbench
  − To upload individual files
  − To upload bulk data from a directory
• The LoadRDF parallel loader

Loading Data: Supported File Formats

Loading Data Through the GraphDB Workbench

Loading Local Files

To load a local file:
• Select Data -> Import.
• Open the Local files tab and click the Select files icon to choose the file you want to upload.
• Click the Import button.
• Enter the import settings in the pop-up window.

Loading a Database Server File

• Create a folder named graphdb-import in your user home directory.
• Copy all data files you want to load into the GraphDB database to this folder.
• Go to the GraphDB Workbench.
• Select Data -> Import.
• Open the Server files tab.
• Select the files you want to import.
• Click the Import button.

LoadRDF Parallel Bulk Loader

The LoadRDF parallel bulk loader
• Features fast loading of large datasets into new repositories
• Is not intended for updating existing repositories
• Is easy to use:
  − Enter: loadrdf <config.ttl> <serial|parallel> <files...>
    ▪ For example: ./loadrdf.sh config.ttl parallel example.ttl
  − The "Serial Load" option pipelines the parse, entity resolution, and load tasks.
  − The "Parallel Load" option batch-processes the parse, entity resolution, and load tasks.

Other Ways to Load Data

• By pasting data in the Text area tab of the Import page
• By pasting a data URL in the Remote content tab of the Import page
• By executing an INSERT query in the SPARQL -> SPARQL Query page

Loading Data

Questions?



Reasoning Strategies

• Forward chaining
  − Inferences are pre-computed
  − Faster query performance
  − Slower load times
  − More memory/disk space required
  − Updates are expensive (truth maintenance is non-trivial)
• Backward chaining
  − Inferences are performed as needed at query time
  − Slower query performance
  − Faster load times
• Hybrid reasoning
  − Partial forward chaining at data-loading time + partial backward chaining at query time

GraphDB™ Reasoning Optimizations

• GraphDB™ forward-chaining/delete optimization
  − Fast (incremental) inserts (assertions) and deletes (retractions)
  − Most triplestores perform an expensive full re-compute on updates
  − Truth maintenance minimizes the re-compute, but the required dependency tracking is expensive
  − GraphDB optimizes the update by using backward chaining to derive update dependencies dynamically
  − It stops at axioms or ontology triples (see onto:schemaTransaction)
• owl:sameAs forward-chaining optimization
  − Forward chaining owl:sameAs generates a large number of triples
  − This is caused by statement duplication on equivalent resources
  − The equivalent-resource optimization minimizes the triples generated
  − Backward chaining can expand results at query time

Rule Sets

A rule set consists of
• Prefixes (namespace prefixes)
• Axiomatic triples
• Custom rules

Pre-defined rule sets are
• empty: no reasoning; GraphDB™ operates as a plain RDF store
• rdfs: standard RDFS semantics
• owl-horst: RDFS + D-entailment + some OWL; tractable
• owl-max: RDFS with most of OWL Lite
• owl2-rl: conformant OWL 2 RL profile, except for D-entailment (types)
• owl2-ql: reasoning over large volumes of data
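As an illustration of what the OWL-flavoured rule sets materialize beyond rdfs: a transitive-property declaration is enough for the last triple below to be inferred under owl-horst (and the richer rule sets), but not under empty or rdfs. Declaring br:partOf as an owl:TransitiveProperty is a hypothetical axiom added only for this sketch:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix br:  <http://bedrock/> .

br:partOf a owl:TransitiveProperty .
br:RockQuarry br:partOf br:Bedrock .
br:Bedrock br:partOf br:CobblestoneCounty .
# inferred: br:RockQuarry br:partOf br:CobblestoneCounty .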

Rule Sets and Reasoning Strategies

Questions?



Ontotext GraphDB Connectors

• Provide extremely fast full-text search, range and faceted search, and aggregations
• Utilize an external engine like Lucene, Solr, or Elasticsearch
• Flexible schema mapping: index only what you need
• Real-time synchronization of data in GraphDB and the external engine
• Connector management via SPARQL
• Data querying and update via SPARQL
• Based on the GraphDB plug-in architecture

Workflow

[Figure: connector workflow. The GraphDB engine's query processor handles SPARQL INSERT/DELETE and SPARQL SELECT (with or without an embedded Lucene/Solr/Elasticsearch query) against the internal graph indexes, and keeps the external Lucene/Solr/Elasticsearch engine in sync via selective replication; direct Solr/Elasticsearch queries remain possible.]

Interface

• All interaction is via SPARQL queries (see the sketches below)
  − INSERT for creating connectors
  − SELECT for getting connector configuration parameters
  − INSERT/SELECT/DELETE for managing and querying RDF data
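For example, creating a Lucene connector instance is a single SPARQL update; a minimal sketch following the pattern of the GraphDB connector documentation (the instance name my_index and the bedrock type and property used in the mapping are hypothetical):

PREFIX :     <http://www.ontotext.com/connectors/lucene#>
PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#>
INSERT DATA {
  inst:my_index :createConnector '''
  {
    "types":  [ "http://bedrock/Human" ],
    "fields": [ { "fieldName": "name", "propertyChain": [ "http://bedrock/name" ] } ]
  }
  ''' .
}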

Connectors: Primary Features

• Maintaining an index that is always in sync with the data stored in GraphDB
• Multiple independent instances per repository
• The entities for synchronization are defined by:
  − a list of fields (on the Lucene side) and property chains (on the GraphDB side) whose values will be synchronized
  − a list of rdf:type's of the entities for synchronization
  − a list of languages for synchronization (the default is all languages)
  − additional filtering by property and value
• Full-text search using native Lucene queries
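Querying goes through the same connector vocabulary; a minimal sketch against the hypothetical my_index instance from the previous sketch, embedding a native Lucene query string:

PREFIX :     <http://www.ontotext.com/connectors/lucene#>
PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#>
SELECT ?entity ?score WHERE {
  ?search a inst:my_index ;
          :query "fred*" ;
          :entities ?entity .
  ?entity :score ?score .
}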

Connectors: Primary Features (continued)

• Snippet extraction: highlighting of search terms in the search result
• Faceted search, e.g. Europeana Food and Drink
• Sorting by any preconfigured field
• Paging of results using offset and limit
• Custom mapping of RDF types to Lucene types
• Specifying which Lucene analyzer to use (the default is Lucene's StandardAnalyzer)
• Boosting an entity by the [numeric] value of one or more predicates
• Custom scoring expressions at query time to evaluate the score, based on Lucene

TinkerPop Blueprints Support

• Blueprints (Apache TinkerPop, a.k.a. Gremlin) is a popular API for accessing graph databases
• It is supported by Hadoop, Neo4j, Titan, etc.
• GraphDB has supported Blueprints since version 7.0, for accessing RDF databases
• It represents RDF as a simplified version of the property graph model
• In this way you can use graph programming frameworks, or ready-made graph exploration software such as Linkurious

RDF Rank

RDF Rank is a GraphDB™ extension that
• Is similar to PageRank: it identifies "important" nodes in an RDF graph based on their interconnectedness
• Is accessed using the rank:hasRDFRank system predicate
• Supports incremental computation, which is useful for frequently changing data

For example, to select the top 100 most important nodes in the RDF graph:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT ?n WHERE { ?n rank:hasRDFRank ?r }
ORDER BY DESC(?r)
LIMIT 100

GeoSPARQL Support

GeoSPARQL is an Open Geospatial Consortium (OGC) standard for representing and querying geospatial linked data, with geometries expressed in WKT or the Geography Markup Language (GML). It provides:

• A small topological ontology in RDFS/OWL for representation

• Simple Features, RCC8, and DE-9IM (a.k.a. Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

• A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning
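A minimal query sketch using the standard GeoSPARQL vocabulary and the geof:sfWithin filter function (the ?place data and the polygon coordinates are hypothetical, and GeoSPARQL support must be enabled for the repository):

PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?place WHERE {
  ?place geo:hasGeometry ?geom .
  ?geom  geo:asWKT ?wkt .
  FILTER (geof:sfWithin(?wkt,
          "POLYGON((-78 39, -77 39, -77 40, -78 40, -78 39))"^^geo:wktLiteral))
}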

Extensions

Questions?


Support and FAQs: support@ontotext.com

Additional resources:

Ontotext:
• Community Forum and Evaluation Support: http://stackoverflow.com/questions/tagged/graphdb
• GraphDB Website and Documentation: http://graphdb.ontotext.com
• Whitepapers, Fundamentals: http://ontotext.com/knowledge-hub/fundamentals/

SPARQL, OWL, and RDF:
• RDF: http://www.w3.org/TR/rdf11-concepts/
• RDFS: http://www.w3.org/TR/rdf-schema/
• SPARQL Overview: http://www.w3.org/TR/sparql11-overview/
• SPARQL Query: http://www.w3.org/TR/sparql11-query/
• SPARQL Update: http://www.w3.org/TR/sparql11-update

For Further Information

• Peio Popov, North America Sales and Business Development
  − peio.popov@ontotext.com
  − 1.929.239.0659
• Ilian Uzunov, Europe Sales and Business Development
  − Ilian.uzunov@ontotext.com
  − 359.888.772.248


The End

GraphDB™ Fundamentals
