Introduction to ModeShape 3 November 28, 2012 Randall Hauch @rhauch

ModeShape 3 overview


Page 1: ModeShape 3 overview

Introduction to ModeShape 3

November 28, 2012

Randall Hauch @rhauch

Page 2: ModeShape 3 overview

• Features
• Current status & roadmap
• Design (how we use Infinispan)
• Best practices
• Q & A

Page 3: ModeShape 3 overview

ModeShape 3

An elastic in-memory hierarchical database with queries, transactions, events & more

Page 4: ModeShape 3 overview

Elastic
• Add more processes to increase storage capacity and/or throughput
  – No master, no slaves
  – Data is rebalanced as needed
  – Optionally separate database engine from storage processes
• Fault tolerant
  – Processes can fail without loss of data
  – Cross-data center distribution (in near future)
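Rebalancing without a master works when every process can compute, from the same function, which processes own a given key. Consistent hashing is one common technique for this in distributed key-value stores, and ModeShape delegates distribution to Infinispan. The sketch below is a generic illustration only, not ModeShape or Infinispan code; the class `HashRing` and its methods are hypothetical. It shows that adding a process moves keys only onto the new process, and that each key can be given several owners so that one failed process does not lose data.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical consistent-hash ring mapping keys to the processes that hold copies. */
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodesPerProcess;

    HashRing(int virtualNodesPerProcess) {
        this.virtualNodesPerProcess = virtualNodesPerProcess;
    }

    /** 64-bit ring position taken from the first 8 bytes of an MD5 digest. */
    static long position(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Each process occupies several points on the ring so load spreads evenly. */
    void addProcess(String process) {
        for (int i = 0; i < virtualNodesPerProcess; i++) {
            ring.put(position(process + "#" + i), process);
        }
    }

    void removeProcess(String process) {
        for (int i = 0; i < virtualNodesPerProcess; i++) {
            ring.remove(position(process + "#" + i));
        }
    }

    /** Walks clockwise from the key's position and returns up to 'copies' distinct owners. */
    List<String> owners(String key, int copies) {
        long start = position(key);
        List<String> walk = new ArrayList<>(ring.tailMap(start, true).values());
        walk.addAll(ring.headMap(start, false).values());
        Set<String> owners = new LinkedHashSet<>();
        for (String process : walk) {
            if (owners.size() == copies) {
                break;
            }
            owners.add(process);
        }
        return new ArrayList<>(owners);
    }
}
```

Keeping two or more owners per key is what lets a process fail without losing data; Infinispan's distributed caches expose a comparable number-of-owners setting in their own configuration.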

Page 5: ModeShape 3 overview

Hierarchical
• Organize the data into a tree structure that reflects how the data is accessed & used
  – Navigation to related data
  – Still have references and queries
• Many scenarios have natural hierarchies

Page 6: ModeShape 3 overview

Strongly consistent
• ACID
  – Atomic, Consistent, Isolated, Durable
  – Already familiar to most developers
  – Easy to reason about code
• XA-aware
  – Participate in user transactions
  – Work with Java EE

Page 7: ModeShape 3 overview

Why not eventually-consistent?

• In eventually-consistent databases
  – changes made by one client will eventually (but not immediately) be propagated to all processes
  – other clients won’t see the latest data right away, yet can still make other changes
  – there may be multiple versions of a particular piece of data
• Can be ideal for some scenarios
  – read-heavy and/or best-effort
• Applications that update data may need to
  – expect inconsistencies (and/or multiple versions)
  – specify conflict strategies
  – resolve conflicts (inconsistencies)
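To make "multiple versions" concrete: an eventually-consistent store typically tracks concurrent updates with something like a version vector, and the application must resolve versions where neither supersedes the other. The minimal sketch below uses a hypothetical `VersionVector` class; it is not part of ModeShape, which avoids this situation by being strongly consistent. It shows how two updates made on different processes from the same starting point end up concurrent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical version vector: one update counter per process that wrote the value. */
final class VersionVector {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Integer> counters;

    VersionVector() {
        this(new HashMap<>());
    }

    private VersionVector(Map<String, Integer> counters) {
        this.counters = counters;
    }

    /** Returns a new vector recording one more update made on 'process'. */
    VersionVector increment(String process) {
        Map<String, Integer> copy = new HashMap<>(counters);
        copy.merge(process, 1, Integer::sum);
        return new VersionVector(copy);
    }

    /** BEFORE means this version is an ancestor of 'other'; CONCURRENT means a conflict. */
    Order compare(VersionVector other) {
        boolean less = false;
        boolean greater = false;
        Set<String> processes = new HashSet<>(counters.keySet());
        processes.addAll(other.counters.keySet());
        for (String p : processes) {
            int a = counters.getOrDefault(p, 0);
            int b = other.counters.getOrDefault(p, 0);
            if (a < b) less = true;
            if (a > b) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }
}
```

A CONCURRENT result is the "multiple versions" case from the slide: the application has to apply some conflict strategy (merge, last writer wins, ask the user) before it can continue.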

Page 8: ModeShape 3 overview

In-memory
• Memory is really fast (and cheap)
• Why not keep all data in memory?
  – practical limits to memory on particular machines
  – memory isn’t shared between machines
  – data stored in memory isn’t durable
  – no queries, structure, or transactions
• ModeShape
  – distributes multiple copies of data across the combined memory of many machines
  – can even persist data to disk or DB (if really needed)
  – can still use queries, structure and transactions
  – is fast

Page 9: ModeShape 3 overview

Queries
• Find the data independently of the hierarchy
• Use a SQL-like language

SELECT * FROM [car:Car] WHERE [car:model] LIKE '%Toyota%' AND [car:year] >= 2006

SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file] WHERE PATH() LIKE $path

SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
  WHERE PATH() IN (
    SELECT [vdb:originalFile] FROM [vdb:virtualDatabase]
    WHERE [vdb:version] <= $maxVersion
      AND CONTAINS([vdb:description],'xml OR xml maybe'))

SELECT file.*,content.* FROM [nt:file] AS file JOIN [nt:resource] AS content ON ISCHILDNODE(content,file) WHERE file.[jcr:path] LIKE '/files/q*.2.vdb'

Page 10: ModeShape 3 overview

With or without schema

• Choose how much schema is enforced
  – define patterns for values and structure
  – use different patterns for different parts of the database
  – change the patterns over time
  – use the “best” levels of schema validation
  – evolve as necessary

[Figure: a scale from STRICT ENFORCEMENT to NO ENFORCEMENT]

Page 11: ModeShape 3 overview

Binary storage

• Separate storage for BINARY values
  – content keyed by SHA-1
  – the property value stored with the node contains only the SHA-1; the content is resolved as needed
  – content always buffered
• Option per repository
  – File system
  – Transient (temp directory)
  – JDBC database
  – MongoDB
  – Infinispan (separate caches)
  – custom

[Figure: Binary Storage]
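The content-addressing idea can be sketched in plain Java. This is a hypothetical, in-memory illustration (the class `ContentAddressedStore` and its methods are made up here), not ModeShape's own binary-store SPI: the content is buffered while its SHA-1 is computed, it is stored under the hex digest, and only that key would need to be kept in the node's property. Saving identical content twice stores it once.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory store that keys binary content by its SHA-1 hash. */
class ContentAddressedStore {
    private final Map<String, byte[]> contentByHash = new ConcurrentHashMap<>();

    /** Buffers the stream (closing it), hashes it while reading, and returns the hex key. */
    String store(InputStream in) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DigestInputStream digesting = new DigestInputStream(in, sha1)) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = digesting.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String key = toHex(sha1.digest());
        // Identical content hashes to the same key, so it is stored only once.
        contentByHash.putIfAbsent(key, buffer.toByteArray());
        return key;
    }

    /** Resolves a key (as kept in a node's property) back to a copy of the content. */
    byte[] resolve(String key) {
        byte[] content = contentByHash.get(key);
        return content == null ? null : content.clone();
    }

    int size() {
        return contentByHash.size();
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```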

Page 12: ModeShape 3 overview

Sequencing
• Automatically extract structured content
  – triggered by saving BINARY or STRING values
  – path rules & MIME types determine which sequencer is run
  – output stored in repository at configurable location
• Sequencers
  – CND
  – DDL
  – text (fixed width, delimited)
  – Microsoft Office™
  – Java (source & class)
  – ZIP (and JAR/WAR/EAR)
  – XML, XSD, and WSDL
  – Teiid VDBs
  – audio (MP3)
  – images
  – custom

[Figure: 1) upload, 2) notify, 3) sequencers derive and store, 4) navigate or query]
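A path rule pairs an input-path pattern with an output location, as in the sample AS7 configuration's `/files/(*.java)[/jcr:content] => /java/$1`. ModeShape has its own path-expression syntax; the sketch below substitutes plain Java regular expressions (for example `/files/([^/]+\.java)/jcr:content`) purely to show the idea, and the `SequencingRule` record is hypothetical. When a saved path matches a rule, the captured groups fill in the output path where that sequencer's derived content would be stored.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rule: if a saved path fully matches 'input', output goes to 'outputTemplate'. */
record SequencingRule(String sequencerName, Pattern input, String outputTemplate) {

    /** Returns the output path if this rule applies to the changed path. */
    Optional<String> outputFor(String changedPath) {
        Matcher m = input.matcher(changedPath);
        if (!m.matches()) {
            return Optional.empty();
        }
        // Replace $n with captured group n, highest first so $1 never clobbers $10.
        String out = outputTemplate;
        for (int g = m.groupCount(); g >= 1; g--) {
            out = out.replace("$" + g, Objects.toString(m.group(g), ""));
        }
        return Optional.of(out);
    }

    /** The first matching rule decides which sequencer runs and where its output goes. */
    static Optional<String> route(List<SequencingRule> rules, String changedPath) {
        for (SequencingRule rule : rules) {
            Optional<String> out = rule.outputFor(changedPath);
            if (out.isPresent()) {
                return Optional.of(rule.sequencerName() + " -> " + out.get());
            }
        }
        return Optional.empty();
    }
}
```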

Page 13: ModeShape 3 overview

Federation (reintroduced in 3.1)

• Access data in external systems
  – external data projected as nodes with properties and node types
  – supports read and optional write
  – same validation rules
• Connector options
  – File system (3.1)
  – Local git (3.2)
  – Database (3.2)
  – Database metadata (3.2)
  – Local repository (3.2)
  – External JCR repository (3.2)
  – custom

[Figure: ModeShape federating External source A and External source B]

Page 14: ModeShape 3 overview

Monitoring
• Measure statistics for a variety of metrics
  – Total counts: active sessions, queries, workspaces, locks, listeners, events in queue, sequencing operations in queue
  – Increment counts: events sent to listeners, nodes changed, saves, nodes sequenced
• Results
  – Metrics measured every 5 seconds
  – Results are aggregated into windows that show statistics (min, max, median, variance, stdev, sample count) over the last minute, hour, day, week, and year
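The aggregation step can be illustrated with a helper that summarizes the samples falling inside one window (for example, the 12 five-second samples of the last minute). This is a hypothetical sketch (`WindowStats` is not ModeShape's monitoring API); it assumes population variance and takes the median of an even-sized window as the mean of the two middle samples. ModeShape's own choices may differ.

```java
import java.util.Arrays;

/** Hypothetical summary of the samples recorded during one window (e.g., the last minute). */
record WindowStats(int count, double min, double max, double median, double variance, double stdDev) {

    static WindowStats of(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("window has no samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        double mean = Arrays.stream(sorted).average().orElse(0.0);
        // Population variance: mean squared deviation from the mean.
        double variance = Arrays.stream(sorted).map(x -> (x - mean) * (x - mean)).sum() / n;
        return new WindowStats(n, sorted[0], sorted[n - 1], median, variance, Math.sqrt(variance));
    }
}
```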

Page 15: ModeShape 3 overview

Public APIs

Page 16: ModeShape 3 overview

JCR 2.0
• Standard Java API (JSR-283)
  – javax.jcr packages
  – programmatically access, find, update, query content
  – commonly needed features: events, versioning, etc.
  – hierarchical tree of nodes, nodes have properties, property values can reference other nodes

[Figure: word cloud of repository concerns: databases, file systems, query, integrity, read, write, locking, streams, hierarchy, access control, transactions, schema, versioning, events, search, unstructured, content repositories]

Page 17: ModeShape 3 overview

Extended JCR API
• Extended JCR interfaces
  – additional node type management methods
  – additional event types
  – additional Binary value methods (hash)
  – additional JCR-QOM language objects
  – cancel queries
  – sequencer and text extractor SPIs
  – monitoring API

Page 18: ModeShape 3 overview

JDBC API
• Use JDBC driver or data source
  – connect to local or remote repository
  – issue JCR-SQL2 queries
  – access database metadata
• Enables existing applications to access content
  – ad hoc query tools
  – reporting systems

Page 19: ModeShape 3 overview

RESTful API
• Access content over HTTP
  – POST, PUT, GET, DELETE methods
  – JSON representations
  – Single nodes or whole subtrees, with their properties
  – Streams large BINARY values
  – Register node types
  – Execute queries
• Deployed as WAR file
  – Same app server in which ModeShape is deployed
  – Handles multiple repositories

Page 20: ModeShape 3 overview

WebDAV API
• Exposes content as files and directories
  – nt:file nodes exposed as files
  – nt:folder nodes exposed as directories
  – other nodes exposed as directories
• Mount repository on file system
  – Treat as external drive
  – Upload files and folders into repository
• Deployed as WAR file
  – Same app server in which ModeShape is deployed
  – Handles multiple repositories

Page 21: ModeShape 3 overview

Deployment options

Page 22: ModeShape 3 overview

ModeShape 3 and Infinispan

[Figure: Single process. One ModeShape engine uses a local Infinispan cache, which keeps its data in a persistent store.]

Page 23: ModeShape 3 overview

ModeShape 3 and Infinispan

[Figure: Small cluster. Three ModeShape processes, each with a replicated Infinispan cache; the processes exchange data and events, and the caches share a persistent store.]

Page 24: ModeShape 3 overview

ModeShape 3 and Infinispan

[Figure: Moderate single- or multi-site cluster. Several ModeShape processes (four shown), each with a distributed Infinispan cache, exchanging data and events.]

Page 25: ModeShape 3 overview

ModeShape 3 and Infinispan

[Figure: Large single- or multi-site cluster. Several ModeShape processes (four shown) exchange events, while their data is held in a separate Infinispan data grid.]

Page 26: ModeShape 3 overview

ModeShape AS7 kit

Page 27: ModeShape 3 overview

Deploying ModeShape in AS7
• Simple installation
  – simply unzip into existing AS7 installation
  – includes “standalone-modeshape.xml”, which contains a variety of ready-to-run sample repositories
• ModeShape subsystem for AS7
  – use AS7 tools to define 1+ repositories
  – each repository is independently configured
  – update repository configuration while running
  – (re)uses Infinispan and JGroups subsystems
  – clustering is built-in
  – perform management and monitoring operations

Page 28: ModeShape 3 overview

Sample AS7 configuration

<subsystem xmlns="urn:jboss:domain:modeshape:3.0">
    <repository name="sample" />
</subsystem>

– Each “repository” element defines a repository
– Multiple repositories are supported

Page 29: ModeShape 3 overview

Sample AS7 configuration (more thorough)

<subsystem xmlns="urn:jboss:domain:modeshape:3.0">
    <!-- Multiple 'repository' elements are allowed -->
    <repository name="sample"
                cache-name="sample"
                cache-container="modeshape"
                jndi-name="jcr/local/sample"
                enable-monitoring="true"
                default-workspace="default"
                allow-workspace-creation="true"
                security-domain="modeshape-security"
                anonymous-roles="readonly,readwrite,admin"
                anonymous-username="&lt;anonymous&gt;"
                use-anonymous-upon-failed-authentication="false">
        <workspaces>
            <!-- 0 or more workspaces can be predefined. At the moment, these are just names.
                 But we may want to specify content or something else, so create element for each. -->
            <workspace name="predefinedWorkspace1" />
            <workspace name="predefinedWorkspace2" />
            <workspace name="predefinedWorkspace3" />
        </workspaces>
        <indexing thread-pool="modeshape-workers" batch-size="-1" reader-strategy="shared"
                  mode="sync" async-thread-pool-size="1" async-max-queue-size="1">
            <analyzer classname="org.apache.lucene.analysis.standard.StandardAnalyzer" module="" />
            <jms-master-backend connection-factory-jndi-name="" queue-jndi-name="" />
        </indexing>
        <file-master-index-storage rebuild-upon-startup="ifMissing" format="LUCENE_CURRENT"
                                   path="modeshape/sample/indexes" relative-to="jboss.server.data.dir"
                                   access-type="auto" locking-strategy="native"
                                   source-path="/var/lib/modeshape/index/" refresh-in-seconds="3600" />
        <file-binary-storage min-value-size="4096" path="modeshape/sample/binaries"
                             relative-to="jboss.server.data.dir" />
        <sequencers>
            <!-- 0 or more sequencers -->
            <sequencer name="Java Source" classname="java"
                       path-expression="/files/(*.java)[/jcr:content] => /java/$1" />
        </sequencers>
    </repository>
</subsystem>

Page 30: ModeShape 3 overview

ModeShape repositories in JBoss AS7

[Figure: ModeShape repositories running inside JBoss AS 7. Access paths shown: JCR, JDBC, HTTP/REST, and WebDAV. Clients shown: web apps, EJBs, MDBs, etc., plus Java, JavaScript, Python, Ruby, and PHP clients.]

Page 31: ModeShape 3 overview

Current Status & Roadmap

Page 32: ModeShape 3 overview


ModeShape releases

Development shifted to 3.x in October 2011

Page 33: ModeShape 3 overview

ModeShape 3 (part 1 of 2)

• Much faster
  – Order of magnitude faster, or more
  – Way higher write concurrency (equivalent to node-level locking)
  – Thread-safe implementations
  – Memory is the new disk
  – Internal caches and lazy loading
  – Faster resolution of references and back-references
• Massively larger repository sizes
  – Millions of nodes, or more
  – Flat hierarchies (>>10K children under 1 parent)
  – Very large files, without consuming heap
• More deployment options
  – Large clusters
  – High availability
  – Multiple sites
  – Cloud

Page 34: ModeShape 3 overview

ModeShape 3 (part 2 of 2)

• Easily embedded
  – Lightweight, multi-repository engine
  – Hot deployment and configuration of repositories
  – Windowed metrics
• JBoss AS 7 integration
  – Provides lightweight, on-demand JCR subsystem
  – Hot deployment and configuration of repositories
  – Management of domains (clusters and groups)
  – Monitoring and alerting (via RHQ/JON)
• Participate in JTA transactions
  – Enables easy use of JCR in EJB, MDB, CDI, etc.
• Simpler SPIs
  – Sequencers, text extractors, security providers, binary stores, and connectors

Page 35: ModeShape 3 overview

Under the ModeShape 3 covers
• Use best-of-breed technology
  – Infinispan: cache, key-value store and data grid
  – Hibernate Search: indexing
  – JGroups: clustering events
  – JBoss AS7: small, fast, clusterable, manageable, cloud
  – Others: RESTEasy, PicketLink, etc.
• Design techniques
  – Simplify, simplify, simplify!
  – Use immutability first, otherwise write concurrent code
  – Cache data (especially immutable)
  – Share more data between sessions
  – Plan for eventual consistency
  – Remove layers
  – Use sequences (lazily load data, benefits large collections)
  – JSON/BSON documents optimized for in-memory usage
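"Use sequences (lazily load data)" is what makes very flat hierarchies workable: a node's children can be iterated without materializing them all at once. The minimal sketch below assumes a hypothetical `BlockLoader` that fetches one block of child names at a time; ModeShape's internal representation of child references may well differ. The `blocksLoaded` counter exists only to make the laziness observable.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical source of child references, fetched one block at a time. */
interface BlockLoader {
    /** Returns the children in [offset, offset + limit); fewer than 'limit' means the end. */
    List<String> load(int offset, int limit);
}

/** Iterates children lazily, holding at most one block in memory. */
class LazyChildren implements Iterator<String> {
    private final BlockLoader loader;
    private final int blockSize;
    private List<String> block = List.of();
    private int indexInBlock = 0;
    private int offset = 0;
    private boolean exhausted = false;
    int blocksLoaded = 0; // for illustration only

    LazyChildren(BlockLoader loader, int blockSize) {
        this.loader = loader;
        this.blockSize = blockSize;
    }

    @Override
    public boolean hasNext() {
        if (indexInBlock < block.size()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        // Only now fetch the next block from storage.
        block = loader.load(offset, blockSize);
        blocksLoaded++;
        offset += block.size();
        indexInBlock = 0;
        exhausted = block.size() < blockSize;
        return !block.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return block.get(indexInBlock++);
    }
}
```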

Page 36: ModeShape 3 overview

Design (How ModeShape uses Infinispan)

Page 37: ModeShape 3 overview

ModeShape 3

Basic architecture

[Figure: a ModeShape Repository with a JCR layer above a storage layer; the storage layer uses Content Storage, Binary Storage, and External Systems]

Page 38: ModeShape 3 overview

ModeShape 3 and Infinispan

Using different caches for different purposes

• JCR sessions hold their changes in memory; they will use Infinispan caches that (can) overflow to disk
• Shared, transient Infinispan caches for each workspace cache node representations and expire entries based on events
• Each node's state is stored in an Infinispan cache as one or more JSON/BSON documents
• Configure the Infinispan store as needed

[Figure: ModeShape Repository using Content Storage (Infinispan), Binary Storage, and External Systems, with the session and workspace caches also labeled (Infinispan)]
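The shared workspace cache can be pictured as a map from node key to an immutable node representation, where change events evict stale entries so every session sees fresh data on its next read. A minimal sketch; `WorkspaceCache`, `NodeState`, and the loader function are hypothetical stand-ins rather than ModeShape classes, and a real cache would also bound its size.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical immutable snapshot of a node, safe to share between sessions. */
record NodeState(String key, Map<String, String> properties) {
    NodeState {
        properties = Map.copyOf(properties);
    }
}

/** Hypothetical shared, transient cache of node representations for one workspace. */
class WorkspaceCache {
    private final ConcurrentHashMap<String, NodeState> cache = new ConcurrentHashMap<>();
    private final Function<String, NodeState> storeLoader;
    final AtomicInteger loads = new AtomicInteger(); // for illustration only

    WorkspaceCache(Function<String, NodeState> storeLoader) {
        this.storeLoader = storeLoader;
    }

    /** Returns the cached representation, reading from the backing store on a miss. */
    NodeState get(String key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return storeLoader.apply(k);
        });
    }

    /** Called for each change event: evict nodes changed by any session or process. */
    void onChanged(Set<String> changedKeys) {
        changedKeys.forEach(cache::remove);
    }
}
```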

Page 39: ModeShape 3 overview

Best Practices

Page 40: ModeShape 3 overview

Best practices (1 of 2)
• Build structure first, then node types
  – most important to get your node structure right
  – it will change over time anyway, so don’t define the node types too soon
• Use mixin node types
  – where possible, define sets of properties as mixins
  – use them in primary types and add them dynamically to nodes
• Limit use of same-name siblings
  – useful when required, but can be expensive and difficult to use (e.g., paths change)
• Prefer hierarchies
  – use moderate numbers of child nodes, and multiple levels if necessary
• Store files and folders with ‘nt:file’ and ‘nt:folder’
  – use them wherever appropriate, but not for all binary data!
• Verify features are enabled
  – improves portability and safety with configuration changes
• Import and export
  – avoid document view; use system view wherever possible
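One common way to follow "prefer hierarchies" when data arrives with flat identifiers is to spread nodes over intermediate levels derived from the identifier. The sketch below is a hypothetical helper, not a ModeShape API: it hashes the ID with SHA-1 and uses two levels of two hex characters each, so no parent has more than 256 intermediate children. The root path and ID scheme are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: spreads flat IDs over two levels of at most 256 buckets each. */
final class BucketedPaths {
    private BucketedPaths() {
    }

    /** Path shape: <root>/<2 hex chars>/<2 hex chars>/<id>, stable for a given id. */
    static String pathFor(String root, String id) {
        String hex = sha1Hex(id);
        return root + "/" + hex.substring(0, 2) + "/" + hex.substring(2, 4) + "/" + id;
    }

    private static String sha1Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(d.length * 2);
            for (byte b : d) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the intermediate segments depend only on the ID, a node can still be located by path without a query.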

Page 41: ModeShape 3 overview

Best practices (2 of 2)
• Prefer JCR-SQL2 and JCR-QOM over other query languages
  – by far the richest and most useful
  – do this even when it appears the queries are more complicated
• Only Repository is thread-safe; no other APIs are
  – don’t share sessions
  – don’t share anything between sessions
• Register all listeners in special long-lived sessions
  – do nothing else with these sessions, however (Session is not thread-safe)
  – get off the notification thread ASAP, using work queues where necessary
• Create new sessions rather than reusing a pool of sessions
  – Sessions are intended to be as lightweight as possible
  – Create a session, use it, log out (even in web applications and services!)
• Avoid deprecated APIs
  – they either perform poorly or are a bad idea; besides, they’ll eventually be removed
• Use Session.save(), not Node.save() (deprecated in JCR 2.0)
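The listener advice ("get off the notification thread ASAP, using work queues") can be sketched with a plain `ExecutorService`. In a real application, `onEvent` would belong to a `javax.jcr.observation.EventListener` registered through a dedicated long-lived session; to keep the example self-contained, the hypothetical `ChangeEvent` record stands in for JCR events and the processing step is a placeholder. Any repository work done by the worker should use its own short-lived session, since the listener's session must not be shared.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a JCR event: the changed path and an event type. */
record ChangeEvent(String path, String type) {}

/** Hands events to a worker queue so the notification thread returns immediately. */
class QueueingListener implements AutoCloseable {
    private final ExecutorService workers = Executors.newSingleThreadExecutor();
    final List<String> processed = new CopyOnWriteArrayList<>();

    /** Runs on the notification thread: copy the batch, enqueue it, return. */
    void onEvent(List<ChangeEvent> events) {
        List<ChangeEvent> batch = List.copyOf(events);
        workers.execute(() -> batch.forEach(this::process));
    }

    /** Runs on the worker thread; real code would log in to a new session here. */
    private void process(ChangeEvent event) {
        processed.add(event.type() + " " + event.path());
    }

    @Override
    public void close() {
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A single worker thread keeps events in arrival order; a larger pool would raise throughput at the cost of ordering.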

Page 42: ModeShape 3 overview

Questions?
