Efficient content structures and queries in CRX/CQ

Marcel Reutegger | Senior Software Engineer

© 2013 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.

DESCRIPTION

Presentation “Efficient content structures and queries in CRX/CQ” by Marcel Reutegger at CQCON2013 in Basel on 19 and 20 June 2013.

Agenda

Repository storage basics

Efficient content structures

Query analysis and optimization

Repository storage basics

Nodes & properties stored in one entity -> bundle

Every node/bundle has a UUID (random)

Child nodes are linked from the parent node

Binaries go into the DataStore

Repository storage basics

Bundle structure

Bundle

UUID

Parent UUID

Properties: Name / Value (one entry per property)

Child node references: Name / UUID (one entry per child node)
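
The bundle layout above can be sketched as a small data structure. This is an illustration only, not Jackrabbit's actual classes: the `Bundle` type and its field names are hypothetical, and child references are kept in a plain dict, which ignores same-name siblings.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Bundle:
    """Hypothetical model of one bundle: a node plus its properties and child links."""
    node_id: uuid.UUID                       # UUID (random)
    parent_id: Optional[uuid.UUID]           # Parent UUID, None for the root
    properties: Dict[str, object] = field(default_factory=dict)     # Name / Value
    child_refs: Dict[str, uuid.UUID] = field(default_factory=dict)  # Name / UUID

    def add_child(self, name: str) -> "Bundle":
        # The child gets its own bundle; the parent bundle only stores the link.
        child = Bundle(node_id=uuid.uuid4(), parent_id=self.node_id)
        self.child_refs[name] = child.node_id
        return child


root = Bundle(node_id=uuid.uuid4(), parent_id=None)
page = root.add_child("content")
page.properties["jcr:title"] = "Home"
```

Because every child link lives in the parent's bundle, the parent bundle grows with each child node that is added.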

Repository storage basics – TarPM

Nodes & Properties (bundles) stored in tar files

Tar files are append only

Data is never overwritten

Garbage is removed by TarPM optimization (scheduled, incremental)
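
The append-only behaviour can be illustrated with a toy in-memory store. This is a generic sketch of the idea, not the TarPM file format: `AppendOnlyStore` and its methods are hypothetical, and `optimize` compacts everything in one pass, whereas the slide describes the real optimization as scheduled and incremental.

```python
class AppendOnlyStore:
    """Toy append-only store: writes never overwrite, old records become garbage."""

    def __init__(self):
        self.log = []    # (key, value) records, only ever appended
        self.index = {}  # key -> position of the newest record for that key

    def write(self, key, value):
        # A new version is appended; the previous record stays in the log as garbage.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def read(self, key):
        return self.log[self.index[key]][1]

    def garbage_count(self):
        return len(self.log) - len(self.index)

    def optimize(self):
        # Keep only the newest record per key and rebuild the index.
        live = [self.log[pos] for pos in sorted(self.index.values())]
        self.log = live
        self.index = {key: pos for pos, (key, _) in enumerate(live)}


store = AppendOnlyStore()
store.write("node-a", "v1")
store.write("node-a", "v2")  # supersedes v1, which is now garbage
store.write("node-b", "v1")
garbage_before = store.garbage_count()
store.optimize()
```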

Efficient content structures

Number of nodes

Number of nodes

Increasing number of nodes affects performance

Random UUIDs cause random I/O -> Jackrabbit design

15k rpm drive: 200-400 IOPS

Tar index file sizes (64 bytes per bundle)

1 million nodes: 70 MB

10 million nodes: 700 MB

100 million nodes: 7 GB
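
The index sizes follow from simple arithmetic. The sketch below uses only the 64 bytes per bundle and the IOPS range from the slide; the function names are illustrative, and the read-time estimate assumes one random I/O per uncached bundle, which is a simplification.

```python
BYTES_PER_BUNDLE = 64  # index entry size per bundle, as stated on the slide


def index_size_mb(node_count: int) -> float:
    """Raw tar index size in decimal megabytes."""
    return node_count * BYTES_PER_BUNDLE / 1_000_000


def worst_case_read_seconds(bundle_reads: int, iops: int) -> float:
    """Time to read bundles from disk, assuming one random I/O per bundle."""
    return bundle_reads / iops


for nodes in (1_000_000, 10_000_000, 100_000_000):
    print(f"{nodes:>11,} nodes: {index_size_mb(nodes):,.0f} MB")

print(worst_case_read_seconds(10_000, 200), "seconds for 10,000 uncached reads at 200 IOPS")
```

The raw products are 64 MB, 640 MB, and 6,400 MB, slightly below the rounded figures on the slide.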

Number of nodes

How to reduce number of nodes

Use version purge tool

Remove archived workflow instances

Purge audit events

Application specific

Bad: document view ‘import’ of XML

Good: Pack properties on few nodes

Other benefits: DataStore GC will be faster
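
To see why a document view import inflates the node count, the following sketch compares both approaches on a small, made-up XML document. It assumes the usual document view mapping (each element becomes a node, attributes become properties, text becomes a `jcr:xmltext` child node) and ignores whitespace-only text. The helper names and the dotted property naming are hypothetical, and repeated sibling elements would need extra handling.

```python
import xml.etree.ElementTree as ET

XML = """<order id="42">
  <customer><name>Ada</name><email>ada@example.com</email></customer>
  <item><sku>A-1</sku><qty>2</qty></item>
</order>"""


def nodes_for_document_view(root: ET.Element) -> int:
    """Rough node count if the XML were imported element by element."""
    count = 0
    for element in root.iter():
        count += 1  # one node per element
        if element.text and element.text.strip():
            count += 1  # one extra jcr:xmltext node per text value
    return count


def pack_properties(root: ET.Element) -> dict:
    """Flatten leaf values into properties that fit on a single node."""
    props = dict(root.attrib)

    def walk(element: ET.Element, prefix: str) -> None:
        for child in element:
            name = f"{prefix}{child.tag}"
            if len(child) == 0:
                props[name] = (child.text or "").strip()
            else:
                walk(child, name + ".")

    walk(root, "")
    return props


root = ET.fromstring(XML)
imported_nodes = nodes_for_document_view(root)
packed = pack_properties(root)
```

For this sample the import needs eleven nodes, while the packed form keeps five properties on one node.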

Efficient content structures

Number of child nodes

Number of child nodes

Frequently asked questions:

«What is the maximum supported number of child nodes?»

«I have X number of child nodes. Will performance be OK?»

It depends!

Number of child nodes

Maximum number of child nodes

All child node references (Name / UUID) are stored in the parent node's bundle, next to its UUID, parent UUID, and properties.

Heap is the limit

Number of child nodes

Adding a single child node

Number of child nodes

Large number of child nodes

OK for:

Static content

/libs/wcm/core/i18n/de has ~8k child nodes

Not OK for:

Dynamic content

E.g. user generated content

Number of child nodes - Recommendations

Structure content

E.g. date/time based: 2012/09/26

Use utilities like Jackrabbit BTreeManager

Keep number of child nodes within limits (e.g. 1000)

Save in batches when possible
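
A minimal sketch of two of these recommendations: a date-based path that spreads content over year/month/day folders, and a helper that groups items so a save can happen once per batch. The base path `/content/comments`, the node names, and the batch size are assumptions for illustration. The actual save call depends on the repository API and is not shown.

```python
from datetime import date
from itertools import islice


def dated_path(base: str, day: date, name: str) -> str:
    """Build a path such as <base>/2012/09/26/<name>."""
    return f"{base}/{day:%Y/%m/%d}/{name}"


def batches(items, size):
    """Yield lists of at most `size` items; save once after each list."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


path = dated_path("/content/comments", date(2012, 9, 26), "comment-1")
batch_sizes = [len(chunk) for chunk in batches(range(2500), 1000)]
```

With one folder per day, the number of child nodes under any single parent stays bounded by the daily volume instead of growing without limit.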

Query analysis & optimization

Query analysis and optimization

Query debug log

http://dev.day.com/kb/home/Crx/Troubleshooting/HowToDebugJCRQueries.html

“executed in <time> ms. (<query>)”

JMX (CQ 5.5)

QueryStat: slow and most frequent queries

TimeSeries: count, duration, average
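
Once the query debug log is enabled, slow queries can be pulled out of the log with a few lines of script. The sketch below relies only on the message shape quoted above. The sample lines and the 100 ms threshold are made up, and real log lines carry additional prefixes such as timestamp and logger name.

```python
import re

# Matches the message shape "executed in <time> ms. (<query>)" at the end of a line.
PATTERN = re.compile(r"executed in (\d+) ms\. \((.*)\)\s*$")


def parse(line: str):
    """Return (milliseconds, query) or None if the line is not a query message."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def slow_queries(lines, threshold_ms: int):
    """Return parsed queries at or above the threshold, slowest first."""
    parsed = [entry for entry in map(parse, lines) if entry is not None]
    return sorted((e for e in parsed if e[0] >= threshold_ms), reverse=True)


SAMPLE = [
    "DEBUG executed in 3 ms. (//element(*, nt:hierarchyNode))",
    "DEBUG executed in 840 ms. (//*[jcr:contains(., '*rabbit')])",
    "INFO some unrelated message",
]
slow = slow_queries(SAMPLE, threshold_ms=100)
```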

Query analysis and optimization

Fast: simple comparison

@sling:resourceType = 'my/type'

Fast: node type match

//element(*, nt:hierarchyNode)

Fast: simple fulltext search

jcr:contains(@jcr:title, 'crx')

Fast: like on few distinct values

jcr:like(@jcr:mimeType, '%/plain')

Query analysis and optimization

Slow: jcr:contains with initial wildcard

jcr:contains(., '*rabbit')

Alternative: don’t do it, unless you know exactly what you are doing!

Slow: jcr:like on many distinct values

jcr:like(@email, '%@gmail.com')

Alternative: store the data you want to query in a separate property, then you can write: @email-host = 'gmail.com'
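
The separate property has to be filled in when the content is written. A minimal sketch, assuming properties are handled as a plain dict before being stored: the function name is hypothetical, and lower-casing the host is an added normalization choice, not something the slide prescribes.

```python
def with_email_host(properties: dict) -> dict:
    """Return a copy of the properties with a derived 'email-host' entry."""
    props = dict(properties)
    _, separator, host = props.get("email", "").rpartition("@")
    if separator and host:
        # Store the host separately so a simple equality comparison can match it.
        props["email-host"] = host.lower()
    return props


user = with_email_host({"email": "jane.doe@Gmail.com"})
broken = with_email_host({"email": "not-an-address"})
```

The query then compares against a few distinct host values instead of running a pattern match over every distinct address.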

Query analysis and optimization

Slow: ranges matching many distinct values

@jcr:lastModified > xs:dateTime('2001-09-17T18:17:13.000+02:00')

Alternative: reduce resolution (e.g. only store date and not time)
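
Reducing the resolution means storing a coarser value next to, or instead of, the full timestamp. The sketch below truncates timestamps to the day, using the timestamp's own UTC offset. The helper name is an assumption, and the additional sample times are made up.

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))


def day_resolution(timestamp: datetime) -> str:
    """Keep only the calendar date, e.g. '2001-09-17'."""
    return timestamp.date().isoformat()


timestamps = [
    datetime(2001, 9, 17, 8, 0, 1, tzinfo=CEST),
    datetime(2001, 9, 17, 18, 17, 13, tzinfo=CEST),
    datetime(2001, 9, 17, 23, 59, 59, tzinfo=CEST),
]
distinct_full = {t.isoformat() for t in timestamps}
distinct_days = {day_resolution(t) for t in timestamps}
```

Three distinct timestamps collapse into one distinct date, so a range query has far fewer distinct values to match.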

Query analysis and optimization - Recommendations

Test with real content

Structure content to avoid queries

Denormalize

Avoid path constraints
