#evolverocks CRX2OAK – ALL THE SECRETS OF REPOSITORY MIGRATION TOMEK RĘKAWEK, ADOBE RESEARCH Aug 30, 2016

#evolverocks 2

AGENDA

• Overview of CRX2Oak
• CRX2Oak command line
• Features
• Case study: large migration
• General migration tips
• Using CRX2Oak for AEM upgrade
• Q & (hopefully) A

#evolverocks 3

OVERVIEW OF THE CRX2OAK
UPGRADE FROM CRX2

[Diagram: CQ 5.x (CRX2) → AEM 6.x (Jackrabbit Oak)]

#evolverocks 4

OVERVIEW OF THE CRX2OAK
UPGRADE OR SIDEGRADE

[Diagram: source is either CQ 5.x (CRX2) or AEM 6.x (Oak); target is AEM 6.x (Jackrabbit Oak)]

#evolverocks 5

OVERVIEW OF THE CRX2OAK
MIGRATING BINARIES

#evolverocks 6

CRX2OAK COMMAND LINE
REPOSITORY PARAMETER TYPES

• CRX2Oak is a command-line tool:
  • java -jar crx2oak.jar [options] [datastore-options] SOURCE TARGET
• SOURCE and TARGET define the repositories. Supported formats:
  • path to the CRX2 “repository” directory, e.g. crx-quickstart/repository
  • path to the Oak SegmentMK “repository” directory, as above
  • Mongo URI, e.g. mongodb://localhost:27017/aem
  • JDBC URI, e.g. jdbc:mysql://localhost:3306/sakila?profileSQL=true
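To make the SOURCE and TARGET formats on this slide more concrete, here is a minimal dry-run sketch. It only prints the command lines rather than running a migration. The cq5/ and aem6/ directories and the crx2oak_cmd helper are hypothetical; the Mongo URI is the example from the slide.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the CRX2Oak command line
# instead of executing it.
crx2oak_cmd() {
    printf '%s\n' "java -jar crx2oak.jar $*"
}

# Hypothetical locations; adjust to the actual installation.
CRX2_REPO="cq5/crx-quickstart/repository"      # CRX2 "repository" directory
SEGMENT_REPO="aem6/crx-quickstart/repository"  # Oak SegmentMK "repository" directory
MONGO_URI="mongodb://localhost:27017/aem"      # Mongo URI, as on the slide

# Upgrade: CRX2 -> Oak SegmentMK
TO_SEGMENT=$(crx2oak_cmd "$CRX2_REPO" "$SEGMENT_REPO")

# Sidegrade: Oak SegmentMK -> Oak on MongoDB
TO_MONGO=$(crx2oak_cmd "$SEGMENT_REPO" "$MONGO_URI")

printf '%s\n' "$TO_SEGMENT" "$TO_MONGO"
```

Quoting the URIs matters once they contain characters such as ? or &, as in the JDBC example.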

#evolverocks 7

CRX2OAK COMMAND LINE
DEFINING DATASTORE TO BE USED

• java -jar crx2oak.jar [options] [datastore-options] SOURCE TARGET
• The source blob store is defined using --src-datastore or --src-s3datastore
• If no blob store is defined for the source, CRX2Oak assumes an embedded one
• If the source blob store is defined, it will be used for the target as well (only references will be copied, not the actual binaries)
• This can be overridden with --copy-binaries
• The destination blob store can be defined with --datastore or --s3datastore
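The two datastore modes described on this slide could be sketched as follows. This is a dry run that only prints the commands. All paths and the crx2oak_cmd helper are hypothetical, and passing the directory as --option=value is an assumption modelled on the --include-paths=... form used elsewhere in the deck.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the CRX2Oak command line
# instead of executing it.
crx2oak_cmd() {
    printf '%s\n' "java -jar crx2oak.jar $*"
}

# Hypothetical locations; adjust to the actual installation.
SRC="cq5/crx-quickstart/repository"
DST="aem6/crx-quickstart/repository"
SRC_DS="/mnt/cq5/datastore"   # existing source datastore directory
DST_DS="/mnt/aem6/datastore"  # separate datastore for the target

# Source datastore is reused by the target: only references are copied.
BY_REFERENCE=$(crx2oak_cmd "--src-datastore=$SRC_DS" "$SRC" "$DST")

# Binaries are copied into a separate destination datastore.
BY_VALUE=$(crx2oak_cmd "--src-datastore=$SRC_DS" --copy-binaries \
    "--datastore=$DST_DS" "$SRC" "$DST")

printf '%s\n' "$BY_REFERENCE" "$BY_VALUE"
```

The by-reference variant is usually the faster one, since no binary content is moved.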

#evolverocks 8

FEATURES
SELECTING PATHS TO MIGRATE

#evolverocks 9

FEATURES
MIGRATING VERSION STORAGE

#evolverocks 10

CASE STUDY
INTRODUCTION

• Client requirements
  • CQ 5.6.1 instance with a large number of sites and assets, storing binaries in S3
  • The content is being authored 24/7
  • Migrating the whole content takes about 20 h
  • The migration is done offline, and the instance can’t be down for that long
  • The upgraded instance has to be tested before going live
• Strategy
  • Snapshot the instance and migrate the copy
  • Perform tests on it
  • Top up the changes introduced after the snapshot

#evolverocks 11

CASE STUDY
STRATEGY

#evolverocks 12

CASE STUDY
REMARKS

• The migration in step (4) will be much faster, as only the diff will be migrated
• In step (4), use --skip-init so the existing repository won’t be reinitialized
• Also, use --include-paths=/content/mysite to migrate only the modified subtree
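A top-up run along the lines of these remarks might look like the sketch below. It is a dry run that only prints the command. The repository locations and the crx2oak_cmd helper are hypothetical, and the S3 datastore options that the case-study instance would also need are left out.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the CRX2Oak command line
# instead of executing it.
crx2oak_cmd() {
    printf '%s\n' "java -jar crx2oak.jar $*"
}

# Hypothetical locations; adjust to the actual installation.
LIVE_REPO="cq5-live/crx-quickstart/repository"  # current CQ 5.6.1 content
MIGRATED_REPO="aem6/crx-quickstart/repository"  # already migrated and tested

# --skip-init keeps the existing target repository from being reinitialized;
# --include-paths limits the run to the subtree modified since the snapshot.
TOP_UP=$(crx2oak_cmd --skip-init --include-paths=/content/mysite \
    "$LIVE_REPO" "$MIGRATED_REPO")

printf '%s\n' "$TOP_UP"
```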

#evolverocks 13

GENERAL MIGRATION TIPS

• When using Mongo (either as source or destination), run CRX2Oak on the same machine as the Mongo primary
• If you don’t need version history for deleted nodes, use --copy-orphaned-versions=false to make the migration faster
• CRX2Oak may be used to copy content between existing repositories. Use the following parameters:
  • --skip-init, so the destination is not initialized with the index definitions
  • --{include,merge}-paths to specify which subtrees should be copied
  • --copy-orphaned-versions=false
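The --{include,merge}-paths shorthand stands for two separate options, --include-paths and --merge-paths. A dry-run sketch of copying one subtree between two existing repositories, once with each option, is shown here. The repository locations, the /content/shared path and the crx2oak_cmd helper are hypothetical, and nothing is executed.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the CRX2Oak command line
# instead of executing it.
crx2oak_cmd() {
    printf '%s\n' "java -jar crx2oak.jar $*"
}

# Hypothetical existing repositories and subtree.
SRC="author-a/crx-quickstart/repository"
DST="author-b/crx-quickstart/repository"
SUBTREE="/content/shared"

# Variant 1: select the subtree with --include-paths.
COPY_INCLUDE=$(crx2oak_cmd --skip-init "--include-paths=$SUBTREE" \
    --copy-orphaned-versions=false "$SRC" "$DST")

# Variant 2: select the subtree with --merge-paths.
COPY_MERGE=$(crx2oak_cmd --skip-init "--merge-paths=$SUBTREE" \
    --copy-orphaned-versions=false "$SRC" "$DST")

printf '%s\n' "$COPY_INCLUDE" "$COPY_MERGE"
```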

#evolverocks 14

GENERAL MIGRATION TIPS
UPGRADING CQ 5.X STORING BINARIES IN AWS S3

• When upgrading CQ 5.x + S3, CRX2Oak calls AWS to ask for the length of each binary
  • the lengths are stored in Oak but not in CRX2, so they have to be requested
• For large repositories, this may slow down the whole migration
• It’s possible to pre-fetch all lengths, store them in a text file and configure CRX (and therefore CRX2Oak) to use it
• More information:
  • https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/upgrade/blob/LengthCachingDataStore.html
• Sample configuration files:
  • http://bit.ly/cq5-s3-upgrade

#evolverocks 15

TROUBLESHOOTING

• UUID conflict exception
  • may occur if the destination repository already exists (iterative migration)
  • remember to add --copy-orphaned-versions=false
  • when using --include-paths, include all modified paths:
    • otherwise, if a page has been moved and only the destination path is included, CRX2Oak won’t remove the page from its original position
• BlobId not found exception
  • either the source or the destination blob store is not configured correctly
• Unable to delete referenced node
  • CRX2Oak is probably trying to overwrite the whole version storage (removing existing versions)
  • add --copy-orphaned-versions=false
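For the moved-page case, a dry-run sketch of an iterative run that includes both the old and the new location is given here. The page paths, repository locations and the crx2oak_cmd helper are hypothetical. Passing several paths as one comma-separated value is how the oak-upgrade documentation appears to describe --include-paths, so treat that syntax as an assumption to verify.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the CRX2Oak command line
# instead of executing it.
crx2oak_cmd() {
    printf '%s\n' "java -jar crx2oak.jar $*"
}

# Hypothetical repositories and page locations.
SRC="cq5-live/crx-quickstart/repository"
DST="aem6/crx-quickstart/repository"
MOVED_FROM="/content/mysite/en/old-page"  # where the page used to be
MOVED_TO="/content/mysite/en/new-page"    # where the page is now

# Include both paths so that the page can also be removed from its
# original position in the destination (comma-separated list: assumption).
RETRY=$(crx2oak_cmd --skip-init --copy-orphaned-versions=false \
    "--include-paths=$MOVED_FROM,$MOVED_TO" "$SRC" "$DST")

printf '%s\n' "$RETRY"
```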

#evolverocks 16

USING EXTENSION VS RUNNING CRX2OAK MANUALLY

The official docs describe using the extension:
• java -jar aem-quickstart-6.2.0.jar -unpack # unpack the AEM jar
• java -jar aem-quickstart-6.2.0.jar -v -x crx2oak # prepare extension config
• java -jar aem-quickstart-6.2.0.jar -v -x crx2oak # prepare OSGi config
• java -Xmx4096m -XX:MaxPermSize=2048M -jar aem-quickstart-6.2.0.jar -v -x crx2oak -xargs -- -o migrate

To run CRX2Oak manually, replace the last command with:
• java -Xmx4096m -XX:MaxPermSize=2048M -jar crx-quickstart/opt/helpers/crx2oak/crx2oak.jar [source] [destination]

#evolverocks 17

VERSIONS

• All CRX2Oak versions offer similar features
• They differ in:
  • the Oak version used underneath (as CRX2Oak starts a normal Oak repository)
  • the index definitions created during repository initialisation
• Both of these are tied to the AEM version and shouldn’t be mismatched
• Table of truth:

  AEM      Oak    CRX2Oak
  AEM 6.0  1.0.x  1.0.x
  AEM 6.1  1.2.x  1.3.x (sic!)
  AEM 6.2  1.4.x  1.4.x

• CRX2Oak 1.2.x can be used with AEM 6.1 too, but it won’t have all the advanced features

#evolverocks 18

RESOURCES

• CRX2Oak downloads:
  • https://repo.adobe.com/nexus/content/groups/public/com/adobe/granite/crx2oak/
• CRX2Oak documentation:
  • https://docs.adobe.com/docs/en/aem/6-2/deploy/upgrade/using-crx2oak.html
• oak-upgrade documentation:
  • https://jackrabbit.apache.org/oak/docs/migration.html

#evolverocks

THANK YOU!

http://tomek.rekawek.eu