S309137 Migrating Applications from Local to Distributed Caching with Oracle Coherence

Oracle Coherence Hands on Lab


  • S309137

    Migrating Applications from Local to Distributed Caching with Oracle Coherence

  • Contents

    Introduction: Oracle Coherence Hands-On Lab

    Base Environment

    Quick overview of the application

    Running the application with local cache

    Explore the application. Understand how it works.

    Update application to run with distributed cache

    Configuration Subsystem

    Cache Store

    Distributed Cache Worker implementation

    Run application with distributed cache

    Test failover and redundancy

    Shutdown everything

  • Introduction: Oracle Coherence Hands-On Lab

    In this hands-on lab, you will take an application that loads data from a set of large files, caches the data

    in memory, and performs complex queries and updates against the cached data, with changes persisted

    to the files. You will then modify this application to cache the data in a Coherence data grid and perform

    these computations against it, learning the common Coherence APIs, configuration files and usage

    patterns as you go along. By doing this, you will become familiar with some core Coherence concepts

    like the configuration subsystem, the read-through and write-through mechanisms for loading and

    persisting data to the backend data source, the JMX subsystem, and the use of EntryProcessors for

    lock-free, concurrent, well-performing updates.

    For this hands-on lab, we have built a sample application that simulates the access and update of a set

    of dictionaries of some outdated (and frankly made up) languages. In the next section, you will get an

    overview of the application. The initial version of the application will work against the dictionaries

    loaded up in your application's local JVM. We will then walk you through updating the application to

    work with the data loaded in a Coherence data grid.

    This hands-on lab is focused on Coherence. There is no dependency on a database or an application

    server. The knowledge you gain here is directly applicable in use cases where Coherence is used within

    an Application Server or fronting a database.

    Base Environment

    The lab machine provides you with the following environment that you will be using in this lab.

    Oracle Coherence 3.5 for Java

    Eclipse 3.5 Galileo

    JRockit 1.6.0

    On the desktop, you will find a folder named S309137 Migrating Applications from Local to Distributed

    Caching with Oracle Coherence. This folder contains a shortcut to the lab which you will be working

    with. The actual lab folder (accessed from the shortcut) contains scripts to assist you with the lab. There

    is also a solutions directory in there, which contains the solutions for this lab (for reference when stuck,

    etc).

  • Quick overview of the application

    This sample application allows you to access the following information about words in a set of

    languages.

    description

    synonyms

    antonyms

    daily frequency of use of each word

    the year which the word became obsolete

    The words and their metadata (as described above) are stored in a number of zip files in the working

    directory. For each language LANG, a zip file called lang-LANG.zip exists and contains an entry for each

    word.

    The application exposes a command line interface with very simple commands to interact with the

    dictionaries.

    The section below shows some typical usage.

  • Running the application with local cache

    For your convenience, batch scripts are provided which allow you to set up your environment and run

    the application. Hints are provided below; double-clicking a script lets you bypass typing the commands

    into the cmd shell.

    Every command must be run with the environment properly setup. You will need to do this for every

    cmd shell window you open.

    set JAVA_HOME=c:\jrmc_3.1

    set COHERENCE_HOME=c:\coherence

    set CLASSPATH=.;%COHERENCE_HOME%\lib\coherence.jar;%COHERENCE_HOME%\lib\tangosol.jar

    set PATH=.;%JAVA_HOME%\bin;%PATH%;

    To start, compile the application.

    javac -d . app\*.java

    Hint: you can just run the app-compile.bat script.

    Now generate the dictionaries as zip files containing the words for different languages. This will take

    about 5 minutes.

    java -Xms1g -Xmx1g app.CreateDictionary

    Hint: you can just run the app-create-dict.bat script.

  • Now run the application using the local cache. Here, the cached data is stored in some Collection objects

    on the local Java heap.

    java -Xms1g -Xmx1500m -Xmanagement app.Main -local

    Hint: You can just run the app-local.bat script.

    This will open up a prompt as below:

    Main>

    At the prompt, type help to see the different commands and how to use them

    Main> help

  • Let us get familiar with the application from a user's point of view. At startup, the dictionary has not

    been loaded up into memory.

    Let us look up information about the word word100003 from the language lang1. Since the dictionary is

    not loaded up into memory, the application will retrieve the information from the zip files directly, and

    cache just that result into memory.

    Main> show lang1 word100003

  • Now, update the description associated with the word word100003 from the language lang1

    Main> update -d lang1 word100003 this new description is cool

    Run the show command again to ensure that the cache was updated. Also, check the zip file (lang-

    lang1.zip), look at the entry for word100003, and ensure that the change was written transparently

    to the backend zip file. You will notice that updates are slow. Unfortunately, Java has poor support for

    updating zip files, so we had to completely recreate the zip file for each changed word (this is why

    updates take a while).

  • Load up the whole dictionary for all languages into memory so that we can do some more complex

    queries. This may take up to 30 seconds to load all 10 dictionaries.

    Main> load

    Now, perform a query. Search for words which exist in the language lang1, that are synonymous with

    synonym674

    Main> find lang1 -s synonym674

    Let us do a more complex query. Search for words which exist in the language lang1, that are

    synonymous with synonym674 and synonym1430, but opposite (antonym) of antonym2243

    Main> find lang1 -s synonym674 -s synonym1430 -a antonym2243

  • For both of these, you got an UnsupportedOperationException because the query support is hard to

    implement in our local cache. A database and SQL would have made life much easier but we do not have

    access to that for our sample application.

    Hint: This is possible and easy using Coherence.

    Let us try some more commands.

    Look up the languages currently supported in our dictionary

    Main> langs

    Now look up the stats of the currently loaded cache. This should show you the number of words

    currently loaded into the cache for each language.

    Main> stats

  • Explore the application. Understand how it works.

    Now, we will go ahead and explore the application to see how it works under the hood.

    The application is run from the working directory. For each language LANG, there is a zip file in that

    directory called lang-LANG.zip which contains meta-data for each word in the language. Open up lang-

    lang1.zip and look at the contents for familiarity.

  • This working directory is already set up as an Eclipse project. Open up Eclipse and import the project into

    your workspace. From the File menu, select Import. From the dialog box that comes up, open the

    General node in the tree menu and select Existing Projects into Workspace. Click Next. Click the

    radio button beside Select root directory and click the Browse button to select the working

    directory. Select the project oow_hol_coherence and click Finish.

    Hint: You can just run the eclipse.bat script. The working directory is set up as a pre-configured Eclipse

    workspace and project.

    You should have the project as below:

  • The java source files all reside under the app directory. Please read through them to get familiar with

    what they do.

    Record.java This encapsulates a word in a language

    Main.java This is the main command line interface that parses the user inputs and calls the

    appropriate commands

    Worker.java This interface allows us to decouple the implementation of the cache from the

    application

    LocalWorker.java This implementation of Worker uses a local hash map to store the data, and (tries

    to) manage the computations itself

    Helper.java This contains some shared helper functions

    Metrics.java This is a table model containing the languages and memory used for each language

    within the cache.

    Monitor.java This is a Swing UI which regularly gets updated metrics from the cache and displays

    them as a Swing table.

    Once you have a good understanding of the application, feel free to go back to the previous section to

    run the lab again and get familiar with the application. The rest of the lab will go a lot smoother once

    you get the hang of what the application is trying to accomplish.

    Update application to run with distributed cache

    Now, the lab gets more interesting. We will walk through a number of steps to update the application so

    it runs against the Coherence Data Grid.

    In this lab, we aim to achieve the following using Coherence

    Store the cached data in multiple external JVMs which appear to us as one giant local heap with

    quick access

  • Transparently read the records from the cache, even if they have not been loaded into

    Coherence

    Transparently let the cache persist updates to the backend data source

    Perform queries and computation on the coherence cache

    Configure the coherence cluster so that other users on the network do not conflict with us

    Efficiently make updates with the minimum amount of network hops

    Monitor the application (including the distributed cache) transparently from a single location

    To achieve this, we will understand and leverage the following concepts from Coherence

    Invocable Maps and Entry Processors

    JMX functionality

    Read-Through and Write-Through caching strategies

    Configuration subsystem

    Coherence Services

    Partitioned (Distributed) and Near cache topologies

    First, define an architecture strategy. In general, Coherence supports using a Partitioned cache where a

    number of backups are stored on one or more members in the data grid for failover. In addition, a near

    cache can wrap a partitioned cache, so that a copy of the data is stored on each client's local JVM (so

    network access may be bypassed). We will leverage both mechanisms for our caching.

    The actual words and their metadata are stored in the partitioned (distributed) cache. There will be a

    different named cache used for each language. For example, language lang1 will be stored in the cache

    called dict-lang1. The list of available languages and some other state we need will be stored in the near

    cache (called app-shared). This near cache will actually wrap a different partitioned cache.

  • We will also use a CacheStore so that read requests to the cache will transparently go to the backend

    when a cache miss happens, and updates will also transparently go to the backend zip files when we

    want them to.

    Configuration Subsystem

    Coherence is an extensively configurable system.

    At startup, Coherence will find the file tangosol-coherence-override.xml on your classpath. This

    configuration is for system-wide settings. In this file, you can configure things like your cluster address,

    cluster multicast port, cluster name, etc.

    Copy tangosol-coherence-override.xml from the solutions directory into your working directory. Use the

    configuration for a cluster restricted to the local machine, and also define a different distributed cache

    service for storing the shared state (e.g. languages).
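    For orientation, an override file for this lab would take roughly the shape below. This is a sketch only: the cluster name, multicast port, and TTL values shown are illustrative placeholders, and the authoritative file is the one in the solutions directory. A time-to-live of 0 is what restricts multicast (and therefore the cluster) to the local machine.

```xml
<coherence>
  <cluster-config>
    <member-identity>
      <!-- a unique cluster name keeps you from joining a neighbor's cluster -->
      <cluster-name system-property="tangosol.coherence.cluster">oow-hol-mycluster</cluster-name>
    </member-identity>
    <multicast-listener>
      <!-- a distinct port further isolates your cluster on a shared network -->
      <port system-property="tangosol.coherence.clusterport">31234</port>
      <!-- TTL 0: multicast packets never leave this machine -->
      <time-to-live system-property="tangosol.coherence.ttl">0</time-to-live>
    </multicast-listener>
  </cluster-config>
</coherence>
```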

    Once Coherence gets a request to lookup a Cache, it will load up the coherence-cache-config.xml file.

    You should configure your caches here, setting up those that will use the distributed (partitioned) cache

    separately from those that should use the replicated cache. Copy coherence-cache-config.xml from the

    solutions directory into your working directory, and ensure the following:

    Two separate cache schemes are defined for dictionary (using distributed scheme) and shared

    caches (using near scheme) respectively.

    A naming convention is used where caches with names matching dict-* are mapped to the

    distributed scheme, while the cache app-shared is mapped to the near scheme.

    A cache store is configured which takes the cache name as a parameter. The cache store should

    apply to every dict-* cache alone, since these are the only ones which need to load up or persist

    data to the backend zip files.
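    The three requirements above would map onto a cache configuration shaped roughly as follows. Again, this is a sketch for orientation rather than the exact solutions file: the scheme names are made up, and the {cache-name} macro is what lets a single cache store class know which dict-* cache (and hence which zip file) it serves.

```xml
<cache-config>
  <caching-scheme-mapping>
    <!-- every dictionary cache (dict-lang1, dict-lang2, ...) is partitioned -->
    <cache-mapping>
      <cache-name>dict-*</cache-name>
      <scheme-name>dictionary-scheme</scheme-name>
    </cache-mapping>
    <!-- the shared-state cache sits behind a near cache -->
    <cache-mapping>
      <cache-name>app-shared</cache-name>
      <scheme-name>shared-scheme</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>

  <caching-schemes>
    <distributed-scheme>
      <scheme-name>dictionary-scheme</scheme-name>
      <backing-map-scheme>
        <read-write-backing-map-scheme>
          <internal-cache-scheme>
            <local-scheme/>
          </internal-cache-scheme>
          <cachestore-scheme>
            <class-scheme>
              <class-name>app.DistCacheStore</class-name>
              <init-params>
                <!-- Coherence expands {cache-name} to e.g. dict-lang1 -->
                <init-param>
                  <param-type>java.lang.String</param-type>
                  <param-value>{cache-name}</param-value>
                </init-param>
              </init-params>
            </class-scheme>
          </cachestore-scheme>
        </read-write-backing-map-scheme>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>

    <near-scheme>
      <scheme-name>shared-scheme</scheme-name>
      <front-scheme>
        <local-scheme/>
      </front-scheme>
      <back-scheme>
        <distributed-scheme>
          <!-- a separate service keeps shared state apart from the dictionaries -->
          <service-name>SharedDistributedCache</service-name>
          <backing-map-scheme>
            <local-scheme/>
          </backing-map-scheme>
          <autostart>true</autostart>
        </distributed-scheme>
      </back-scheme>
    </near-scheme>
  </caching-schemes>
</cache-config>
```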

    Now that we have gotten the configuration out of the way, let us go about working on the actual

    application.

  • Cache Store

    First, create the CacheStore which does the work of reading data during a cache-miss or persisting data

    after a cache update. We want an explicit update to the cache to write through to the zip files, but not a

    bulk load. This means that every cache put should not write back to the zip files. We can control this by

    using a variable stored in the cache itself. The shared cache (app-shared) will be used to store a variable.

    During a bulk load, we will set the variable in the cache, and remove it once the bulk load is done. The

    cache store will not do a write-through to the backend if the variable is stored. Any other cache puts will

    result in a write-through to the backend. Look at the way app\LocalWorker.java reads and writes

    individual records to/from the zip files. Implement that same logic in app\DistCacheStore.java.

    Hint: Look at the solutions directory for how this is done.
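    The gating logic described above can be sketched in plain Java. This is not the actual DistCacheStore from the solutions directory: it uses ordinary HashMaps to stand in for the app-shared cache and the zip-file backend, and the flag key name is invented, purely to illustrate how a variable in the shared cache turns write-through on and off. The real class would implement Coherence's CacheStore interface and rewrite zip entries instead.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: plain maps stand in for the app-shared cache
// and for the lang-LANG.zip backend.
public class WriteThroughSketch {

    // hypothetical flag key; the lab's real key name may differ
    static final String BULK_LOAD_FLAG = "bulk-load-in-progress";

    private final Map<String, Object> appShared; // stands in for the app-shared cache
    private final Map<String, String> backend;   // stands in for the zip file

    public WriteThroughSketch(Map<String, Object> appShared, Map<String, String> backend) {
        this.appShared = appShared;
        this.backend = backend;
    }

    // Called on every cache put (in real Coherence, CacheStore.store()).
    public void store(String word, String record) {
        if (appShared.containsKey(BULK_LOAD_FLAG)) {
            return; // bulk load in progress: skip the expensive zip rewrite
        }
        backend.put(word, record); // explicit update: write through to the backend
    }

    public static void main(String[] args) {
        Map<String, Object> shared = new HashMap<>();
        Map<String, String> zip = new HashMap<>();
        WriteThroughSketch store = new WriteThroughSketch(shared, zip);

        shared.put(BULK_LOAD_FLAG, Boolean.TRUE);     // begin bulk load
        store.store("word1", "record1");              // not persisted
        shared.remove(BULK_LOAD_FLAG);                // bulk load done
        store.store("word2", "record2");              // persisted

        System.out.println("word1 persisted: " + zip.containsKey("word1"));
        System.out.println("word2 persisted: " + zip.containsKey("word2"));
    }
}
```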

    Distributed Cache Worker implementation

    Next, create the actual implementation of the Worker interface, similar to the app\LocalWorker.java

    implementation, that can interact with Coherence. Call this app\DistWorker.java. In this

    implementation, we need to store all variables in the coherence cache, so that anyone in the coherence

    cluster can access this. This includes:

    languages

    dictionaries (words)

    Implement the following methods, with guidelines below:

    getLanguages store the list of languages in the app-shared cache and retrieve from there as needed

    getRecord Simply retrieve the word from the cache. Coherence will ensure that it checks the

    backend zip file if it does not have it, since the cache store has been configured.

    update One way to implement this will be to retrieve the Record from Coherence to your local

    JVM, make updates on the local JVM, and then send the updated Record over to

    Coherence. However, this can cause unnecessary network traffic which could be a

    bottleneck if the size of the records is large (especially compared to the change you

    want to make). A more efficient way is to send your updates directly to the coherence

    node that hosts the data. You achieve this using EntryProcessors and the

    InvocableMap.

    In addition, this method must be smart enough to write-through to the backend zip

    files only when requested. This can be done by setting a variable in the cache before

    putting the record in, and the CacheStore will only persist to the zip files if that flag is

    set.
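    To make the "send the update to the data" idea concrete, here is a dependency-free sketch of the entry-processor pattern. The interface and invoke method below are simplified stand-ins invented for illustration; in real Coherence you would extend AbstractProcessor and call NamedCache.invoke(key, processor), with Coherence routing the processor to the node that owns the key.

```java
import java.util.HashMap;
import java.util.Map;

// Dependency-free illustration of the entry-processor pattern: the mutation
// travels to the data instead of the whole record travelling to the client.
public class EntryProcessorSketch {

    // Simplified stand-in for Coherence's InvocableMap.EntryProcessor
    interface EntryProcessor<K, V> {
        Object process(K key, Map<K, V> cache);
    }

    // Stand-in for NamedCache.invoke(): runs the processor "next to" the data
    static <K, V> Object invoke(Map<K, V> cache, K key, EntryProcessor<K, V> proc) {
        return proc.process(key, cache);
    }

    // A record whose description is small relative to the record as a whole
    static class Record {
        String description;
        Record(String description) { this.description = description; }
    }

    public static void main(String[] args) {
        Map<String, Record> dict = new HashMap<>();
        dict.put("word100003", new Record("old description"));

        // Only the new description is shipped, not the full Record round trip
        invoke(dict, "word100003", (key, cache) -> {
            cache.get(key).description = "this new description is cool";
            return null;
        });

        System.out.println(dict.get("word100003").description);
    }
}
```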

  • bulkInsert For large uploads, it is more efficient to upload to Coherence in batches. Coherence has

    a putAll API which can be used for this.
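    The batching idea can be shown without Coherence at all, since java.util.Map declares the same putAll signature that NamedCache exposes. The point is simply to flush entries in a few bulk calls rather than one network hop per word; the batch size below is an arbitrary illustration, not a lab-prescribed value.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Batching sketch: accumulate entries locally and flush them with putAll in
// fixed-size chunks, instead of one put (and one network hop) per entry.
public class BulkInsertSketch {

    // Returns the number of putAll calls made, for illustration
    public static <K, V> int bulkInsert(Map<K, V> cache, Map<K, V> entries, int batchSize) {
        Map<K, V> batch = new LinkedHashMap<>();
        int flushes = 0;
        for (Map.Entry<K, V> e : entries.entrySet()) {
            batch.put(e.getKey(), e.getValue());
            if (batch.size() == batchSize) {
                cache.putAll(batch); // NamedCache exposes the same putAll
                batch.clear();
                flushes++;
            }
        }
        if (!batch.isEmpty()) {
            cache.putAll(batch); // flush the final partial batch
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        Map<String, String> words = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) words.put("word" + i, "record" + i);
        int flushes = bulkInsert(cache, words, 4);
        System.out.println(cache.size() + " entries in " + flushes + " putAll calls");
    }
}
```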

    find Unlike the app.LocalWorker (where implementing complex queries without a database

    is difficult and un-implemented), Coherence has an extremely powerful query

    functionality which can easily simulate complex SQL queries. This is the Coherence

    Filters API. An added advantage is that Coherence will run your query in parallel across

    all the nodes in the cluster and return your results faster. The more Coherence nodes

    you have, the less data each one holds and the less work it does, meaning that your

    query performance scales linearly with the number of Coherence nodes.
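    A pure-Java simulation shows what such a query computes. In actual Coherence you would compose classes from the com.tangosol.util.filter package (for example ContainsFilter and AllFilter) and pass the result to NamedCache.entrySet(filter), which the grid evaluates in parallel; the predicate below is only a local stand-in for that Filter, and the Record fields are invented for the sketch.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Local simulation of "find lang1 -s synonym674 -s synonym1430 -a antonym2243".
public class FilterSketch {

    static class Record {
        final String word;
        final Set<String> synonyms;
        final Set<String> antonyms;
        Record(String word, Set<String> synonyms, Set<String> antonyms) {
            this.word = word; this.synonyms = synonyms; this.antonyms = antonyms;
        }
    }

    // Stand-in for NamedCache.entrySet(Filter): evaluate a predicate over records
    static List<String> find(Collection<Record> dict, Predicate<Record> filter) {
        return dict.stream().filter(filter).map(r -> r.word).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Record> dict = Arrays.asList(
            new Record("wordA",
                new HashSet<>(Arrays.asList("synonym674", "synonym1430")),
                new HashSet<>(Arrays.asList("antonym2243"))),
            new Record("wordB",
                new HashSet<>(Arrays.asList("synonym674", "synonym1430")),
                new HashSet<>(Arrays.asList("antonym948"))));

        // all three conditions must hold (Coherence: an AllFilter of ContainsFilters)
        Predicate<Record> filter = r -> r.synonyms.contains("synonym674")
                && r.synonyms.contains("synonym1430")
                && r.antonyms.contains("antonym2243");

        System.out.println(find(dict, filter)); // only wordA satisfies all three
    }
}
```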

    updateMetrics Coherence has an extensive JMX feature set, where all the management information

    can be federated into any number of nodes in the coherence cluster that you deem

    should hold federated management information for the cluster. We will leverage this

    to keep track of the number of entries in the cache, how these entries are distributed

    in the coherence data grid, and how memory is used across the grid.

    clear This will remove all entries stored in the cache for a given language

    Look in the solutions directory for the full solution for the DistWorker.java implementation.

    Finally, update the command line interface app\Main.java to have the -dist command line parameter

    switch to using the app.DistWorker implementation. Do this by un-commenting the call that

    instantiates DistWorker below.

    public static void main(String[] args) throws Exception {
        Main m = new Main();
        m.worker = new LocalWorker();
        for(int i = 0; i < args.length; i++) {
            if(args[i].equals("-dist")) {
                //m.worker = new DistWorker();
            }
        }
        m.run();
    }

  • Run application with distributed cache

    Compile your application as was done in one of the prior sections, by opening up a cmd shell, setting up

    your environment and running javac.

    Hint: You can run the app-compile.bat script

    Now, start up three coherence cache servers, using the command line:

    java -Xms256m -Xmx512m -Dtangosol.coherence.management.remote=true com.tangosol.net.DefaultCacheServer

    Hint: You can just run the coherence-cache-server.bat script. Double-click three times to start 3 servers.

    This will start up a Coherence cache server configured to expose its management (JMX) information

    around the cluster.

  • Now run the application using the distributed cache implementation, using the command line:

    java -Xms256m -Xmx512m -Xmanagement -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true -Dtangosol.coherence.distributed.localstorage=false app.Main -dist

    Hint: You can just run the app-dist.bat script

    This will start the command line interface to your application, setting that JVM as a coherence cluster

    member which does not store data, but which collects management information from all the coherence

    cluster members and stores it in its JMX MBeanServer. You will thus locally be able to see all the

    management information from the whole coherence cluster within your local JVM.
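    The federation described above is exposed through the standard javax.management API. The sketch below queries the current JVM's platform MBeanServer using only the JDK; in the lab's client JVM the very same queryNames call, given a Coherence pattern such as "Coherence:type=Cache,*" (the exact ObjectName pattern is an assumption here), would list the cache MBeans federated from the whole cluster.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal JMX query sketch using only the JDK. In the lab's client JVM the
// same MBeanServer also holds the federated Coherence MBeans.
public class JmxSketch {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Pattern query: every MBean registered in the java.lang domain
        Set<ObjectName> names = server.queryNames(new ObjectName("java.lang:*"), null);
        for (ObjectName name : names) {
            System.out.println(name);
        }
    }
}
```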

    Note that once the data has been loaded into Coherence, you can run multiple clients and have them all

    share the distributed in-memory data grid.

    At the prompt, run stats which will pop up a Swing UI which updates itself as the Coherence cluster

    membership and contents change. From this Swing UI table, you can see the membership of the

    coherence cluster and how much of the data each member holds in near-real time (the Swing UI

    updates itself every 5 seconds).

  • Main> stats

    As mentioned above, this Swing table updates itself every five seconds, with the JMX information which

    has been federated at your local JVM. Each row in it represents the amount of data stored by one

    Coherence server on behalf of the cluster, and the memory being used in MB. For example, in the

    screenshot above, the Coherence JVM with node-id 1 is the primary store for 10089 words of lang2,

    10075 words for lang1, 9924 words for lang3, and uses 114MB of memory. The next rows show how

    much the Coherence JVMs with node-ids 2 and 4 hold.

    As you add or shut down Coherence cache servers, and even as you bulk-load the records into Coherence,

    monitor this Swing UI and see how Coherence automatically distributes the cached data. Look at

    app\DistMonitor.java for the full implementation.

    Also, you can open up JConsole or JRMC to look at the rich set of JMX management metrics exposed by

    Coherence.

    Hint: You can run the jrockit-mc.bat script

  • Once that is done, run the other commands as was shown in the previous section. A sampling of those

    commands is below.

    Main> help

    Main> langs

    Main> show lang1 word100003

    Main> update -d lang1 word100003 this new description is cool

    Main> load

    Main> find lang1 -s synonym674

    Main> find lang1 -s synonym674 -s synonym1430 -a antonym2243

    Main> exit

  • To show the beauty of this distributed solution, run another instance of the client. At the prompt, just

    run the find command. You will see that it works. Any new client does not have to load up the data,

    since all the data and state is stored externally in the distributed cache.

    Hint: You can run the app-dist.bat script

    Test failover and redundancy

    You can test failover and redundancy of your Coherence implementation by shutting down some

    instances and bringing others back up. You can do this by typing Ctrl-C in some of the Coherence Cache

    Server windows, and/or bringing some new Coherence servers back up.

    Hint: You can use the coherence-cache-server.bat script.

    Watch the windows of the coherence cache servers that are live. Notice how the coherence cluster

    automatically rebalances the data. Watch the Swing UI to see the updated stats on how the data is

    rebalanced and memory used in the cache servers.

    Shutdown everything

    To shut down everything, type Ctrl-C in all your open windows.