S309137 Migrating Applications from Local to Distributed Caching with Oracle Coherence

Oracle Coherence Hands on Lab


  • S309137

    Migrating Applications from Local to Distributed Caching with Oracle Coherence

  • Contents

    Introduction: Oracle Coherence Hands-On Lab

    Base Environment

    Quick overview of the application

    Running the application with local cache

    Explore the application. Understand how it works.

    Update application to run with distributed cache

    Configuration Subsystem

    Cache Store

    Distributed Cache Worker implementation

    Run application with distributed cache

    Test failover and redundancy

    Shutdown everything

  • Introduction: Oracle Coherence Hands-On Lab

    In this hands-on lab, you will take an application that loads data from a set of large files, caches the data

    in memory, and performs complex queries and updates against the cached data, with changes persisted

    to the files. You will then modify this application to cache the data in a Coherence data grid and perform

    these computations against it, learning the common Coherence APIs, configuration files and usage

    patterns as you go along. By doing this, you will become familiar with some core Coherence concepts

    like the configuration subsystem, the read-through and write-through mechanisms for loading and

    persisting data to the backend data source, the JMX subsystem, and the use of EntryProcessors for

    lock-free, concurrent, well-performing updates.

    For this hands-on lab, we have built a sample application that simulates the access and update of a set

    of dictionaries of some outdated (and frankly made up) languages. In the next section, you will get an

    overview of the application. The initial version of the application will work against the dictionaries

    loaded up in your application's local JVM. We will then walk you through updating the application to

    work with the data loaded in a Coherence data grid.

    This hands-on lab is focused on Coherence. There is no dependency on a database or an application

    server. The knowledge you gain here is directly applicable in use cases where Coherence is used within

    an Application Server or fronting a database.

    Base Environment

    The lab machine provides you with the following environment that you will be using in this lab.

    Oracle Coherence 3.5 for Java

    Eclipse 3.5 Galileo

    JRockit 1.6.0

    On the desktop, you will find a folder named S309137 Migrating Applications from Local to Distributed

    Caching with Oracle Coherence. This folder contains a shortcut to the lab which you will be working

    with. The actual lab folder (accessed from the shortcut) contains scripts to assist you with the lab. There

    is also a solutions directory in there, which contains the solutions for this lab (for reference when stuck,

    etc).

  • Quick overview of the application

    This sample application allows you to access the following information about words in a set of

    languages.

    description

    synonyms

    antonyms

    daily frequency of use of each word

    the year which the word became obsolete

    The words and their metadata (as described above) are stored in a number of zip files in the working

    directory. For each language LANG, a zip file called lang-LANG.zip exists and contains an entry for each

    word.

    The application exposes a command line interface with very simple commands to interact with the

    dictionaries.

    The section below shows some typical usage.

  • Running the application with local cache

    For your convenience, batch scripts are provided which allow you to set up your environment and run

    the application. Hints are provided below; double-clicking a script lets you bypass typing the commands

    into the cmd shell.

    Every command must be run with the environment properly setup. You will need to do this for every

    cmd shell window you open.

    set JAVA_HOME=c:\jrmc_3.1

    set COHERENCE_HOME=c:\coherence

    set CLASSPATH=.;%COHERENCE_HOME%\lib\coherence.jar;%COHERENCE_HOME%\lib\tangosol.jar

    set PATH=.;%JAVA_HOME%\bin;%PATH%;

    To start, compile the application.

    javac -d . app\*.java

    Hint: you can just run the app-compile.bat script.

    Now generate the dictionaries as zip files containing the words for different languages. This will take

    about 5 minutes.

    java -Xms1g -Xmx1g app.CreateDictionary

    Hint: you can just run the app-create-dict.bat script.

  • Now run the application using the local cache. Here, the cached data is stored in some Collection objects

    on the local Java heap.

    java -Xms1g -Xmx1500m -Xmanagement app.Main -local

    Hint: You can just run the app-local.bat script.

    This will open up a prompt as below:

    Main>

    At the prompt, type help to see the different commands and how to use them

    Main> help

  • Let us get familiar with the application from a user's point of view. At startup, the dictionary has not

    been loaded up into memory.

    Let us look up information about the word word100003 from the language lang1. Since the dictionary is

    not loaded up into memory, the application will retrieve the information from the zip files directly, and

    cache just that result into memory.

    Main> show lang1 word100003

  • Now, update the description associated with the word word100003 from the language lang1

    Main> update -d lang1 word100003 this new description is cool

    Run the show command again to ensure that the cache was updated. Also, check the zip file (lang-

    lang1.zip), look at the entry for word100003, and ensure that the change was written transparently

    to the backend zip file. You will notice that updates are slow. Unfortunately, Java has poor support for

    updating zip files, so we had to completely recreate the zip file for each changed word (this is why

    updates take a while).

  • Load up the whole dictionary for all languages into memory so that we can do some more complex

    queries. This may take up to 30 seconds to load all 10 dictionaries.

    Main> load

    Now, perform a query. Search for words which exist in the language lang1, that are synonymous with

    synonym674

    Main> find lang1 -s synonym674

    Let us do a more complex query. Search for words which exist in the language lang1, that are

    synonymous with synonym674 and synonym1430, but opposite (antonym) of antonym2243

    Main> find lang1 -s synonym674 -s synonym1430 -a antonym2243

  • For both of these, you got an UnsupportedOperationException because the query support is hard to

    implement in our local cache. A database and SQL would have made life much easier but we do not have

    access to that for our sample application.

    Hint: This is possible and easy using Coherence.

    Let us try some more commands.

    Look up the languages currently supported in our dictionary

    Main> langs

    Now look up the stats of the currently loaded cache. This should show you the number of words

    currently loaded into the cache for each language.

    Main> stats

  • Explore the application. Understand how it works.

    Now, we will go ahead and explore the application to see how it works under the hood.

    The application is run from the working directory. For each language LANG, there is a zip file in that

    directory called lang-LANG.zip which contains meta-data for each word in the language. Open up lang-

    lang1.zip and look at the contents for familiarity.

  • This working directory is already set up as an Eclipse project. Open up Eclipse and import the project into

    your workspace. From the File menu, select Import. From the dialog box that comes up, open the

    General node in the tree menu and select Existing Projects into Workspace. Click Next. Click the

    radio button beside Select root directory and click the Browse button to select the working

    directory. Select the project oow_hol_coherence and click Finish.

    Hint: You can just run the eclipse.bat script. The working directory is set up as a pre-configured Eclipse

    workspace and project.

    You should have the project as below:

  • The java source files all reside under the app directory. Please read through them to get familiar with

    what they do.

    Record.java This encapsulates a word in a language

    Main.java This is the main command line interface that parses the user inputs and calls the

    appropriate commands

    Worker.java This interface allows us to decouple the implementation of the cache from the

    application

    LocalWorker.java This implementation of Worker uses a local hash map to store the data, and (tries

    to) manage the computations itself

    Helper.java This contains some shared helper functions

    Metrics.java This is a table model containing the languages and memory used for each language

    within the cache.

    Monitor.java This is a Swing UI which regularly gets updated metrics from the cache and displays

    them as a Swing table.

    Once you have a good understanding of the application, feel free to go back to the previous section to

    run the lab again and get familiar with the application. The rest of the lab will go a lot smoother once

    you get the hang of what the application is trying to accomplish.

    Update application to run with distributed cache

    Now, the lab gets more interesting. We will walk through a number of steps to update the application so

    it runs against the Coherence Data Grid.

    In this lab, we aim to achieve the following using Coherence

    Store the cached data in multiple external JVMs which appear to us as one giant local heap with

    quick access

  • Transparently read the records from the cache, even if they have not been loaded into

    Coherence

    Transparently let the cache persist updates to the backend data source

    Perform queries and computation on the coherence cache

    Configure the coherence cluster so that other users on the network do not conflict with us

    Efficiently make updates with the minimum amount of network hops

    Monitor the application (including the distributed cache) transparently from a single location

    To achieve this, we will understand and leverage the following concepts from Coherence

    Invocable Maps and Entry Processors

    JMX functionality

    Read-Through and Write-Through caching strategies

    Configuration subsystem

    Coherence Services

    Partitioned (Distributed) and Near cache topologies

    First, define an architecture strategy. In general, Coherence supports using a Partitioned cache where a

    number of backups are stored on one or more members in the data grid for failover. In addition, a near

    cache can wrap a partitioned cache, so that a copy of the data is stored on each client's local JVM (so

    network access may be bypassed). We will leverage both mechanisms for our caching.

    The actual words and their metadata are stored in the partitioned (distributed) cache. There will be a

    different named cache used for each language. For example, language lang1 will be stored in the cache

    called dict-lang1. The list of available languages and some other state we need will be stored in the near

    cache (called app-shared). This near cache will actually wrap a different partitioned cache.

  • We will also use a CacheStore so that read requests to the cache will transparently go to the backend

    when a cache miss happens, and updates will also transparently go to the backend zip files when we

    want them to.

    Configuration Subsystem

    Coherence is an extensively configurable system.

    At startup, Coherence will find the file tangosol-coherence-override.xml on your classpath. This

    configuration is for system-wide settings. In this file, you can configure things like your cluster address,

    cluster multicast port, cluster name, etc.

    Copy tangosol-coherence-override.xml from the solutions directory into your working directory. Use the

    configuration for a cluster restricted to the local machine, and also define a different distributed cache

    service for storing the shared state (e.g. languages).
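    For orientation, an override file for this lab would take roughly the shape below. This is a sketch only: the cluster name, multicast port, and TTL values shown are illustrative placeholders, and the authoritative file is the one in the solutions directory. A time-to-live of 0 is what restricts multicast (and therefore the cluster) to the local machine.

```xml
<coherence>
  <cluster-config>
    <member-identity>
      <!-- a unique cluster name keeps you from joining a neighbor's cluster -->
      <cluster-name system-property="tangosol.coherence.cluster">oow-hol-mycluster</cluster-name>
    </member-identity>
    <multicast-listener>
      <!-- a distinct port further isolates your cluster on a shared network -->
      <port system-property="tangosol.coherence.clusterport">31234</port>
      <!-- TTL 0: multicast packets never leave this machine -->
      <time-to-live system-property="tangosol.coherence.ttl">0</time-to-live>
    </multicast-listener>
  </cluster-config>
</coherence>
```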

    Once Coherence gets a request to lookup a Cache, it will load up the coherence-cache-config.xml file.

    You should configure your caches here, setting up those that will use the distributed (partitioned) cache

    separately from those that should use the replicated cache. Copy coherence-cache-config.xml from the

    solutions directory into your working directory, and ensure the following:

    Two separate cache schemes are defined for dictionary (using distributed scheme) and shared

    caches (using near scheme) respectively.

    A naming convention is used where caches with names matching dict-* are mapped to the

    distributed scheme, while the cache app-shared is mapped to the near scheme.

    A cache store is configured which takes the cache name as a parameter. The cache store should

    apply to every dict-* cache alone, since these are the only ones which need to load up or persist

    data to the backend zip files.
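    The three requirements above would map onto a cache configuration shaped roughly as follows. Again, this is a sketch for orientation rather than the exact solutions file: the scheme names are made up, and the {cache-name} macro is what lets a single cache store class know which dict-* cache (and hence which zip file) it serves.

```xml
<cache-config>
  <caching-scheme-mapping>
    <!-- every dictionary cache (dict-lang1, dict-lang2, ...) is partitioned -->
    <cache-mapping>
      <cache-name>dict-*</cache-name>
      <scheme-name>dictionary-scheme</scheme-name>
    </cache-mapping>
    <!-- the shared-state cache sits behind a near cache -->
    <cache-mapping>
      <cache-name>app-shared</cache-name>
      <scheme-name>shared-scheme</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>

  <caching-schemes>
    <distributed-scheme>
      <scheme-name>dictionary-scheme</scheme-name>
      <backing-map-scheme>
        <read-write-backing-map-scheme>
          <internal-cache-scheme>
            <local-scheme/>
          </internal-cache-scheme>
          <cachestore-scheme>
            <class-scheme>
              <class-name>app.DistCacheStore</class-name>
              <init-params>
                <!-- Coherence expands {cache-name} to e.g. dict-lang1 -->
                <init-param>
                  <param-type>java.lang.String</param-type>
                  <param-value>{cache-name}</param-value>
                </init-param>
              </init-params>
            </class-scheme>
          </cachestore-scheme>
        </read-write-backing-map-scheme>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>

    <near-scheme>
      <scheme-name>shared-scheme</scheme-name>
      <front-scheme>
        <local-scheme/>
      </front-scheme>
      <back-scheme>
        <distributed-scheme>
          <!-- a separate service keeps shared state apart from the dictionaries -->
          <service-name>SharedDistributedCache</service-name>
          <backing-map-scheme>
            <local-scheme/>
          </backing-map-scheme>
          <autostart>true</autostart>
        </distributed-scheme>
      </back-scheme>
    </near-scheme>
  </caching-schemes>
</cache-config>
```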

    Now that we have gotten the configuration out of the way, let us go about working on the actual

    application.

  • Cache Store

    First, create the CacheStore which does the work of reading data during a cache-miss or persisting data

    after a cache update. We want an explicit update to the cache to write through to the zip files, but not a

    bulk load. This means that every cache put should not write back to the zip files. We can control this by

    using a variable stored in the cache itself. The shared cache (app-shared) will be used to store a variable.

    During a bulk load, we will set the variable in the cache, and remove it once the bulk load is done. The

    cache store will not do a write-through to the backend if the variable is stored. Any other cache puts will

    result in a write-through to the backend. Look at the way app\LocalWorker.java reads and writes

    individual records to/from the zip files. Implement that same logic in app\DistCacheStore.java.

    Hint: Look at the solutions directory for how this is done.
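    The gating logic described above can be sketched in plain Java. This is not the actual DistCacheStore from the solutions directory: it uses ordinary HashMaps to stand in for the app-shared cache and the zip-file backend, and the flag key name is invented, purely to illustrate how a variable in the shared cache turns write-through on and off. The real class would implement Coherence's CacheStore interface and rewrite zip entries instead.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: plain maps stand in for the app-shared cache
// and for the lang-LANG.zip backend.
public class WriteThroughSketch {

    // hypothetical flag key; the lab's real key name may differ
    static final String BULK_LOAD_FLAG = "bulk-load-in-progress";

    private final Map<String, Object> appShared; // stands in for the app-shared cache
    private final Map<String, String> backend;   // stands in for the zip file

    public WriteThroughSketch(Map<String, Object> appShared, Map<String, String> backend) {
        this.appShared = appShared;
        this.backend = backend;
    }

    // Called on every cache put (in real Coherence, CacheStore.store()).
    public void store(String word, String record) {
        if (appShared.containsKey(BULK_LOAD_FLAG)) {
            return; // bulk load in progress: skip the expensive zip rewrite
        }
        backend.put(word, record); // explicit update: write through to the backend
    }

    public static void main(String[] args) {
        Map<String, Object> shared = new HashMap<>();
        Map<String, String> zip = new HashMap<>();
        WriteThroughSketch store = new WriteThroughSketch(shared, zip);

        shared.put(BULK_LOAD_FLAG, Boolean.TRUE);     // begin bulk load
        store.store("word1", "record1");              // not persisted
        shared.remove(BULK_LOAD_FLAG);                // bulk load done
        store.store("word2", "record2");              // persisted

        System.out.println("word1 persisted: " + zip.containsKey("word1"));
        System.out.println("word2 persisted: " + zip.containsKey("word2"));
    }
}
```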

    Distributed Cache Worker implementation

    Next, create the actual implementation of the Worker interface, similar to the app\LocalWorker.java

    implementation, that can interact with Coherence. Call this app\DistWorker.java. In this

    implementation, we need to store all variables in the coherence cache, so that anyone in the coherence

    cluster can access this. This includes:

    languages

    dictionaries (words)

    Implement the following methods, with guidelines below:

    getLanguages store the list of languages in the app-shared cache and retrieve from there as needed

    getRecord Simply retrieve the word from the cache. Coherence will ensure that it checks the

    backend zip file if it does not have it, since the cache store has been configured.

    update One way to implement this will be to retrieve the Record from Coherence to your local

    JVM, make updates on the local JVM, and then send the updated Record over to

    Coherence. However, this can cause unnecessary network traffic which could be a

    bottleneck if the size of the records is large (especially compared to the change you

    want to make). A more efficient way is to send your updates directly to the coherence

    node that hosts the data. You achieve this using EntryProcessors and the

    InvocableMap.

    In addition, this method must be smart enough to write-through to the backend zip

    files only when requested. This can be done by setting a variable in the cache before

    putting the record in, and the CacheStore will only persist to the zip files if that flag is

    set.
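    To make the "send the update to the data" idea concrete, here is a dependency-free sketch of the entry-processor pattern. The interface and invoke method below are simplified stand-ins invented for illustration; in real Coherence you would extend AbstractProcessor and call NamedCache.invoke(key, processor), with Coherence routing the processor to the node that owns the key.

```java
import java.util.HashMap;
import java.util.Map;

// Dependency-free illustration of the entry-processor pattern: the mutation
// travels to the data instead of the whole record travelling to the client.
public class EntryProcessorSketch {

    // Simplified stand-in for Coherence's InvocableMap.EntryProcessor
    interface EntryProcessor<K, V> {
        Object process(K key, Map<K, V> cache);
    }

    // Stand-in for NamedCache.invoke(): runs the processor "next to" the data
    static <K, V> Object invoke(Map<K, V> cache, K key, EntryProcessor<K, V> proc) {
        return proc.process(key, cache);
    }

    // A record whose description is small relative to the record as a whole
    static class Record {
        String description;
        Record(String description) { this.description = description; }
    }

    public static void main(String[] args) {
        Map<String, Record> dict = new HashMap<>();
        dict.put("word100003", new Record("old description"));

        // Only the new description is shipped, not the full Record round trip
        invoke(dict, "word100003", (key, cache) -> {
            cache.get(key).description = "this new description is cool";
            return null;
        });

        System.out.println(dict.get("word100003").description);
    }
}
```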

  • bulkInsert For large uploads, it is more efficient to upload to Coherence in batches. Coherence has

    a putAll API which can be used for this.
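    The batching idea can be shown without Coherence at all, since java.util.Map declares the same putAll signature that NamedCache exposes. The point is simply to flush entries in a few bulk calls rather than one network hop per word; the batch size below is an arbitrary illustration, not a lab-prescribed value.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Batching sketch: accumulate entries locally and flush them with putAll in
// fixed-size chunks, instead of one put (and one network hop) per entry.
public class BulkInsertSketch {

    // Returns the number of putAll calls made, for illustration
    public static <K, V> int bulkInsert(Map<K, V> cache, Map<K, V> entries, int batchSize) {
        Map<K, V> batch = new LinkedHashMap<>();
        int flushes = 0;
        for (Map.Entry<K, V> e : entries.entrySet()) {
            batch.put(e.getKey(), e.getValue());
            if (batch.size() == batchSize) {
                cache.putAll(batch); // NamedCache exposes the same putAll
                batch.clear();
                flushes++;
            }
        }
        if (!batch.isEmpty()) {
            cache.putAll(batch); // flush the final partial batch
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        Map<String, String> words = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) words.put("word" + i, "record" + i);
        int flushes = bulkInsert(cache, words, 4);
        System.out.println(cache.size() + " entries in " + flushes + " putAll calls");
    }
}
```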

    find Unlike the app.LocalWorker (where implementing complex queries without a database

    is difficult and un-implemented), Coherence has an extremely powerful query

    functionality which can easily simulate complex SQL queries. This is the Coherence

    Filters API. An added advantage is that Coherence will run your query in parallel across

    all the nodes in the cluster and return your results faster. The more Coherence nodes

    you have, the less data each one holds and the less work it does, meaning that your

    query performance scales linearly with the number of Coherence nodes.
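    A pure-Java simulation shows what such a query computes. In actual Coherence you would compose classes from the com.tangosol.util.filter package (for example ContainsFilter and AllFilter) and pass the result to NamedCache.entrySet(filter), which the grid evaluates in parallel; the predicate below is only a local stand-in for that Filter, and the Record fields are invented for the sketch.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Local simulation of "find lang1 -s synonym674 -s synonym1430 -a antonym2243".
public class FilterSketch {

    static class Record {
        final String word;
        final Set<String> synonyms;
        final Set<String> antonyms;
        Record(String word, Set<String> synonyms, Set<String> antonyms) {
            this.word = word; this.synonyms = synonyms; this.antonyms = antonyms;
        }
    }

    // Stand-in for NamedCache.entrySet(Filter): evaluate a predicate over records
    static List<String> find(Collection<Record> dict, Predicate<Record> filter) {
        return dict.stream().filter(filter).map(r -> r.word).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Record> dict = Arrays.asList(
            new Record("wordA",
                new HashSet<>(Arrays.asList("synonym674", "synonym1430")),
                new HashSet<>(Arrays.asList("antonym2243"))),
            new Record("wordB",
                new HashSet<>(Arrays.asList("synonym674", "synonym1430")),
                new HashSet<>(Arrays.asList("antonym948"))));

        // all three conditions must hold (Coherence: an AllFilter of ContainsFilters)
        Predicate<Record> filter = r -> r.synonyms.contains("synonym674")
                && r.synonyms.contains("synonym1430")
                && r.antonyms.contains("antonym2243");

        System.out.println(find(dict, filter)); // only wordA satisfies all three
    }
}
```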

    updateMetrics Coherence has an extensive JMX feature set, where all the management information

    can be federated into any number of nodes in the coherence cluster that you deem

    should hold federated management information for the cluster. We will leverage this

    to keep track of the number of entries in the cache, how these entries are distributed

    in the coherence data grid, and how memory is used across the grid.

    clear This will remove all entries stored in the cache for a given language

    Look in the solutions directory for the full solution for the DistWorker.java implementation.

    Finally, update the command line interface app\Main.java to have the -dist command line parameter

    switch to using the app.DistWorker implementation. Do this by un-commenting the call that

    instantiates DistWorker below.

    public static void main(String[] args) throws Exception {
        Main m = new Main();
        m.worker = new LocalWorker();
        for(int i = 0; i < args.length; i++) {
            if(args[i].equals("-dist")) {
                //m.worker = new DistWorker();
            }
        }
        m.run();
    }

  • Run application with distributed cache

    Compile your application as was done in one of the prior sections, by opening up a cmd shell, setting up

    your environment and running javac.

    Hint: You can run the app-compile.bat script

    Now, start up three coherence cache servers, using the command line:

    java -Xms256m -Xmx512m -Dtangosol.coherence.management.remote=true com.tangosol.net.DefaultCacheServer

    Hint: You can just run the coherence-cache-server.bat script. Double-click three times to start 3 servers.

    This will start up a Coherence cache server configured to expose its management (JMX) information

    around the cluster.

  • Now run the application using the distributed cache implementation, using the command line:

    java -Xms256m -Xmx512m -Xmanagement -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true -Dtangosol.coherence.distributed.localstorage=false app.Main -dist

    Hint: You can just run the app-dist.bat script

    This will start the command line interface to your application, setting that JVM as a coherence cluster

    member which does not store data, but which collects management information from all the coherence

    cluster members and stores it in its JMX MBeanServer. You will thus locally be able to see all the

    management information from the whole coherence cluster within your local JVM.
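    The federation described above is exposed through the standard javax.management API. The sketch below queries the current JVM's platform MBeanServer using only the JDK; in the lab's client JVM the very same queryNames call, given a Coherence pattern such as "Coherence:type=Cache,*" (the exact ObjectName pattern is an assumption here), would list the cache MBeans federated from the whole cluster.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal JMX query sketch using only the JDK. In the lab's client JVM the
// same MBeanServer also holds the federated Coherence MBeans.
public class JmxSketch {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Pattern query: every MBean registered in the java.lang domain
        Set<ObjectName> names = server.queryNames(new ObjectName("java.lang:*"), null);
        for (ObjectName name : names) {
            System.out.println(name);
        }
    }
}
```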

    Note that once the data has been loaded into Coherence, you can run multiple clients and have them all

    share the distributed in-memory data grid.

    At the prompt, run stats which will pop up a Swing UI which updates itself as the Coherence cluster

    membership and contents change. From this Swing UI table, you can see the membership of the

    coherence cluster and how much of the data each member holds in near-real time (the Swing UI

    updates itself every 5 seconds).

  • Main> stats

    As mentioned above, this Swing table updates itself every five seconds, with the JMX information which

    has been federated at your local JVM. Each row in it represents the amount of data stored by one

    Coherence server on behalf of the cluster, and the memory being used in MB. For example, in the

    screenshot above, the Coherence JVM with node-id 1 is the primary store for 10089 words of lang2,

    10075 words for lang1, 9924 words for lang3, and uses 114MB of memory. The next rows show how

    much the Coherence JVMs with node-ids 2 and 4 hold.

    As you add or shut down Coherence cache servers, and even as you bulk-load the records into Coherence,

    monitor this Swing UI and see how Coherence automatically distributes the cached data. Look at

    app\DistMonitor.java for the full implementation.

    Also, you can open up JConsole or JRMC to look at the rich set of JMX management metrics exposed by

    Coherence.

    Hint: You can run the jrockit-mc.bat script

  • Once that is done, run the other commands as was shown in the previous section. A sampling of those

    commands is below.

    Main> help

    Main> langs

    Main> show lang1 word100003

    Main> update -d lang1 word100003 this new description is cool

    Main> load

    Main> find lang1 -s synonym674

    Main> find lang1 -s synonym674 -s synonym1430 -a antonym2243

    Main> exit

  • To show the beauty of this distributed solution, run another instance of the client. At the prompt, just

    run the find command. You will see that it works. Any new client does not have to load up the data,

    since all the data and state is stored externally in the distributed cache.

    Hint: You can run the app-dist.bat script

    Test failover and redundancy

    You can test failover and redundancy of your Coherence implementation by shutting down some

    instances and bringing others back up. You can do this by typing Ctrl-C in some of the Coherence Cache

    Server windows, and/or bringing some new Coherence servers back up.

    Hint: You can use the coherence-cache-server.bat script.

    Watch the windows of the coherence cache servers that are live. Notice how the coherence cluster

    automatically rebalances the data. Watch the Swing UI to see the updated stats on how the data is

    rebalanced and memory used in the cache servers.

    Shutdown everything

    To shut down everything, type Ctrl-C in all your open windows.