© Copyright IBM Corporation, 2012
EPIC/IBM BEST PRACTICES – v2.1
Anita Govindjee – IBM Jean-Luc Degrenand – IBM Christopher Strauss – IBM
February 13, 2012
© Copyright IBM Corporation, 2012
INTRODUCTION 4
A General Description of the Epic Product 4 Tiered database architecture utilizing Caché Enterprise Cache Protocol (ECP) technology 6
THE EPIC HARDWARE PLATFORM SIZING PROCESS 12
A DESCRIPTION OF THE INTERSYSTEMS CACHÉ DATABASE ENGINE 13
GENERAL GUIDELINES FOR STORAGE HARDWARE 14
General Concepts 15
The Use of RAID 16
How Data is processed through the Storage System 17
A Typical Layout of the Epic Production Caché data volumes 18
FlashCopy 20
EasyTier 20
CONFIGURATION GUIDELINES FOR THE DS8000 SERIES ENTERPRISE STORAGE SYSTEM 20
SVC AND THE EPIC STORAGE CONFIGURATION 22
CONFIGURATION GUIDELINES FOR THE STORWIZE V7000 MID-RANGE STORAGE SYSTEM 23
V7000 Configuration ScreenShots 24
CONFIGURATION GUIDELINES FOR THE DS5000 SERIES MID-RANGE STORAGE SYSTEM 28
CONFIGURATION GUIDELINES FOR THE XIV STORAGE SYSTEM 29
CONFIGURATION GUIDELINES FOR THE N-SERIES STORAGE SYSTEM 29
CONFIGURING THE POWER SYSTEMS AIX SERVER 29
POWER7 29
Mounting With Concurrent I/O 30
© Copyright IBM Corporation, 2012
Creation of Volume Groups, Logical Volumes, and File Systems for use by Caché 30
Additional System Settings 32
ADDITIONAL RECOMMENDATIONS 33
What Information Should Be Collected When A Problem Occurs 38
© Copyright IBM Corporation, 2012
INTRODUCTION
Epic is a Healthcare Information System (HIS) provider which delivers a comprehensive
Electronic Medical Recordkeeping System covering all aspects of the Medical Healthcare
Profession. The Epic Solution includes a variety of applications which cover such areas
as Medical Billing, Emergency Room, Radiology, Outpatient, Inpatient and Ambulatory
care.
The Epic product relies almost exclusively on an Electronic Database Management
System called Caché produced by InterSystems Corp.
Epic has two main databases technologies. The on-line transactional production (OLTP)
DB runs Caché as database engine. The analytical DB runs MS-SQL or Oracle. The
analytical DB has the highest bandwidth but the Caché OLTP DB is by far the most
critical to end user performance and consequently is where most of the attention needs to
be focused. This Best Practices guide is therefore centered on the Caché OLTP DB.
A General Description of the Epic Product
There are two fundamental architecture models which Epic uses:
(1) Single Symmetric Multiprocessing (SMP)
(2) Enterprise Cache Protocol (ECP)
The majority of customers are using the SMP architecture. Each architecture has a
production database server that is clustered in an active-passive configuration to a
failover server. The Epic production database server runs a post-relational database
developed by InterSystems Corporation called Caché. The Caché language is a modern
implementation of M (formerly MUMPS), which is a language originally created for
healthcare applications.
© Copyright IBM Corporation, 2012
Functional Layers Of the Epic Architecture
Epic Applications Epic Chronicles
Epic Core Utilities
InterSystems Caché
Filesystems
OSS
Hardware
© Copyright IBM Corporation, 2012
Single symmetric
multiprocessing (SMP) database server
Single symmetric multiprocessing (SMP) database server The single database server architecture provides the greatest ease of administration. The
SMP model today scales well up to the 16 to 24 processor range. Beyond this point, the
ECP model is required.
Tiered database
architecture utilizing Caché
Enterprise Cache Protocol (ECP) technology
Tiered database architecture utilizing Caché Enterprise Cache Protocol (ECP)
technology The tiered architecture retains a central database server with a single data storage
repository. Unlike the SMP architecture, most processing needs are offloaded to
application servers. The application servers contain no permanent data. This architecture
offers increased scaling over the SMP architecture.
© Copyright IBM Corporation, 2012
© Copyright IBM Corporation, 2012
Production database server (Epicenter OLTP) — Caché and chronicle data
repositories (CDR) live here, including clinical, financial and operational data. UNIX
server hardware is clustered (see failover server) and is SAN-attached. The production
database will be replicated to the data recovery (DR) site via Caché Shadow service or
array-based replication.
Failover server — Used only when production has problems; then takes over
functionality of the production database server. The switch from production to failover
happens in minutes. UNIX server has same configuration as production database server
and is connected to same SAN volumes. The cluster software is provided by OS vendors
and triggers when production should failover. Epic scripts are added to the software
scripts for automatically moving the application from production to failover hardware.
Application server (app server) — Caché service is running on these UNIX systems.
User processing load is distributed via content switches across the application servers,
rather than directly accessing the production database server. All permanent data lives on
the database server, but temporary data is created for local activities on the app servers.
App servers can be added or removed from the network for maintenance when necessary.
© Copyright IBM Corporation, 2012
Scaling performance is accomplished by adding additional app servers. App servers
cache block information brought from the database server so network traffic is not
incurred for each request for data. App servers also run ECP (Enterprise Cache Protocol),
which allows the app server to access the production database server directly over
redundant, dedicated GigE networks. If an app server fails, the client (or clients) must
reconnect and restart any unsaved activities.
(Reporting) Shadow Server — Near-real-time database of production or a delayed
mirror of what is in production based on Caché journaling process. Replicated data is
used for off-loading production reporting needs, such as Clarity. Shadow servers can also
be used for disaster recovery purposes rather than host-based or array-based replication.
The shadow server is SAN-attached.
Clarity server OLAP – Oracle or SQL RDBMS storing data extracted daily from
Reporting Shadow database server via Extract, Transfer, Load (ETL) process. The Clarity
server is SAN attached.
BusinessObjects— Windows servers will host Crystal and will run the reports that
connect to the Clarity database. The results of the reports typically are distributed out to a
file server.
HA BLOB / file server cluster — Used more by clinicals to store images, scans, voice
files and dictation files. (Can be stored on same cluster, but some customers wish to
separate them.) The HA file server cluster is SAN attached.
Web server — Connects to either application servers or production database server.
Used for the Web applications: MyChart, EpicCare Link, EpicWeb, etc. Depending on
the service functionality, it is linked to either production app servers or the production
database server via TCP/IP.
Print format server (EPS) — Converts RTF (rich-text format) to PCL/PS and controls
routing, batching, and archiving of printouts.
Print relay server — Can be run on the same server with the print format server. Used
for downtime reporting. Info from here is sent to DR PC where users can access
downtime reports.
Full Client Workstation — x86-based PC that runs the client software (Hyperspace) and
communicates to a production application server using TCP/IP. When you set up the
client on the workstation, there is an EpicComm configuration where you define the
environments (production, training, test and so forth) to which that the workstation can
connect. 1. If you choose not to use Citrix XenApp to present Epic’s Hyperspace client, or if you
require third-party devices that aren’t fully supported through XenApp, you will need
some number of full client workstations. See the Citrix XenApp Farm section for further
details on the tradeoffs between full client and thin client implementations of Epic.
© Copyright IBM Corporation, 2012
2. For each Epic software version, we publish workstation specifications which you can use
to determine whether or not your existing workstations are adequate to run Hyperspace. 3. If you require new workstations, we publish Workstation Purchasing Guidelines which
are reviewed regularly and are expected to exceed Epic’s minimum requirements for the
next several years. The current workstation purchasing guidelines document appears as
an appendix at the end of this document. 4. The number of workstations required will be determined in the early stages of your Epic
implementation. This depends on your facility layout, the number of staff working in a
given area, and the workflows performed in that area throughout the day. As a guideline, most organizations choose to have enough workstations so that one is readily available to
every user at the busiest time of the day, in the user’s preferred work area.
5. Epic Monitor is optional functionality that you may choose to deploy in patient rooms in intensive care settings. Epic Monitor requires Windows Presentation Foundation.
Consequently, Citrix XenApp is not a viable option for it at this time. Epic Monitor has
the following display requirements:
a. 24” touch screen monitor or larger b. Native resolution of at least 1680 x 1050
c. Resistive touch technology, which allows the use of gloves
d. For usability reasons, a stable wall mount is an absolute necessity
6. We require a round-trip network latency of 40 ms or less between full client workstations
and the Epic database server.
Thin Client Terminal – Consists of Citrix XenApp x86 servers 1. If you choose to use Microsoft Remote Desktop Services or Citrix XenApp to present
Epic’s Hyperspace client, you will need some number of thin client terminals. Low end
workstations or dedicated thin client devices work well for this purpose. You may have
existing hardware that meets this need and consequently may not have to purchase new devices.
2. It is important for you to conduct thorough testing of any thin client device that you are
considering for production use. Epic is happy to assist you in your evaluation of such
devices. 3. We recommend against the use of thin client devices with Windows Embedded CE. Our
customers have consistently had poor experience with Windows Embedded CE.
4. The number of devices required will be determined in the early stages of your Epic implementation. This depends on your facility layout, the number of staff working in a
given area, and the workflows performed in that area throughout the day. As a guideline,
most organizations choose to have enough devices so that one is readily available to
every user at the busiest time of the day, in the user’s preferred work area. 5. Epic’s testing has demonstrated that good performance can be achieved with up to 150
ms of round-trip latency between a thin client terminal and a XenApp server running
Hyperspace. If latency exceeds 150 ms, ICA protocol optimizations for high latency conditions should be employed, but may or may not yield acceptable performance.
DR PC: Houses the downtime reports.
CL/EMFI server (community lead / enterprise master file infrastructure) — Community
lead manages and maintains collaboratively built data shared across all instances in the
© Copyright IBM Corporation, 2012
community. Enterprise server is used as a mechanism to move static master files between
deployments in a community (a group of affiliated deployments). A common build /
vocabulary can be distributed across the organization for ease and maintenance and
consistent enterprise reporting. Essentially EMFI is an internal interface broker. Server is
not critical for real-time operations, but is needed to make community configuration
changes. The EMFI server is SAN attached and will use array-based replication to the
DR facility.
© Copyright IBM Corporation, 2012
THE EPIC HARDWARE PLATFORM SIZING PROCESS
IBM does not provide initial sizing of Epic implementations to customers. The primary
reason is that IBM does not have the same sort of information which Epic collects from
their customers prior to the implementation. Before an Epic deployment, Epic dedicates a
considerable set of resources in order to analyze and understand the customer’s
requirements. To accomplish this, a great deal of knowledge about the customer’s
hospital, clinic, or health care environment as well as knowledge about the Epic software
is needed.
Epic will periodically visit an IBM benchmark center for purposes of running multiple
benchmarks using the latest available IBM storage and server equipment. Testing is also
performance on-site at Epic in Verona, Wisconsin with the latest IBM server and storage
systems. During these benchmarks, Epic will test multiple types of simulated customer
loads in a variety of simulated environments. The data which is collected will be used to
generate a generic set of sizing parameters along various performance measurement axes.
These parameters are used to calculate an appropriate sizing for a specific customer,
based on their specific user loads and anticipated use of Epic products.
Epic will provide their customer with a sizing document, which outlines reasonably
specific recommendations regarding the size of server and storage that they recommend
for a particular customer. Next, the IBM account team can then work with the Epic
customer to further refine the exact set of hardware that is required to implement a
production version of Epic. Alterations in configuration may be required based on the
existing customer IT implementation, equipment location, cost considerations etc.
However, we strongly recommend not reducing the Epic recommended basic set of
resources such as CPUs, number of disk spindles, storage cache sizes etc. Experience has
shown that the Epic estimates are reasonably conservative. In addition, we have found
that the growth of future customer hardware resource requirements has always exceeded
original estimates.
It should be noted that Epic provides hardware sizing information with the assumption
that Epic will be the only application using the proposed configuration. Epic maintains
very strict requirements regarding response times from the database system with regards
to retrieval of data. The data is essentially 100% random access. This requires fast disk
access with little opportunity to take advantage of I/O read-aheads or the optimal use of
storage cache. Our recommendation is that both the server and storage components not be
shared with other applications. The difficulty with a shared system materializes when
trying to diagnose a performance issue in a production system. In addition to unplanned
loads associated within the Epic environment, trying to identify unplanned load
excursions presented by non-Epic applications further complicates the process of
diagnoses.
© Copyright IBM Corporation, 2012
Initially, the IBM account representative must request a copy of the sizing guide provided
to the customer by Epic. This document will provide everything that will be needed as far
as a working hardware configuration. The IBM account representative or business partner
can communicate with the Epic IBM Alliance team via this email: [email protected],
and can send the sizing guide copy through this email.
A DESCRIPTION OF THE INTERSYSTEMS CACHÉ DATABASE ENGINE In order to better understand the Epic environment, we need to examine the underlying
database system which is used by Epic. The main reason is that Caché interfaces directly
with the IBM hardware. Virtually, the entire Epic environment relies on Caché in order to
accomplish its work. The Caché Database engine is based on a B-Tree data storage
structure. Internal to Caché, the user data is managed as 8K byte blocks. Caché maintains
a “Global Buffer” cache in the computer’s real memory. All transactions (read and write)
between the user and the data base are read in or written from the Global Buffer. The
Global Buffer acts as a storage cache thus reducing OS level I/O requests. It also acts as a
global data locking communication system which provides mutually exclusive access to
data being referenced or changed by multiple users.
Data being referenced by one or more users will initially be read from the storage device
(disk), into the Global Buffer. The data objects are now accessible for repeated operations
including updates to the contents. Access and updates happen rapidly as the data is kept
in RAM.
As data blocks in the Global Buffer are updated they are considered “dirty”. Caché will
“flush” the “dirty” blocks at a regularly scheduled interval of about eighty seconds, or
when the percentage of dirty blocks over total global buffer exceeds the internal
threshold, whichever comes sooner. Caché has dedicated write daemon processes that
perform the actual update operations. We typically reference the “flush” of the dirty
blocks as a “write daemon cycle”.
Caché uses two-phase update technique with database updates. During the write daemon
cycle, updates are first written to CACHE.WIJ file. The updates to the actual database
file only happen after the WIJ updates have completed successfully. After all database
updates are committed, Caché will go back and mark the WIJ clean. The first WIJ writes
are sequential writes of 256 KB blocks.
While the Global Buffer is not being flushed, the I/O requests issued by Caché are strictly
read I/O in nature. This does not include the Caché Journal File which is being written
out to disk in a sequential manner. Writing to the Journal File is a continuous process, but
does not require a large number of resources. Therefore the random read operations can
occur with little or no interference.
© Copyright IBM Corporation, 2012
This is not the case during the “flush” or “write burst” which is initiated every eighty
seconds. While Caché continues to issue 100% read requests, the DB engine also
generates a large quantity of write requests in a very short amount of time. Epic has strict
read latency guidelines to avoid degrading user performance. Write latencies also become
increasingly important for high-end scalability. For large implementations this can lead to
a clear conflict of requiring optimal read performance while at the same time demanding
optimal write performance during intense write bursts.
The reality is that no storage system can complete both 100% reads and 100% writes
simultaneously. The performance metric which EPIC uses to determine adequate user
response time is the time required for a read request to complete. The acceptable
threshold is 15ms or less; that is, the interval of time required between the time a request
has been generated and the time that the I/O request returns control to the user with the
requested data available to the user.
Caché also keeps a time-sequenced log of database changes, known as Caché journal
files. Caché journal daemon writes out journal updates in a sequential manner every two
seconds, or when a journal buffer becomes full, whichever happens sooner. The amount
of journal updates is insignificant compared to the amount of database updates during
each write daemon cycle. Therefore we normally consider the IO operations mostly
random read operations when the write daemons are not active.
Overall, the IO access pattern of Epic/Caché system is expected to consist of
continuously random read operations topped with 80-second interval write bursts. In
order to meet Epic’s response time requirements, the read service time measured at
application level needs to be 15 ms or less in an SMP configuration and 12ms or less in
an ECP configuration.
Without adequate storage resources and diligent configuration of these resources, the read
response time will degrade during the “write burst” period. Read response times which
exceed 15ms will be perceived by the end user as an unacceptable delay in overall
performance. Depending on how under-configured or incorrectly tuned a storage system
is, response times as slow as 300+ms have been observed.
The information provided in the next sections will outline steps needed to mitigate slow
read response times. This document will not provide information regarding the makeup
of the IBM storage system family. However, we will provide references which will
furnish complete background information about IBM storage systems in general, or
provide details about specific hardware which is addressed in this document.
GENERAL GUIDELINES FOR STORAGE HARDWARE
© Copyright IBM Corporation, 2012
General Concepts
Much of today’s storage technology was designed to solve two general problems: (1)
safe, redundant and recoverable storage of large amounts of data and (2) rapid retrieval of
the stored data. An assumption regarding the access of the data is that reading and
updating of the information would occur at relatively constant rates. For example, data
would be read from the storage system about 70% of the time, and written to the storage
system about 30% of the time on a fixed basis. This ratio can vary widely depending on
the end-user application.
In the case of Caché, the application reads exclusively for 100% of the time. Following
an 80-second interval, in addition to the requests for data, a large set of write requests to
storage are introduced. This “burst” can consist of several hundred megabytes of 8K data
blocks which must be written twice: once to the Caché WIJ and then to the actual random
access data base.
Most storage subsystems are not really optimized for this type of “burst” I/O behavior.
Moreover, it was assumed that the ratio of reads to writes would remain relatively
constant across a fixed period of time. Most of the best practices have assumed these
constant ratio read/write conditions. In the case of Caché, the storage system must first
operate in a “read-only” mode, followed next by a simultaneous, “read-only” and “write-
fast” mode. This cycle is repeated every eighty seconds.
If the cache associated with the storage system is not large enough and becomes rapidly
filled with data to be written to disk, the storage system’s algorithm will direct the storage
controller to “de-stage” the write cache. This operation supersedes the priority of any
read requests, which are being processed.
The two storage resources which can most often limit the read performance are (1)
storage cache and (2) the ratio of number of disk spindles to total user data. The greater
the number of spindles that are available during a write operation, the faster the writes to
physical disk can be completed. Associated with the number spindles is the cache size
available to the storage system. Data which is destined to be written to physical disk must
wait in the write cache until the physical disk resources become available.
The most significant limiting factor across all of the storage system components is the
physical disk. Reading or writing to the disk is limited by the rotation speed of the platter
and the time required to start and stop the read/write head movement. No matter what
other factors are considered to maximize throughput, the wait time for an I/O request to
be serviced by the disk will ultimately determine overall response time.
Most first-time Epic users will want to fill the existing disk arrays to their maximum
available capacity. This is especially true since under RAID 10, half of the spindles are
already used simply to mirror the user data. Thus from the end-users perspective, only
half of the spindles are available. Completely filling the useable RAID 10 formatted disk
© Copyright IBM Corporation, 2012
space translates into more data that must be accessed by the single read/write head on an
individual spindle.
New technology such as Solid State Drives (SSDs) which are not limited by the rotation
speed of the drive can handle the load with even RAID 5. With SSDs, it allows us to
leverage new technologies such as EasyTier or advanced caching on XIV.
Another problem with completely filling the disks is that, at the logical volume storage
level, certain JFS metadata information must be retained on the same disk volumes. The
metadata includes journal logs which allow the filesystem to recover from a logical
volume failure. To protect the JFS metadata, it is advisable not to exceed 97% of the disk
capacity for this reason alone.
Rapid database growth is the norm and to be expected in a health care setting. Epic
therefore sizes for three years of growth when providing the hardware sizing guide.
Unexpected addition of new patients or patient data can rapidly consume large amounts
of reserve capacity.
For all of these reasons, it is recommended that the physical disk requirements do not
exceed 60-70% utilization over the three years.
The Use of RAID
The striping of logical data across multiple spindles is an obvious way to evenly
distribute the load of all available disks. Consider ten simultaneous requests for data. If
all of the data was located on only one platter, then nine of the ten requests would remain
queued while the first request was being serviced. Each read or write requires the disk to
rotate and the read write head to move to a new position. All of this is completed
sequentially. Now consider the same data spread or “striped” evenly across ten disk
platters. The ten read requests can be serviced in parallel. Now the only limitation is the
latency required to move the data from the storage cache to the requesting server. This
time can be measured in units of hundredths of milliseconds. Depending on the type of
disk drive, a physical read operation can consume between 3 to 5 milliseconds.
The Caché I/O requests to the production data files are about 99% random in nature. This
means that for almost every I/O request, the read/write heads will perform a “seek”
operation. If the data were sequential, the read operation would require little or no “seek”
operations. As one block of data is written, the read/write head will most likely be
already positioned to write the next block.
One method of striping is the use of RAID (Redundant Array of Independent Disks).
RAID not only provides data striping for faster disk access but provides loss of data
protection as well. The two most widely used types of RAID are 5 and 1+0 or 10. Based
on testing done with EPIC we have determined that RAID 10 provides better
performance compared with RAID 5. There are documented reasons why RAID 10 is
© Copyright IBM Corporation, 2012
superior to RAID 5 particularly when multiple random writes of small blocks are
required. The following reference provides additional details about RAID types and their
respective performance:
http://www.redbooks.ibm.com/abstracts/sg247146.html?Open
RAID10 provides data redundancy by way of mirroring each disk. If one disk fails, a
duplicate copy of that disk will provide the same data. When the failed disk is replaced,
the system will rebuild the new platter with a copy of the data located on the mirrored
drive.
Besides striping at the storage level using RAID, striping is also done at the SVC level
and at the Logical Volume Level as well. These striping methods will be covered in later
sections.
How Data is processed through the Storage System
There are multiple logical and physical ‘stages’ within a storage system that data must
pass through. These stages include:
(1) The physical disk drives, where data is actually written or read. The drives are
arranged in groups of 16 disk units within a physical tray. We are currently
recommending the use of 146GB drives or smaller. However, future disk density and
access speed technology may allow for larger capacity drives.
(2) The RAID Array, in this diagram RAID 10 is depicted. Each array consists of 8 disks
from an array site. The RAID 10 Array will consist of either 4 + 4 or 3 + 3 + 2 spares.
We especially recommend using 4 + 4 ranks for production on the DS8 storage.
(3) The strip size used for the RAID 0 portion of the RAID 1 + 0 or RAID 10 is 128KB
(4) The stripe size is 1MB or (128KB strip * 8 disks)
(5) There are N sets of 8+8 disks which make up a RAID 10 rank. The 8+8 array can be
split between the Epic “prod” and WIJ volumes into 6+6 and 2+2 respectively. The 6+6
consists of disks from one RAID 10 array and the 2+2 consists of disks from another
RAID 10 array
The extent size is typically set at 1GB. Multiple extents are used to create a LUN.
(6) Depending on the size of the production database multiple LUNs should be created.
The minimum number is two and the maximum recommendation is 32 LUNs.
© Copyright IBM Corporation, 2012
(7) The storage cache size on the DS8K is a minimum of 32GB per controller. This value
may change depending on the model of the storage system and the total size of the Epic
database being used. For the DS8000 series storage systems, 1/32 of the total storage
cache is dedicated for write I/Os. Therefore, a sufficient total amount of cache must be
available to insure that the correct amount of write cache can handle the data coming
from the Cache write burst.
These values are the “typical” recommended configuration for a standard Epic
installation. The values, however, may vary based on recommendations made by Epic or
depending on the total size requirements of the database.
Based on empirical evidence these values seem to provide the best overall performance.
A Typical Layout of the Epic Production Caché data volumes
Although there are any number of ways to configure the LUNs for use by the Caché DB
product the following configurations seems to provide acceptable results:
The Caché production database file systems/logical volumes, prd01-prd08, should be
spread across as many ranks as possible. These ranks should be made up of spindles from
multiple and diverse RAID10 arrays. Selection of the arrays should be evenly distributed
across both storage system controllers as well as fiber channel adapters. The WIJ should
also be allocated from disks belonging to the same arrays as the disks used for
production. This keeps the WIJ volume “spread” across multiple spindles as much as
possible.
The WIJ and the production volumes are accessed at separate times. There will be no
simultaneous contention from the WIJ and the production data for the ranks at any time
during the “write burst” process.
The Transaction Journal should be created from separate ranks from the database
volumes to give extra protection against disk array failure. In the event that the
production database arrays experience a catastrophic failure, the journal files will be used
to recover any lost transactions.
Here is a sample schematic representation of the disk layout for a DS8K system:
© Copyright IBM Corporation, 2012
Below is an example of the commands to set up the volume groups, volumes and file
systems for a single Epic instance: mkvg -f -S -s 16 -y epicvg1 hdisk13 hdisk14 hdisk15 hdisk16
mklv -a e -b n -y prlv11 -e x -w n -x 35192 -t jfs2 epicvg1 35082
mklv -a e -b n -y prlv12 -e x -w n -x 35192 -t jfs2 epicvg1 35082
mklv -a e -b n -y prlv13 -e x -w n -x 35192 -t jfs2 epicvg1 35082
mklv -a e -b n -y prlv14 -e x -w n -x 35192 -t jfs2 epicvg1 35082
mklv -a e -b n -y prlv15 -e x -w n -x 35192 -t jfs2 epicvg1 35082
mklv -a e -b n -y prlv16 -e x -w n -x 35192 -t jfs2 epicvg1 35082
mklv -a e -b n -y prlv17 -e x -w n -x 35192 -t jfs2 epicvg1 35082
mklv -a e -b n -y prlv18 -e x -w n -x 35192 -t jfs2 epicvg1 35082
mklv -a e -b n -y wijlv1 -e x -w n -x 800 -t jfs2 epicvg1 795
crfs -v jfs2 -d prlv11 -m /epic/prd11 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio
mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv11 /epic/prd11
crfs -v jfs2 -d prlv12 -m /epic/prd12 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio
mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv12 /epic/prd12
crfs -v jfs2 -d prlv13 -m /epic/prd13 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio
mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv13 /epic/prd13
crfs -v jfs2 -d prlv14 -m /epic/prd14 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio
mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv14 /epic/prd14
crfs -v jfs2 -d prlv15 -m /epic/prd15 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio
mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv15 /epic/prd15
crfs -v jfs2 -d prlv16 -m /epic/prd16 -A yes -p rw -a logname=INLINE –a options=rw,rbrw,cio
mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv16 /epic/prd16
crfs -v jfs2 -d prlv17 -m /epic/prd17 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio
mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv17 /epic/prd17
crfs -v jfs2 -d prlv18 -m /epic/prd18 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio
RAID RAID RAID RAID 10101010 RAID RAID RAID RAID 10101010
RAID RAID RAID RAID 10101010 RAID RAID RAID RAID 5555
RAID RAID RAID RAID 10101010 RAID RAID RAID RAID 10101010
RAID RAID RAID RAID 5555 RAID RAID RAID RAID 5555
RAID RAID RAID RAID 10101010 RAID RAID RAID RAID 10101010
RAID RAID RAID RAID 10101010 RAID RAID RAID RAID 5555
RAID RAID RAID RAID 10101010 RAID RAID RAID RAID 10101010
RAID RAID RAID RAID 5555 RAID RAID RAID RAID 5555
RAID RAID RAID RAID 5555
Journal F ile Disk
FlashCopy Disk
Database F ile Disk
DS8100 Hot Spare Disk
DA
PA
IR 2
DA
PA
IR 0
DA
PA
IR 3
RAID RAID RAID RAID 10101010
© Copyright IBM Corporation, 2012
mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv18 /epic/prd18
mkdir /epic/prd1
crfs -v jfs2 -d wijlv1 -m /epic/prd1 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio
mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/wijlv1 /epic/prd1
And here is an example of the resulting filesystem layout from running the commands
above:
/dev/prlv11 1293778944 86438760 94% 8 1% /epic/prd11
/dev/prlv12 1293778944 86438640 94% 8 1% /epic/prd12
/dev/prlv13 1293778944 86438512 94% 8 1% /epic/prd13
/dev/prlv14 1293778944 86438568 94% 8 1% /epic/prd14
/dev/prlv15 1293778944 86438568 94% 8 1% /epic/prd15
/dev/prlv16 1293778944 86438680 94% 8 1% /epic/prd16
/dev/prlv17 1293778944 86438936 94% 8 1% /epic/prd17
/dev/prlv18 1293778944 86438776 94% 8 1% /epic/prd18
/dev/wijlv1 26050560 21903352 16% 6 1% /epic/prd1
/dev/prlv11 /epic/prd11 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE
/dev/prlv12 /epic/prd12 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE
/dev/prlv13 /epic/prd13 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE
/dev/prlv14 /epic/prd14 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE
/dev/prlv15 /epic/prd15 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE
/dev/prlv16 /epic/prd16 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE
/dev/prlv17 /epic/prd17 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE
/dev/prlv18 /epic/prd18 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE
/dev/wijlv1 /epic/prd1 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE
For more information please refer to Epic’s File System Layout Recommendations
document.
FlashCopy Except for XIV, FlashCopy is used for creating point in time copies of the production
database. The Caché db writes are momentarily suspended while the FlashCopy
command completes. We recommend using incremental FlashCopy. The target drives
for the FlashCopy can be different than the source drives. For example, 15K vs 10K,
RAID5 vs RAID10 differences are acceptable. However SATA (or nearline) drives are
not recommended.
EasyTier EasyTier can be used within an Epic production environment, but the customer must
continue to use at least the number of 15K RPM spindles recommended by Epic
Hardware Configuration Guide.
CONFIGURATION GUIDELINES FOR THE DS8000 SERIES ENTERPRISE STORAGE SYSTEM
The following section provides specific details regarding the configuration of the DS8000
Storage System. The description of the storage layout in Section IV was intended to be a
“generic” starting point.
© Copyright IBM Corporation, 2012
When configuring an IBM DS8000 series storage unit, it is important to have production
data LUNs from multiple ranks that are then assigned to different controllers. This is
needed to work around the 25% NVS cache per rank limit. The ranks should be divided
as evenly as possible between the available DA (Device Adapter) pairs.
With the 15000 rpm disks configured in RAID 10, the DS8000 will present 2 types of
arrays: 4+4 or 3+3+2s, depending on the number of disk per Device Adapter. The 4+4
arrays seem to have slightly better performance. We recommend using only the 4+4
arrays for the Epic production volumes. The 3+3 arrays can be used for shadow as well as
other non-production activities.
We recommend using one extent pool per rank for the DS8000 to simplify the
management. The striping of the LUNs will be done at the AIX (LVM) level.
When creating the Extent Pool, they need to be spread evenly on both “servers” (internal
Controllers) of the DS8000.
As a general rule, we always recommend using 4+4 arrays if possible on the DS8000
storage for the Epic production instance.
If a Multiple-Array Extent Pool is required, it is preferable to create the extent pool with
as many 4+4 arrays as possible. A minimum of two extent pools is required. These two
extent pools should be associated with the two controllers.
Volume groups should be created from one LUN per Extent Pool in the DS8000, in order
to spread every AIX Logical Volume across every AIX physical volume in the Volume
Group.
When the DS8000 is shared with other applications than EPIC, The EPIC Production
database arrays should be put on their own Device Adapter and not share the Device
adapter with other applications if possible.
FlashCopy is mandatory for the nightly backup. It is strongly recommended to use
Incremental FlashCopy. If you need to have more than one Incremental FlashCopy to
create the daily “support” database (for example), it is possible to do an incremental
FlashCopy from the EPIC Reporting Shadow database. Please contact
[email protected] for more information.
The FlashCopy repository does not require the same types of spindles or geometry as the
source disks. For example, RAID 5, 10K RPM drives could be used for the FlashCopy
repository instead of higher-performance drives.
For optimal performance of the Epic production environments, it is best to have at least 4
fibre channel ports on the DS8000 connected to a minimum of four HBAs on the server
per production instance.
© Copyright IBM Corporation, 2012
Since mid-November 2011, the “Epic optimization package” is available which
significantly improves the performance of the DS8000 by reducing the peak read IOs.
This applies to the following code on the DS8100/DS8300: R4.3 or higher, DS8700 R5.1
or higher, DS8800 R6.1 or higher. Please contact your IBM representative for the
process to obtain the Epic optimization package.
SVC AND THE EPIC STORAGE CONFIGURATION
SVC has been found to have measurable benefits when included in a hardware
configuration which supports the Epic software environment. Tests have shown that the
SVC will not impact the overall Epic storage performance while providing all the
functional benefits of an SVC. This includes the Storwize V7000 in Gateway mode.
Following are some important guidelines to consider for a hardware configuration which
includes SVC. (See Figure I)
1. We recommend using an even number of vdisks for Epic OLTP production data
and balancing the vdisks between the controllers to balance the load between the
controllers.
2. When using SVC FlashCopy to create backup, it is essential to configure the
FlashCopy to be incremental and to tune the background copy rate from the
default 50% to a value that will allow the background copy to finish in time
without impacting production performance during the backup window.
a. At the time of the FlashCopy resynchronization, FlashCopy will use the
write cache as well. Therefore, it is important to take it into account for the
overall cache sizing.
b. The incremental FlashCopy target should follow the same rules as the
production database to use the maximum available cache. Specifically it
should be part of at least 4 mdiskgroups.
c. During our testing, the copy write between the range of 50-70% with the
default being 50, combined with FlashCopy incremental gives the best
balance between FlashCopy copy time and production disk IO response
time. The best value that we have found seems to be 65.
3. As mentioned previously, when the SVC is shared with storage systems that are
non-Epic related, we recommend dedicating one IOgroup (pair of nodes) of the
cluster to the EPIC Production MDisks, and assign the rest of the load to the
remainder of the cluster (the SVC supports up to 4 IOgroups).
4. Because of the Write Cache Partitioning feature (which prevents cache
starvation), the SVC will not allocate more than 25% of the Write cache per
Storage Pool (mdisk group) if there is more than 5 mdisk groups in the system. To
get access to the full Write cache of the IOgroup we recommend creating at least
© Copyright IBM Corporation, 2012
four StoragePools (Mdisk Groups). We recommend production OLTP data to be
spread over at least 4 mdiskgroups to allow access to the full write cache.
5. When the SVC is used in conjunction with DS4/5000 series storage, the
DS4/5000 write cache should be entirely disabled. Tests have determined that
having the storage cache and the SVC write cache enabled results in poor
response times both for reading and writing I/O rates.
WITH SVC (or V7000 in Gateway mode)
WITHOUT SVC
DS4000 Series / DS5000 Disable Storage Write
Cache for the Production
Volumes
Storage Write Cache set to 5% Lower and 5% Upper
FIGURE I – SVC and Storage Configuration
6. Please note, that the SVC has been tested with the Epic production db for all IBM
supported platforms. The cache should now be on for both the SVC and the
DS8000-series.
7. It’s important to tune queue_depth setting for each hdisk according to the SVC
performance guide, especially when using relatively large size VDISKS at SVC
level.
CONFIGURATION GUIDELINES FOR THE STORWIZE V7000 MID-RANGE STORAGE SYSTEM
The Storwize V7000 is IBM’s mid-range storage system that uses a similar technology
base as the IBM SVC, thus all the SVC mdiskgroup considerations apply to the V7000.
The IBM Storwize V7000 offers internal solid-state drives (SSDs), 15K RPM and 10K
RPM Small Form Factor (SFF) drives. IBM offers a 300 GB capacity for their 15 K RPM
small form factor drives.
1. Depending on the Epic requirements we may need to use more than one canister
controller for the Epic load. If the Epic OLTP production workload requires more
than 32-spindles, we recommend you dedicate a V7000 controller (two canister
controllers) for the OLTP production workload. If the production workload
requires less than 32-spindles, it may be OK to share the V7000 controller with
other non-aggressive application workloads. Please check Epic’s V7000
Configuration document for more information on Epic’s requirements.
2. Please note that the previous SVC section applies fully to the V7000. Please refer
to the SVC section for the V7000.
© Copyright IBM Corporation, 2012
3. We recommend setting the FlashCopy grain size to 64KB when using SSDs for
the production OLTP data.
4. Storwize V7000 with 15K and 10K RPM SAS drives
a. Epic recommends 15K RPM drives for production storage and have live
production experience with 15K RPM drives. 15K RPM (but not 10K
RPM) drives provide the level of performance required for Epic
production. However for non-production Storwize V7000 10K RPM
drives are acceptable.
5. The Storwize V7000 offers easy configuration tool with the GUI (Wizard) the
array for the EPIC Production Database should be configured as RAID 10.
6. The spare disks do not have to be created with the array but need to be added once
all the arrays have been created. By this method, you can control the location and
the number of spare disks.
7. Just like the SVC it is recommended to use at least 4 Storagepools (mdiskgroups)
if the number of disks permits having access to 100% of the Write cache. EPIC
provides a cache requirement for the production database. So if the Write cache is
sufficiently large enough, then additional storagepools may not be needed.
8. We have noticed that it is easier to manage groups of 4+4 Raid 10 arrays. This is
not mandatory.
9. Storwize V7000 with SSDs (Solid State Drives)
a. Although SSD performance significantly better than spinning drives,
testing has shown that the write cycle length can be the limiting factor for
performance on SSDs. As a general rule of thumb it is possible to replace
six spinning drives with a single SSD if capacity permits. Additional
SSDs are not expected to significantly change the write service times.
b. RAID 5 is recommended for SSDs rather than RAID 10 due to the cost-
performance benefit of SSDs over spinning drives.
V7000 Configuration ScreenShots The IBM Storwize V7000 GUI interface provides an array configuration wizard with
logic that ensures new RAID arrays are created using appropriate candidate disks that
will provide best performance and spare coverage.
© Copyright IBM Corporation, 2012
Figure 1 – Storwize V7000 System Status
© Copyright IBM Corporation, 2012
The following example shows the 6 mdisks that comprise a 48 disk, RAID10 SAS array:
Figure 2 - Sample mdisk config
© Copyright IBM Corporation, 2012
A view next of the storage pool, comprised of the above 6 mdisks:
Figure 3 - Sample Storage Pool config
© Copyright IBM Corporation, 2012
Finally, here is a view of the 8 logical volumes (LUNs) that have been mapped to our AIX server. These LUNs were added to a common Epic volume group (VG) and divided into 9 filesystems against which our database and WIJ simulations were executed.
Figure 4 - Sample LUN config
New RAID arrays on the IBM Storwize V7000 were created using the interactive
interface. The Storwize V7000 includes the capability to build optimal arrays through
wizard driven array definition panels. The array configuration panels will select the
drives most suited to your storage requirement based on the settings you choose. While
there is no need to manually configure the storage to guarantee a balanced RAID array,
there is still the option to create arrays from the graphical interface or using the command
line interface.
CONFIGURATION GUIDELINES FOR THE DS5000 SERIES MID-RANGE STORAGE SYSTEM (Note: These configuration guidelines apply to the DS4000 series also)
The DS5000 Series Storage System is sufficiently different from the DS8000 such that
some additional consideration must be made in order to obtain the best possible
performance from this mid-range system. The following section provides specific details
regarding the configuration of the DS5000 Storage System. As was mentioned in section
© Copyright IBM Corporation, 2012
V., the description of the storage layout in Section IV was intended to be a “generic”
starting point.
The write cache flush should be set at 5% maximum and 5% minimum.
When using IBM SVC to manage IBM DS5000 series storage units, the optimal results
were achieved by disabling write cache (not the read cache) at the DS5K unit level for
the LUNs that will be used to hold the database files only and use read/write cache at
SVC level for the VDISKs that were constructed from those DS5K LUNs.
CONFIGURATION GUIDELINES FOR THE XIV STORAGE SYSTEM XIV Storage may be used for non-production purposes within the Epic
environment. This includes activities such as testing, training and shadow servers which
are not being used for production purposes. XIV storage provides a low cost large
capacity data retention facility. However, it is not intended to provide the low latency
response characteristics required within the Epic production environment.
The latest XIV technology, XIV Gen3, is also available for Epic non-production
purposes. XIV Gen3 with SSD technology is planned to be available for Epic production
instances in the future. Preliminary IBM-internal testing of XIV Gen3 with SSDs
demonstrates acceptable performance for the Epic production environment.
CONFIGURATION GUIDELINES FOR THE N-Series STORAGE SYSTEM Please refer to Epic’s NetApp best practices for the IBM N-Series storage system.
The IBM N-Series system is similar to the NetApp storage technology.
CONFIGURING THE POWER SYSTEMS AIX SERVER
In order for the Power Systems AIX server to provide optimal performance when running
the Epic software, a specific set of changes to the default set of system tunables is
required. These system parameters have been tested with the Epic software under many
differing scenarios, which Epic Systems feels would be encountered in typical situations.
POWER7 If the Epic Hardware Configuration Guide specifies less than or equal to 14 cores, this
section does not pertain.
The inter-CEC L3 cache-to-cache communication on POWER7 limits the high-volume of
lock management that the InterSystems Caché database employs. An Epic instance that
© Copyright IBM Corporation, 2012
uses more than 16 cores and crosses a CEC boundary, will encounter performance issues
due to the L3 cache-to-cache communication latency.
If using a 795 Power Systems server for the Epic production instance, please refer to
IBM’s 795 Cross-Book Guidelines whitepaper.
The 750 server consists of a single 32-core CEC (or book). Therefore the p7 L3 cache
latency is not an issue and Epic can therefore scale up to 28 cores.
Mounting With Concurrent I/O The primary change to a default AIX system is invoking the use of concurrent I/O or
CIO. By default AIX uses the JFS2 filesystem. CIO bypasses the caching features which
are enabled within JFS2. The principle reason for disabling JFS2 cache is because the
Caché DB application is already caching needed data blocks. Caché determines what data
needs to be written to permanent storage and what data should remain in the Caché global
buffers. Having the JFS2 cache also making this determination will typically cause
unnecessary extra work to be performed by the system. In addition the JFS2 cache
requires real memory which could otherwise be used by the Caché global buffer.
CIO is invoked via the –o cio mount option. This option should be used on the database
only file systems, typically /epic/prd01 – /epic/prd08. These filesystems host the
CACHE.DAT files which are exclusively random access in nature. The Caché Write
Image Journal should be mounted with the default JFS2 mount options.
Creation of Volume Groups, Logical Volumes, and File Systems for use by Caché
Following are the steps necessary to create and mount the volumes which will host the
Epic data volumes. It is assumed that the storage LUNs which correspond to the volumes
have already been created either via the storage system or the SVC if available.
Step 0. Make a top level root directory for the Epic/Caché
EXAMPLE:
mkdir /epic
mkdir /epic/prd01
Step 1. Create the Volume groups
EXAMPLE:
mkvg -S -y epicprvg -s 16 hdisk1 hdisk2 hdisk3 .....
© Copyright IBM Corporation, 2012
Step 2. Create the Logical Volumes
EXAMPLE:
mklv -a e -b n –e x –t jfs2 -y prdlv01 epicprvg 10G hdisk1 hdisk2 hdisk3 .....
mklv -a e -b n –e x –t jfs2 -y prdlv02 epicprvg 10G hdisk2 hdisk3 hdisk4 ..... hdisk1
Step 3: Create the File Systems
EXAMPLE:
crfs -v jfs2 -d prdlv01 -m /epic/prd01 -A yes -a logname=INLINE –a options=cio
(crfs -v jfs2 -d prdlv -m /epic/prd -A yes -a logname=INLINE –a options=rw)
Step 4: Mount the File Systems
EXAMPLE:
mount /epic/prd
mount /epic/prd01
Step 5: Check that the appropriate entries and options are added to /etc/filesystems.
These steps should be repeated for the eight production volumes, the WIJ and the Journal
files. The WIJ should share the same LUNs as the production volumes. The Journal file
should utilize a separate set of LUNs under a separate volume group.
When the volumes are mounted the results from the mount command, the df command
and the path command should resemble the following:
# df /epic/prd0*
Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/prdlv01 78577664 1995268 98% 12 1% /epic/prd01
/dev/prdlv02 78577664 1995268 98% 12 1% /epic/prd02
/dev/prdlv03 78577664 1995260 98% 12 1% /epic/prd03
/dev/prdlv04 78577664 1995264 98% 12 1% /epic/prd04
/dev/prdlv05 78577664 1995292 98% 12 1% /epic/prd05
/dev/prdlv06 78577664 1995280 98% 12 1% /epic/prd06
/dev/prdlv07 78577664 1995376 98% 12 1% /epic/prd07
/dev/prdlv08 78577664 1985984 98% 12 1% /epic/prd08
© Copyright IBM Corporation, 2012
Additional System Settings
Following are the recommended changes to a subset of the AIX system tunable. A brief
description of the reason for the change is included.
# vmo
vmo -p -o lru_file_repage=0 -- Determines which type of pages are replaced during a
paging operation, based on file repage and computational
re-page values.
vmo -p -o maxclient%=90 -- Specifies that the number of client pages cannot exceed
90% of real memory
vmo -p -o maxperm%=90 -- Specifies that the number of file pages should not exceed
90% of real memory
vmo -p -o vmm_mpsize_support=0 -- Use 4K memory pages only
# ioo
Required ioo parameters
ioo -p -o lvm_bufcnt=64 – Specifies the total Logical Volume Manager buffers.
ioo -p -o sync_release_ilock=1 -- Allows inodes to be unlocked after an I/O operation
update.
ioo -p -o numfsbufs=4096 – Sets the number of available file system buffers
ioo -p -o pv_min_pbuf=4096 – Specifies the minimum number of physical I/O buffers
per physical volume
These j2_xxx settings improve the performance for JFS2 filesystems (optional)
ioo -p -o j2_dynamicBufferPreallocation=256 -- Specifies the number of 16k slabs to
preallocate when the filesystem is
running low of bufstructs.
ioo -p -o j2_maxPageReadAhead=2 -- Specifies the maximum number of pages to be
read ahead when processing a sequentially
accessed file on Enhanced JFS.
ioo -p -o j2_maxRandomWrite=512 -- Specifies a threshold for random writes to
accumulate in RAM before subsequent pages are
flushed to disk by the Enhanced JFS's write-
behind algorithm. The random write-behind
threshold is on a per-file basis.
© Copyright IBM Corporation, 2012
ioo -p -o j2_minPageReadAhead=1 -- Specifies the minimum number of pages to be
read ahead when processing a sequentially
accessed file on Enhanced JFS.
ioo -p -o j2_nBufferPerPagerDevice=2048 -- Specifies the minimum number of file
system bufstructs for Enhanced JFS.
ioo -p -o j2_nPagesPerWriteBehindCluster=2 -- Specifies the number of pages per
cluster processed by Enhanced JFS's
write behind algorithm.
ioo -p -o j2_nRandomCluster=1 -- Specifies the distance apart (in clusters) that writes
have to exceed in order for them to be considered as
random by the Enhanced JFS's random write behind
algorithm.
#additional required parameters
lvmo -v epicrdvg -o pv_pbuf_count=4096 -- Increase the number of PV buffers for the
production volume group
chdev -l hdisk5 -P -a queue_depth=64 -- Sets the hdisk depth queue to 64 (default is
20)
chdev -l sys0 -a maxuproc=32767 --- Sets the maximum processes per user to 32767
ADDITIONAL RECOMMENDATIONS
1. Boot from SAN
Boot from SAN is not recommended when running the Epic environment:
Both Caché and PowerHA depend on the O/S running correctly. If a SAN failure occurs
such that the O/S can no longer communicate with the rootvg volume, even for a brief
interval, the condition of the O/S is suspect. The system may appear to be operating
correctly. However, if any O/S specific data was lost during transfer between RAM and
disk, the O/S is no longer viable. Since all software running on the system depends
entirely on the O/S, end user products or supporting middleware may no longer function
correctly.
Epic recommends that customers do not boot from SAN so that Epic can log into the
system following a failure to troubleshoot. However, PowerHA 7.1 recommends that the
customer boots from SAN partly because of the Live Partition Mobility feature. The
decision to boot from SAN should be discussed with your Epic representative.
2. PowerHA (formerly known as HACMP)
© Copyright IBM Corporation, 2012
There are multiple resources for information regarding the best method of configuring a
PowerHA (i.e. HACMP) failover cluster. Epic will provide their customers with
PowerHA callable scripts which contain the necessary instructions to cleanly shut down
and start up the Epic and Caché environment.
Most IT system administrators view PowerHA as being capable of recovering from any
and all events that could occur to an Epic environment. As much as we would like to
imagine such a safety mechanism, it doesn’t exist.
What PowerHA will do:
Recover from any type of real hardware failure. This includes the servers, switches, disk
systems and any other type of device which could experience a physical failure due to
power loss, electronic component failure or a catastrophic event.
What PowerHA will not do:
Recover from user errors, either intentional or accidental. Since PowerHA depends on the
operating system, it is assumed that if the operating system started running the Epic
environment without a problem, it should continue to support the environment without a
problem. There are two conditions where the O/S could fail: (a) a hardware failure, or (b) a change made to the O/S environment by a user. In case (a), PowerHA will recognize the hardware failure and initiate a failover. PowerHA, however, will not recover from case (b).
PowerHA requires diligent administration and monitoring. PowerHA cannot simply be installed and left to run by itself; taking that approach will eventually cause PowerHA to stop operating correctly.
All of the available PowerHA documentation makes two major recommendations:
1. Whenever a change is made to the cluster that is being managed by PowerHA, no matter how trivial it might seem, PowerHA must always be re-tested to ensure that nothing was modified in such a way that PowerHA can no longer function properly.
2. Regardless of whether the system was modified or not, a manual PowerHA failover should be conducted at regular intervals (for example, every three months).
Item 2 provides two benefits: it confirms that a PowerHA failover will work when an unexpected failure occurs, and it allows any problems to be identified and resolved quickly during a planned failover rather than during a real outage.
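As an illustration only -- the resource group and node names here are hypothetical, and the exact syntax depends on the PowerHA version in use -- a planned failover test using the PowerHA 7.1 clmgr interface might look like the following:

clmgr verify cluster                           # verify the cluster configuration first
clmgr move resource_group epic_rg NODE=node2   # move the Epic resource group to the standby node
clRGinfo epic_rg                               # confirm where the resource group is now online

After confirming that the Epic environment runs correctly on the standby node, the resource group can be moved back the same way.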
PowerHA depends greatly on the environment that it is assigned to manage. Due to its
flexibility, there are many ways to mis-configure a PowerHA environment. There is only
one way to be certain that PowerHA has been configured to run successfully: Test, test
and re-test.
3. PowerHA and SPOF (Single Point of Failure)
In order for PowerHA to work, it must not be limited by Single Points Of Failure, or SPOFs. For example, in order for PowerHA to maintain inter-nodal communication within the HA cluster, there must exist more than a single communication path. This requires completely redundant switches, cables, and adapters from one end to the other. Having 8 communication adapters on each node does no good if the two nodes are connected via a single data path (Ethernet cable). Having multiple redundant zones on a switch won't help if the switch loses power.
Therefore, building in redundancy is a must. This means that half of the equipment may sit idle until a failure occurs, which, unfortunately, is a cost of maintaining a High Availability environment.
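Storage-path redundancy can be spot-checked from AIX. A brief sketch, reusing the device names from the earlier examples:

lspath -l hdisk5    # list the MPIO paths to a shared disk; more than one Enabled path should be present
fcstat fcs0         # check fibre-channel adapter link status and error counters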
4. PowerHA and ECVG
Customers who are using Epic are required to provide a fail-over system which will take
over in the event of a primary OLTP system failure. This is of obvious necessity in a
health care related environment. IBM offers this facility on POWER based systems
through the use of PowerHA.
Should the active compute system which is running Epic encounter a failure, PowerHA will recognize the loss of the active system. The fail-over process causes the resources being used by the primary system (primarily the attached storage system) to be acquired by the take-over system. The backup system will then attempt to start the same Epic environment. Although the takeover is not instantaneous, it provides an automated method of recovering from a catastrophic hardware failure.
In more recent versions of PowerHA, IBM has introduced the use of Enhanced
Concurrent Volume Groups (ECVG). The advantage of ECVG is primarily that the Epic
database volumes are already varied on to both PowerHA nodes (active and standby
nodes). In the event of a failure, the time required for the take-over node to acquire the
Epic volumes is greatly reduced. Therefore IBM has encouraged their PowerHA
customers to take advantage of ECVG mounted volumes which are associated with a
PowerHA cluster.
In the unlikely event that PowerHA itself fails, ECVG can potentially cause a 'split brain' event. If the nodes in the cluster can no longer communicate, and especially if the takeover node believes that the primary node has failed, it is possible for both nodes to become active. The Epic software could then start running on the takeover node while the primary node is still in play. Recent versions of PowerHA (versions 6.1 and 7.1) have significantly reduced the possibility of a 'split brain' event occurring. In PowerHA version 6.1, ECVG can safely be used in the Epic environment.
In PowerHA version 7.1, the use of ECVG is mandatory. In either version, therefore, ECVG can safely be used in an Epic environment.
Mounting logical volumes concurrently allows access from more than one compute node simultaneously. Therefore, when a volume group is mounted concurrently, data on its volumes can be updated by both nodes.
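As a minimal sketch of how such a volume group is created -- reusing the epicrdvg and hdisk names from the tuning examples, and noting that PowerHA itself normally manages the vary-on state during cluster operation:

mkvg -C -y epicrdvg hdisk5 hdisk6     # create an enhanced concurrent-capable volume group
varyonvg -c epicrdvg                  # vary it on in concurrent mode (normally handled by PowerHA)
lsvg epicrdvg | grep -i concurrent    # confirm the volume group reports Enhanced-Capable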
5. Micro Partitioning
Micro Partitioning or SPLPAR is currently not supported within an Epic production
environment. DLPAR, however, is supported.
There are several reasons why Epic does not support the use of SPLPAR.
(a) Epic expects no more than a 15ms response latency from the Caché-based DB server. If both CPU and memory resources were shared between Epic and other applications, there would always be a possibility that a non-Epic application could starve Epic of resources at a critical time.
(b) When Epic provides the sizing information, the assumption is that the Epic products
are the only ones actively running on the system. Therefore, at a minimum, the Epic
partition would need to be fully configured with the Epic required resources. Epic
provides discounts to their customers if the customer has followed Epic’s
recommendation regarding configuration. It is assumed that those resources are available
at all times. Thus, in effect, the Epic LPAR would really be regarded as a fixed resource
LPAR, or DLPAR.
Epic sizes the DB server so that the customer is not running above 70% CPU utilization under normal load. It is not known how quickly a shared partition can obtain resources from another shared partition, nor whether those resources would arrive soon enough to provide relief during a sudden, unplanned increase in resource demand originating from the Epic partition. In any case, the Epic partition would require top priority for any "spare capacity" over all other partitions, thereby once again making the Epic partition an effectively independent DLPAR.
(c) Epic provides their customers a “guarantee of performance”. This is available to the
customer on condition that the customer has followed the Epic recommended guidelines.
Should a performance related problem occur, Epic will want to be able to reproduce the
problem. If performance was degraded because shared resources were unavailable, it would be more difficult for Epic (or IBM) to identify whether the cause was something that happened within the Epic partition or an external load-driven event.
(d) At this time, we have not adequately tested the interaction between SPLPAR and PowerHA. As an example, what would happen, or what should we expect to happen,
were the system to experience a physical CPU failure? What should PowerHA do if Epic happened to be using one tenth or more of that physical CPU at the time? Normally, the loss of a resource would trigger a fail-over; however, this CPU is now a "virtual" resource.
Epic, however, has no objection to the use of SPLPAR in a non-production environment,
so long as performance is not being evaluated within that environment.
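For reference, whether a given LPAR is currently running as a shared or a dedicated partition can be confirmed with the standard AIX lparstat query:

lparstat -i | grep -E "Type|Entitled"   # the Type field reports Shared or Dedicated, along with the entitlement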
6. VIRTUAL I/O
Virtual I/O (VIO) may be used in the Epic environment. Although Virtual I/O may provide better use of existing hardware resources, the performance impact must be considered in the production environment. The adapters included in a Virtual I/O environment must continuously provide the same level of performance as in a non-VIO environment.
NPIV virtualizes a physical fibre-channel adapter, thereby allowing the assignment of multiple WWNs (World Wide Names). Again, the total load of the multiple LPARs being supported by a physical adapter must be considered.
Epic prefers the use of physical adapters over VIO servers for the production OLTP
system. If VIO servers are desired for enterprise virtualization/consolidation practices,
the following considerations apply when using VIO with the production OLTP LPAR
and its failover LPAR.
a. Please follow IBM’s best practices to set up sufficient redundancy at the
VIO layer to avoid single points of failure.
b. Please follow IBM’s recommendation to properly size the VIO servers for
the overall activities on the server frame.
i. When using Oracle on an IBM Power Systems server as the Clarity
RDBMS: The Clarity RDBMS Oracle server should be on separate VIO servers
from the production OLTP LPAR and its failover LPAR.
ii. You should employ redundant VIO servers. Each VIO server must have
sufficient CPU and memory resources to support the full load expected. If they
are in a shared processor pool, the VIO servers should have the highest weight
within the pool to avoid being starved by activities from other application LPARs.
iii. Each VIO server must have a total of at least 4 ports from at least 2
physical HBAs. The total IO bandwidth provided by the HBAs must
accommodate the total IOPS projection from all LPARs, with sufficient
redundancy. The IOPS projections from the main Epic components can be found
in the previous IO projection and requirements section.
iv. The total network bandwidth provided by the Ethernet adapters must accommodate the network traffic expected from all LPARs, with sufficient redundancy. 10 Gbit interfaces are generally more appropriate for large-scale systems. If using 1 Gbit interfaces, multiple interfaces may have to be aggregated to provide adequate bandwidth and acceptable latency. The Ethernet network must provide a sufficient amount of bandwidth for all of the Epic functional
requirements (e.g., Shadow, Backup, etc.). You may still find it beneficial to use separate NICs for traffic that may have unbounded bandwidth usage patterns.
c. There are two technologies available to provide IO access via VIO: virtual SCSI and NPIV. Please discuss which technology best suits your needs with your IBM support team.
i. Be aware that queue_depth needs to be properly tuned at both the VIO server layer and the production LPAR layer when using virtual SCSI.
ii. Epic has conducted performance tests with NPIV and found the results
acceptable.
There could be different VIO considerations for SAN boot. If you desire to use SAN
boot, please follow IBM’s best practices for SAN boot.
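For reference, NPIV mappings are created and inspected on the VIO server with the standard VIOS commands; the adapter names in this sketch are illustrative:

vfcmap -vadapter vfchost0 -fcp fcs0   # map a virtual FC host adapter to a physical FC port
lsmap -all -npiv                      # list all NPIV mappings and their status

When using virtual SCSI instead, remember that queue_depth must be tuned on the hdisks at both layers, using chdev -l hdiskN -a queue_depth=... on the VIO server as well as in the client LPAR.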
7. Live Partition Mobility
Live Partition Mobility (LPM) provides the ability to move a running Epic instance from one Power Systems frame to another. During a migration, an impact on performance may be observed, depending on the size of the Epic environment being migrated. Database activity may be momentarily suspended, which may result in end-user clients being disconnected temporarily. The alternative for migrating an Epic production instance from one Power Systems frame to another is to initiate a manual PowerHA failover. Using PowerHA would result in an outage of at least 5 to 15 minutes, versus a brief end-user client disconnect of less than a minute when using Live Partition Mobility.
Live Partition Mobility requires VIO Servers on both the source and target Power
Systems frames. Use of NPIV is strongly recommended to support Live Partition
Mobility.
An LPM migration must be done only during low-use hours, when there is minimal use of the Epic production database.
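As an illustrative sketch, run from the HMC command line (the frame and partition names here are hypothetical), a migration is normally validated before it is performed:

migrlpar -o v -m source_frame -t target_frame -p epic_prod   # validate the migration first
migrlpar -o m -m source_frame -t target_frame -p epic_prod   # perform the live migration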
What Information Should Be Collected When A Problem Occurs
The Epic environment is complex, given that there are many "moving parts". A performance issue can be caused by any part of either the server system or the storage system. Because each stage of the computational process depends on all of the others, it can often be difficult to identify the true culprit causing a problem. For example, although obtaining data from storage may appear slow, it may in fact be the case that the server is running out of I/O buffers or disk queue slots with which to handle the incoming data from the storage system. Therefore each stage of the process must be analyzed and diagnosed. The primary task is to determine whether a stage in the process is waiting for something (starving), or whether the stage is overloaded.
For example, the disk I/O throughput may seem reasonable for the given configuration, yet users report substandard response times. Upon further investigation, it is determined
that the Logical Volume Manager (LVM) has run out of resources on the server. This may not be immediately evident, since large amounts of CPU are not being consumed; however, a lack of certain JFS buffers can still result in a bottleneck.
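A quick way to check for this kind of buffer starvation is the vmstat -v report; "blocked" counters that keep increasing between samples indicate that pbufs or fsbufs are being exhausted:

vmstat -v | grep -i blocked   # look for counters such as "pending disk I/Os blocked with no pbuf"
                              # and "external pager filesystem I/Os blocked with no fsbuf"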
Following is a partial list of information which should be collected when reporting a
problem, either to IBM support or to anyone involved in technical support of Epic.
(1) Have they filed a PMR with IBM? If so, provide the PMR number.
(2) Has Epic Systems been made aware of the problem? Who is the primary Epic contact
that they are dealing with?
(3) Type of System p server, model, number of CPUs, total memory, DLPARs, SPLPARs, etc.
(4) Type of storage: number of spindles, storage configuration (e.g. RAID 5, RAID 10, stripe size, number of ranks, LUNs, etc.).
(5) Are they using SVC?
(6) Is the storage or SVC being shared with other non-Epic applications?
(7) What has been changed prior to them experiencing the performance problem? For
example, increased users, change in storage config, additional workloads etc.
(8) Did the performance degrade suddenly, or was it a slow degradation over time?
(9) Is there a particular hour of day or night that the performance degrades? Is it constant?
(10) Can the customer provide results from the Epic RanRead facility?
(11) Does the performance degradation occur during a flash copy or other back-end copy
procedures?
Also, if one is available, provide a topology diagram showing the OLTP, Shadow, and failover servers, the storage switches, and the associated interconnects to each component supporting the entire Epic environment.
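To accompany the answers above, the following AIX commands capture the data most often requested. This is a non-exhaustive sketch; the output file names are arbitrary, and epicrdvg is the volume group name used in the earlier examples:

errpt -a > errpt_detail.txt       # full AIX error log with details
iostat -DlT 5 12 > iostat.txt     # extended per-disk statistics, 5-second samples
vmstat -Iwt 5 12 > vmstat.txt     # CPU, memory, and I/O-wait samples
ioo -a > ioo.txt                  # current I/O tunables
lvmo -v epicrdvg -a > lvmo.txt    # per-volume-group pbuf counters
snap -ac                          # build a compressed system snapshot for IBM support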