Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin

Reza Rooholamini, Ph.D.
Director, Enterprise Solutions
Dell Computer Corp.
http://www.dell.com/clustering
Product Maturity Life Cycle in the Open Systems Market

[Chart: cost/complexity versus degree of standardization (proprietary, standardization, fully standardized), with simplicity, volume, and choice increasing as products mature. Products plotted include RISC systems, 8P servers, 4P servers, 1/2P servers, appliance servers, workstations, desktops, direct attached storage, network attached storage, project-based SANs, heterogeneous SANs, HPC clusters, and grids.]
Our Vision
• Customers define our success: Begin with the customer. End with the customer
• Provide the best price/performance HPC solutions to our customers
• Promote standardization to provide choice, lower cost of ownership, and simplicity in HPC solutions
• Evangelize new HPC technologies and selectively adopt the relevant ones for “productization”
• Derive the requirements for products by focusing on applications
• Provide a total solution: Hardware, software and services
• Partner with “best of class” in HPC
Building Block Approach

Benchmark:     Parallel benchmarks (NAS, HINT, Linpack…) and parallel applications
Middleware:    MPI/Pro, PVM, MPICH, MVICH
OS:            Linux, Windows
Protocol:      TCP, VIA, GM, Elan
Interconnect:  Fast Ethernet, Gigabit Ethernet, Myrinet, Quadrics, InfiniBand
Platform:      Dell PowerEdge servers (IA32 & IA64)
Dell and UT Austin
• Dell is sponsoring research in reservoir simulation at the Department of Petroleum and Geosystems Engineering
• Dr. Kamy Sepehrnoori is collaborating with Dell’s HPCC team on performance studies, paper publications, and parallel simulator development
• The Dell HPCC team includes graduates of Dr. Sepehrnoori’s group who specialize in petroleum engineering
• Dell has participated in Reservoir Simulation JIP (Joint Industry Project) in the past, and is planning to attend the upcoming meeting
• Dr. Sepehrnoori has access to Dell HPC lab for running large simulations, and is provided with hardware for development, testing, and performance studies of his program
A Performance Study of Parallel Reservoir Simulation on HPC Clusters

Baris Guler, Tau Leng, Victor Mashayekhi, Reza Rooholamini
Dell Computer Corporation

Kamy Sepehrnoori
Center for Petroleum and Geosystems Engineering
The University of Texas at Austin
Outline
Background
Software/Hardware Description
Compositional Reservoir Simulation on HPCs
Results
Summary
Future Work
Reservoir Simulation Application
Reservoir Forecasting
Reservoir Performance Optimization
Sensitivity Analysis
History Matching
Risk Assessment through Stochastic Simulation
Assessment of Uncertainty in Forecasting
Value of Information Studies
Reservoir Management
Reservoir Simulation Steps

• Data Input / Model Initialization
• Time-Step Computation (repeated to the end of the simulation):
   - Solution of the Non-Linear Partial Differential Equations
      • Discretization
      • Linearization and Newton Iteration
   - Solution Using Direct or Iterative Solvers
   - Test for Convergence of the Solution
   - Data Output / Graphics
   - Time-Step Increment
• End of Simulation Study
• Results Processing / Interpretation
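To make the loop above concrete, here is a small self-contained Python sketch of an implicit time-stepping loop with a Newton iteration and a linear solve at every step. It uses a toy 1-D nonlinear diffusion problem rather than reservoir flow equations, and it is not GPAS code; it only mirrors the control flow listed on this slide.

```python
# Toy illustration of the simulation loop above: implicit time stepping for a
# 1-D nonlinear diffusion problem, with a Newton iteration and a direct linear
# solve per time step.  A sketch of the control flow only, not reservoir physics.
import numpy as np

nx, dx, dt, t_end = 50, 1.0 / 49, 1e-3, 0.05
u = np.linspace(1.0, 2.0, nx)                     # data input / initialization

def residual(u_new, u_old):
    """Implicit-Euler residual of u_t = (u u_x)_x with fixed end values."""
    r = np.zeros_like(u_new)
    flux = 0.5 * (u_new[1:] + u_new[:-1]) * np.diff(u_new) / dx
    r[1:-1] = u_new[1:-1] - u_old[1:-1] - dt * np.diff(flux) / dx
    r[0], r[-1] = u_new[0] - u_old[0], u_new[-1] - u_old[-1]   # fixed ends
    return r

t = 0.0
while t < t_end:                                  # time-step loop
    guess = u.copy()
    for _ in range(20):                           # linearization / Newton iteration
        r = residual(guess, u)
        if np.linalg.norm(r) < 1e-10:             # test for convergence
            break
        # finite-difference Jacobian; real simulators assemble it analytically
        J = np.zeros((nx, nx))
        for j in range(nx):
            pert = guess.copy()
            pert[j] += 1e-7
            J[:, j] = (residual(pert, u) - r) / 1e-7
        guess += np.linalg.solve(J, -r)           # direct (or iterative) solver
    u, t = guess, t + dt                          # accept step / time-step increment
print("final solution range:", u.min(), u.max())  # data output / post-processing
```

A production simulator replaces the toy residual with the discretized, linearized flow equations and the dense finite-difference Jacobian with a sparse, analytically assembled one, but the step / iterate / solve / converge structure is the same.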
Reservoir Simulation Hardware

[Timeline, 1960-2000: mainframes, supercomputers, RISC workstations, MPPs, PCs/workstations, and HPC clusters]
Benefits of Parallel Processing
Turn-around time
Large-scale simulations
Cost
Parallel Processing

Massively Parallel Computers
High-Performance Computing Clusters
Benefits of Clusters
Scalability
High Performance Computing
Low Cost
Availability
Computational Mode
Distributed processing
Parallel processing
Distributed Processing

[Diagram: a user drives an input generator that produces n data sets (D1 ... Dn); a batch queuing system feeds them to the simulation program running on m processors (P1 ... Pm), with n >> m; results go to a database for post-processing]
Cluster Simulation System

[Diagram: input from the user and project advisor feeds a data generator that produces data sets DS 1 ... DS n; a cluster scheduler distributes the input data across FS 1 ... FS m; an archiver and a post-processor handle the output]
Parallel Processing

[Diagram: serial case - a single CPU solves the whole reservoir with finite differences (FD); parallel case - finite differences with domain decomposition (FD & DD), the reservoir split across CPU-1 through CPU-6]
Domain Decomposition
Fundamental strategy for grid-based parallel simulation.
Example: a 10 x 15 grid decomposed across 6 processors
Ghost-layer creation and communication
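As a concrete illustration of ghost layers (a generic mpi4py sketch, not GPAS code), the snippet below decomposes a 1-D strip of 150 cells, echoing the 10 x 15 grid example, across 6 MPI ranks and exchanges one ghost cell with each neighbour before a purely local update:

```python
# Minimal sketch of 1-D domain decomposition with ghost-layer exchange.
# Run with, e.g.:  mpirun -np 6 python ghost_exchange.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_global = 150                         # e.g. a 10 x 15 grid flattened along one axis
n_local = n_global // size             # cells owned by this rank (assumes it divides)
u = np.full(n_local + 2, float(rank))  # owned cells plus one ghost cell at each end

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Ghost-layer communication: exchange boundary cells with both neighbours.
comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

# With ghosts filled, each rank updates its owned cells independently,
# e.g. a smoothing step that needs both neighbours of every cell.
u[1:-1] = 0.25 * u[:-2] + 0.5 * u[1:-1] + 0.25 * u[2:]
print(f"rank {rank}: ghost cells = ({u[0]}, {u[-1]})")
```

In 2-D or 3-D decompositions the same exchange happens per subdomain face, and its cost relative to the local computation is what the interconnect comparisons later in the talk are probing.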
Performance Issues in Parallel Processing
Software Design
Algorithm
Parallelization
Programming practice
Load balancing
Performance Issues in Parallel Processing
Hardware Configuration
CPU
Cache
Memory subsystem
Front Side Bus
I/O bandwidth
Interconnect
Hardware - Interconnect

Type                Speed (MB/s)   Latency (µs)
Dolphin             385            4
Quadrics            330            4.5
InfiniBand 4x       500            6-8
Myrinet             225            6-7
Giganet             110            7.5
Gigabit Ethernet    80             170
Fast Ethernet       9.0            170
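A convenient way to read the table is the usual first-order message-cost model, transfer time ≈ latency + message size / bandwidth. The snippet below is an added illustration, not part of the slides; it plugs in the table's figures (taking the Myrinet latency as the midpoint of the quoted 6-7 µs range) for a small and a large message:

```python
# First-order message-cost model: t(m) ~ latency + m / bandwidth.
# Figures taken from the interconnect table above; purely illustrative.

interconnects = {        # name: (bandwidth in MB/s, latency in microseconds)
    "Fast Ethernet":    (9.0, 170.0),
    "Gigabit Ethernet": (80.0, 170.0),
    "Giganet":          (110.0, 7.5),
    "Myrinet":          (225.0, 6.5),   # midpoint of the 6-7 us range
    "Quadrics":         (330.0, 4.5),
}

def message_time_us(size_bytes, bandwidth_mb_s, latency_us):
    """Estimated one-way transfer time in microseconds."""
    return latency_us + size_bytes / (bandwidth_mb_s * 1e6) * 1e6

for name, (bw, lat) in interconnects.items():
    small = message_time_us(1_024, bw, lat)       # latency-dominated
    large = message_time_us(1_048_576, bw, lat)   # bandwidth-dominated
    print(f"{name:17s} 1 KB: {small:8.1f} us   1 MB: {large:9.1f} us")
```

Small messages are latency-dominated, which is why the low-latency interconnects separate from Ethernet in the MPI comparisons shown later even when the bandwidth gap is modest.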
CPGE-1 (Ararat)
• 12 nodes / 16 processors
• 1.0 GHz Intel Pentium III Xeon processors
• 256 MB of memory
• Diskless configuration
• 100 Mbps switched Fast Ethernet and Giganet interconnects
TACC-1 (Tejas)
• 32 nodes / 64 processors
• 1.0 GHz Intel Pentium III processors
• 1 GB of memory per processor
• 225 MBps Myrinet-2000 interconnect
Parallel Reservoir Simulators

Chevron-Texaco
Conoco-Phillips
Exxon-Mobil
IFP and Beicip-Franlab
Landmark Graphics Corporation
Schlumberger-Geoquest
Saudi Aramco
UT CPGE, UT CSM
Note: 93 clusters appear among the Top500 supercomputer sites, 23 of them in the oil and gas sector.
Compositional Reservoir Simulation on HPCs
Project Objectives
Develop a general-purpose adaptive simulator (GPAS) capable of:
• modeling complex physical processes, including EOS compositional, chemical, black-oil, and thermal
• high-resolution studies on supercomputers and high-performance clusters
HPC Initiatives
• Evaluate and compare performance of different cluster systems
• Test and analyze performance of different parallel simulators
• Identify areas of improvement in parallel algorithm design and cluster setup for optimal parallel reservoir simulation
Summary of Clusters
Cluster              CPU Type            CPU Speed (MHz)   CPUs           Memory per CPU   Interconnect
TACC-2 (Longhorn)    Power4              1300              4x16 = 64      2 GB             IBM SP Switch2
TACC-1 (Tejas)       Pentium III         1000              32x2 = 64      512 MB           Myrinet
DELL-2 (PE 2650)     Intel Xeon DP       2400              64x2 = 128     1 GB             Myrinet, Gigabit, Fast Ethernet
DELL-1 (PE 1550)     Pentium III         1000              16x2 = 32      512 MB           Myrinet, Gigabit, Fast Ethernet
CPGE-1 (Ararat)      Pentium III Xeon    1000              8x1+4x2 = 16   256 MB           Fast Ethernet
CPGE-1 (Rocky)       Pentium II Xeon     400               8x2 = 16       256 MB           Fast Ethernet
CPGE-1 (Fuji)        Pentium II          300               16x1 = 16      384 MB           Fast Ethernet
Parallel Simulators Tested
GPAS
VIP (2003r4)
CPGE Simulator (GPAS)
EOS Compositional
Peng-Robinson EOS
Fully Implicit
PETSc Linear Solvers
Parallel (IPARS Framework)
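The slide lists PETSc linear solvers as one of GPAS's building blocks. As a generic illustration of what such a solve looks like, and not GPAS source code, here is a petsc4py sketch that assembles a toy tridiagonal system and solves it with a Krylov method, the kind of solve a fully implicit simulator performs in every Newton iteration:

```python
# Illustrative petsc4py sketch: assemble a toy tridiagonal system and solve it
# with a Krylov method and a block-Jacobi preconditioner.  Requires PETSc/petsc4py.
from petsc4py import PETSc

n = 1000
A = PETSc.Mat().createAIJ([n, n], nnz=3)     # sparse matrix, ~3 nonzeros per row
for i in range(n):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

x, b = A.createVecs()                        # solution and right-hand-side vectors
b.set(1.0)                                   # stand-in for the Newton residual

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType("gmres")                         # Krylov solver
ksp.getPC().setType("bjacobi")               # block-Jacobi preconditioner
ksp.setTolerances(rtol=1e-8)
ksp.solve(b, x)
print("iterations:", ksp.getIterationNumber(), "residual norm:", ksp.getResidualNorm())
```

In a fully implicit compositional model this solve sits inside every Newton iteration of every time step, so solver and preconditioner choice largely determines the parallel performance reported in the following slides.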
Performance Results
Base Benchmark Problem
• Compositional model
• 3-component Peng-Robinson EOS
• Dry-gas cycling process
• Reservoir size: 800 x 11,200 x 160 ft, homogeneous
• 2 wells: 1 injector, 1 producer
• Grid: 16 x 224 x 8 (28,672 cells)
• Unknowns: 229,376
• 100 days of gas injection
• One-dimensional domain decomposition
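A quick arithmetic check of the sizes quoted above (and of the modified benchmark that appears later), using only numbers taken from the slides:

```python
# Arithmetic check of the benchmark sizes quoted on the slides.
base_cells = 16 * 224 * 8           # = 28,672 cells, as stated
base_unknowns = 229_376
print(base_unknowns / base_cells)   # 8.0 -> eight unknowns per grid cell

mod_cells = 77 * 256 * 10           # = 197,120 cells (modified benchmark)
print(mod_cells * 8)                # 1,576,960, i.e. the quoted ~1.57 million unknowns
```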
Single-Processor Execution Times (GPAS), Base Benchmark Problem

Cluster (CPU)                              Execution time [sec]
Dell PE 2650 (Intel Xeon DP, 2.4 GHz)      180.186
TACC Tejas (Pentium III, 1.0 GHz)          306.38
Ararat (Pentium III Xeon, 1.0 GHz)         309.3
PowerEdge 1550 (Pentium III, 1.0 GHz)      313.3
Rocky (Pentium II Xeon, 400 MHz)           615.2
Fuji (Pentium II, 300 MHz)                 1030.3
Multi-Processor Execution Times (GPAS), Base Benchmark Problem

[Figure: execution time (seconds, log scale) versus number of processors (1-16) for Fuji, Rocky, Ararat, PE 1550, PE 2650, Tejas, and Longhorn]
Multi-Processor Speedups (GPAS), Base Benchmark Problem

[Figure: speedup versus number of processors (up to 32) for Fuji (FE), Rocky (FE), Ararat (FE), PE 1550 (FE), PE 2650 (FE), Tejas (Myrinet), and Longhorn, against the ideal linear speedup]
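The speedups in these charts are the standard strong-scaling ratio S(p) = T(1) / T(p), and dividing by p gives parallel efficiency. The helper below only spells out that definition; its timings are invented placeholders, not values read off the figure:

```python
# Speedup and parallel efficiency as plotted in the speedup charts:
#   S(p) = T(1) / T(p),   E(p) = S(p) / p
# The timings below are hypothetical placeholders, not data from the figures.

def speedup(t1, tp):
    return t1 / tp

def efficiency(t1, tp, p):
    return speedup(t1, tp) / p

times = {1: 320.0, 2: 165.0, 4: 88.0, 8: 49.0, 16: 30.0}   # hypothetical seconds
for p, tp in times.items():
    print(f"p={p:2d}  speedup={speedup(times[1], tp):5.2f}  "
          f"efficiency={efficiency(times[1], tp, p):4.2f}")
```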
Comparison of MPI Libraries and Interconnects (GPAS), Base Benchmark Problem
Dell PE 2650 (single processor per node)

[Figure: speedup versus number of processors (up to 32) for MPICH over Gigabit Ethernet, MPICH-GM over Myrinet, MPI/Pro over Gigabit Ethernet, and MPICH over Fast Ethernet, against the ideal linear speedup]
Constant Problem Size per Processor (GPAS)

[Figure: execution time (0-800 s) for scaled problems of 19,200 cells per CPU, from 19,200 cells on 1 CPU to 614,400 cells on 32 CPUs, on Fuji, Rocky, Ararat, and Tejas]
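Holding the problem size per processor constant is a weak-scaling study: ideally the execution time stays flat as processors and cells grow together, and the ratio T(1)/T(p) is the weak-scaling efficiency. A small sketch with placeholder timings (the measured values are in the figure, not reproduced here):

```python
# Weak-scaling efficiency for constant work per processor:
#   E_w(p) = T(1 CPU, N cells) / T(p CPUs, p * N cells)
# Placeholder timings only; the measured values are in the figure above.

cells_per_cpu = 19_200
timings = {1: 210.0, 2: 225.0, 4: 240.0, 8: 265.0, 16: 300.0, 32: 350.0}  # hypothetical

t1 = timings[1]
for p, tp in sorted(timings.items()):
    print(f"{p:2d} CPUs, {p * cells_per_cpu:7,d} cells: efficiency {t1 / tp:.2f}")
```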
Modified Benchmark Problem
• Compositional model
• 3-component Peng-Robinson EOS
• Dry-gas cycling process
• Reservoir size: 7.3 x 24.2 x 0.1 miles
• Grid: 77 x 256 x 10 (197,120 cells)
• Unknowns: 1.57 million
• Anisotropic, layered permeability with Kv/Kh = 0.1
• 88 wells: 54 injectors, 24 producers, staggered line drive
• Injectors and producers are completed fully
• 100 days of gas injection
• One-dimensional domain decomposition
Multi-Processor Execution Times (GPAS), Modified Benchmark Problem
Dell PE 2650

[Figure: execution time (seconds, log scale) versus number of processors (up to 64) for Gigabit Ethernet (single CPU per node), Myrinet (single), Fast Ethernet (single), and Myrinet (dual CPUs per node)]
Multi-Processor Speedups (GPAS), Modified Benchmark Problem
Dell PE 2650

[Figure: speedup versus number of processors (up to 64) for Gigabit Ethernet (single), Myrinet (single), Fast Ethernet (single), and Myrinet (dual), against the ideal linear speedup]
Commercial Parallel Simulator
Remarks

Our goals were to:
• Run the simulators in parallel mode and evaluate their performance for typical cases
• Analyze the different issues involved in using the simulators in parallel, and approaches to improved performance and design

We did not:
• Tune the simulators for optimum performance
• Compare or match the material balance errors of the simulator runs
Benchmark Problem for VIP
• Compositional model – modified SPE3 comparison project
• 9-component Peng-Robinson EOS
• Gas condensate with gas cycling process
• Reservoir size: 10 miles x 4 miles x 160 ft
• Grid: 180 x 72 x 4 (51,840 cells)
• 1 million unknowns
• Flow barriers present (using transmissibility modifiers)
• 20 wells: 10 injectors, 10 producers
• 10 years of cycling followed by 5 years of production
Multi-Processor Performance - VIP
Multi-Processor Execution Times (VIP), Modified SPE3 Comparison Problem

[Figure: elapsed time (0-12,000 s) versus number of processors (up to 16) for Fuji and Rocky]
Multi-Processor Speedups (VIP), Modified SPE3 Comparison Problem

[Figure: speedup versus number of processors (up to 16) for Fuji and Rocky, against the ideal linear speedup]
Constant Problem Size per Processor (VIP), Modified SPE3 Comparison Problem

[Figure: execution time (0-10,000 s) for scaled problems of 25,920 cells per CPU, from 25,920 cells on 1 CPU to 414,720 cells on 16 CPUs, on Fuji and Rocky]
Million Cell Commercial Benchmark Problem for VIP
• IMPES scheme
• 7-component Peng-Robinson EOS
• Grid: 100 x 100 x 100 (1 million cells)
• 16 million unknowns
• Stochastically characterized data field
• 11 wells
• 49-year run
Performance Speedups (VIP), Million-Gridblock Problem
Dell PE 2650

[Figure: speedup versus number of processors (up to 64) for VIP, against the ideal linear speedup]
Summary
• Tested GPAS and analyzed its performance on new hardware
• Benchmarked the performance of new clusters
• Compared the performance of different interconnects and MPI libraries
• Tested the commercial reservoir simulator VIP in parallel mode
Acknowledgements
US Department of Energy
Reservoir Simulation Joint Industry Project Members
Dell Computer Corporation