Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin

Reza Rooholamini, Ph.D.
Director, Enterprise Solutions
Dell Computer Corp.
http://www.dell.com/clustering
Product Maturity Life Cycle in the Open Systems Market

[Chart: cost/complexity versus degree of standardization (proprietary, standardization, fully standardized), with simplicity, volume, and choice increasing as products mature. Products plotted include RISC systems, 8P servers, 4P servers, 1/2P servers, appliance servers, workstations, desktops, direct attached storage, network attached storage, project-based SANs, heterogeneous SANs, HPC clusters, and grids.]
Our Vision
• Customers define our success: Begin with the customer. End with the customer
• Provide the best price/performance HPC solutions to our customers
• Promote standardization to provide choice, lower cost of ownership, and simplicity in HPC solutions
• Evangelize new HPC technologies and selectively adopt the relevant ones for “productization”
• Derive the requirements for products by focusing on applications
• Provide a total solution: Hardware, software and services
• Partner with “best of class” in HPC
Building Block Approach

Benchmark:     Parallel benchmarks (NAS, HINT, Linpack…) and parallel applications
Middleware:    MPI/Pro, PVM, MPICH, MVICH
OS:            Linux, Windows
Protocol:      TCP, VIA, GM, Elan
Interconnect:  Fast Ethernet, Gigabit Ethernet, Myrinet, Quadrics, InfiniBand
Platform:      Dell PowerEdge servers (IA32 & IA64)
Dell and UT Austin
• Dell is sponsoring research in reservoir simulation at the Department of Petroleum and Geosystems Engineering
• Dr. Kamy Sepehrnoori is collaborating with Dell’s HPCC team on performance studies, paper publications, and parallel simulator development
• The Dell HPCC team includes graduates of Dr. Sepehrnoori’s group who specialize in petroleum engineering
• Dell has participated in Reservoir Simulation JIP (Joint Industry Project) in the past, and is planning to attend the upcoming meeting
• Dr. Sepehrnoori has access to Dell HPC lab for running large simulations, and is provided with hardware for development, testing, and performance studies of his program
A Performance Study of Parallel Reservoir Simulation on HPC Clusters

Baris Guler, Tau Leng, Victor Mashayekhi, Reza Rooholamini
Dell Computer Corporation

Kamy Sepehrnoori
Center for Petroleum and Geosystems Engineering
The University of Texas at Austin
Outline
Background
Software/Hardware Description
Compositional Reservoir Simulation on HPCs
Results
Summary
Future Work
Reservoir Simulation Application
Reservoir Forecasting
Reservoir Performance Optimization
Sensitivity Analysis
History Matching
Risk Assessment through Stochastic Simulation
Assessment of Uncertainty in Forecasting
Value of Information Studies
Reservoir Management
Reservoir Simulation Steps

• Data Input / Model Initialization
• Time-Step Computation (repeated to the end of the simulation):
   - Solution of the Non-Linear Partial Differential Equations
      • Discretization
      • Linearization and Newton Iteration
   - Solution Using Direct or Iterative Solvers
   - Test for Convergence of the Solution
   - Data Output / Graphics
   - Time-Step Increment
• End of Simulation Study
• Results Processing / Interpretation
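To make the loop above concrete, here is a small self-contained Python sketch of an implicit time-stepping loop with a Newton iteration and a linear solve at every step. It uses a toy 1-D nonlinear diffusion problem rather than reservoir flow equations, and it is not GPAS code; it only mirrors the control flow listed on this slide.

```python
# Toy illustration of the simulation loop above: implicit time stepping for a
# 1-D nonlinear diffusion problem, with a Newton iteration and a direct linear
# solve per time step.  A sketch of the control flow only, not reservoir physics.
import numpy as np

nx, dx, dt, t_end = 50, 1.0 / 49, 1e-3, 0.05
u = np.linspace(1.0, 2.0, nx)                     # data input / initialization

def residual(u_new, u_old):
    """Implicit-Euler residual of u_t = (u u_x)_x with fixed end values."""
    r = np.zeros_like(u_new)
    flux = 0.5 * (u_new[1:] + u_new[:-1]) * np.diff(u_new) / dx
    r[1:-1] = u_new[1:-1] - u_old[1:-1] - dt * np.diff(flux) / dx
    r[0], r[-1] = u_new[0] - u_old[0], u_new[-1] - u_old[-1]   # fixed ends
    return r

t = 0.0
while t < t_end:                                  # time-step loop
    guess = u.copy()
    for _ in range(20):                           # linearization / Newton iteration
        r = residual(guess, u)
        if np.linalg.norm(r) < 1e-10:             # test for convergence
            break
        # finite-difference Jacobian; real simulators assemble it analytically
        J = np.zeros((nx, nx))
        for j in range(nx):
            pert = guess.copy()
            pert[j] += 1e-7
            J[:, j] = (residual(pert, u) - r) / 1e-7
        guess += np.linalg.solve(J, -r)           # direct (or iterative) solver
    u, t = guess, t + dt                          # accept step / time-step increment
print("final solution range:", u.min(), u.max())  # data output / post-processing
```

A production simulator replaces the toy residual with the discretized, linearized flow equations and the dense finite-difference Jacobian with a sparse, analytically assembled one, but the step / iterate / solve / converge structure is the same.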
Reservoir Simulation Hardware

[Timeline, 1960-2000: mainframes, supercomputers, RISC workstations, MPPs, PCs/workstations, and HPC clusters]
Benefits of Parallel Processing
Turn-around time
Large-scale simulations
Cost
Parallel Processing

Massively Parallel Computers
High-Performance Computing Clusters
Benefits of Clusters
Scalability
High Performance Computing
Low Cost
Availability
Computational Mode
Distributed processing
Parallel processing
Distributed Processing

[Diagram: a user drives an input generator that produces n data sets (D1 ... Dn); a batch queuing system feeds them to the simulation program running on m processors (P1 ... Pm), with n >> m; results go to a database for post-processing]
Cluster Simulation System

[Diagram: input from the user and project advisor feeds a data generator that produces data sets DS 1 ... DS n; a cluster scheduler distributes the input data across FS 1 ... FS m; an archiver and a post-processor handle the output]
Parallel Processing

[Diagram: serial case - a single CPU solves the whole reservoir with finite differences (FD); parallel case - finite differences with domain decomposition (FD & DD), the reservoir split across CPU-1 through CPU-6]
Domain Decomposition
Fundamental strategy for grid-based parallel simulation.
Example: a 10 x 15 grid decomposed across 6 processors
Ghost-layer creation and communication
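As a concrete illustration of ghost layers (a generic mpi4py sketch, not GPAS code), the snippet below decomposes a 1-D strip of 150 cells, echoing the 10 x 15 grid example, across 6 MPI ranks and exchanges one ghost cell with each neighbour before a purely local update:

```python
# Minimal sketch of 1-D domain decomposition with ghost-layer exchange.
# Run with, e.g.:  mpirun -np 6 python ghost_exchange.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_global = 150                         # e.g. a 10 x 15 grid flattened along one axis
n_local = n_global // size             # cells owned by this rank (assumes it divides)
u = np.full(n_local + 2, float(rank))  # owned cells plus one ghost cell at each end

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Ghost-layer communication: exchange boundary cells with both neighbours.
comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

# With ghosts filled, each rank updates its owned cells independently,
# e.g. a smoothing step that needs both neighbours of every cell.
u[1:-1] = 0.25 * u[:-2] + 0.5 * u[1:-1] + 0.25 * u[2:]
print(f"rank {rank}: ghost cells = ({u[0]}, {u[-1]})")
```

In 2-D or 3-D decompositions the same exchange happens per subdomain face, and its cost relative to the local computation is what the interconnect comparisons later in the talk are probing.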
Performance Issues in Parallel Processing
Software Design
Algorithm
Parallelization
Programming practice
Load balancing
Performance Issues in Parallel Processing
Hardware Configuration
CPU
Cache
Memory subsystem
Front Side Bus
I/O bandwidth
Interconnect
Hardware - Interconnect

Type                Speed (MB/s)   Latency (µs)
Dolphin             385            4
Quadrics            330            4.5
InfiniBand 4x       500            6-8
Myrinet             225            6-7
Giganet             110            7.5
Gigabit Ethernet    80             170
Fast Ethernet       9.0            170
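A convenient way to read the table is the usual first-order message-cost model, transfer time ≈ latency + message size / bandwidth. The snippet below is an added illustration, not part of the slides; it plugs in the table's figures (taking the Myrinet latency as the midpoint of the quoted 6-7 µs range) for a small and a large message:

```python
# First-order message-cost model: t(m) ~ latency + m / bandwidth.
# Figures taken from the interconnect table above; purely illustrative.

interconnects = {        # name: (bandwidth in MB/s, latency in microseconds)
    "Fast Ethernet":    (9.0, 170.0),
    "Gigabit Ethernet": (80.0, 170.0),
    "Giganet":          (110.0, 7.5),
    "Myrinet":          (225.0, 6.5),   # midpoint of the 6-7 us range
    "Quadrics":         (330.0, 4.5),
}

def message_time_us(size_bytes, bandwidth_mb_s, latency_us):
    """Estimated one-way transfer time in microseconds."""
    return latency_us + size_bytes / (bandwidth_mb_s * 1e6) * 1e6

for name, (bw, lat) in interconnects.items():
    small = message_time_us(1_024, bw, lat)       # latency-dominated
    large = message_time_us(1_048_576, bw, lat)   # bandwidth-dominated
    print(f"{name:17s} 1 KB: {small:8.1f} us   1 MB: {large:9.1f} us")
```

Small messages are latency-dominated, which is why the low-latency interconnects separate from Ethernet in the MPI comparisons shown later even when the bandwidth gap is modest.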
CPGE-1 (Ararat)
• 12 nodes / 16 processors
• 1.0 GHz Intel Pentium III Xeon processors
• 256 MB of memory
• Diskless configuration
• 100 Mbps switched Fast Ethernet and Giganet interconnects
TACC-1 (Tejas)
• 32 nodes / 64 processors
• 1.0 GHz Intel Pentium III processors
• 1 GB of memory per processor
• 225 MBps Myrinet-2000 interconnect
Parallel Reservoir Simulators

Chevron-Texaco
Conoco-Phillips
Exxon-Mobil
IFP and Beicip-Franlab
Landmark Graphics Corporation
Schlumberger-Geoquest
Saudi Aramco
UT CPGE, UT CSM
Note: 93 clusters appear among the Top500 supercomputer sites, 23 of them in the oil and gas sector.
Compositional Reservoir Simulation on HPCs
Project Objectives
Develop a general-purpose adaptive simulator (GPAS) capable of:
• modeling complex physical processes, including EOS compositional, chemical, black-oil, and thermal
• high-resolution studies on supercomputers and high-performance clusters
HPC Initiatives
• Evaluate and compare performance of different cluster systems
• Test and analyze performance of different parallel simulators
• Identify areas of improvement in parallel algorithm design and cluster setup for optimal parallel reservoir simulation
Summary of Clusters
Cluster              CPU Type            CPU Speed (MHz)   CPUs           Memory per CPU   Interconnect
TACC-2 (Longhorn)    Power4              1300              4x16 = 64      2 GB             IBM SP Switch2
TACC-1 (Tejas)       Pentium III         1000              32x2 = 64      512 MB           Myrinet
DELL-2 (PE 2650)     Intel Xeon DP       2400              64x2 = 128     1 GB             Myrinet, Gigabit, Fast Ethernet
DELL-1 (PE 1550)     Pentium III         1000              16x2 = 32      512 MB           Myrinet, Gigabit, Fast Ethernet
CPGE-1 (Ararat)      Pentium III Xeon    1000              8x1+4x2 = 16   256 MB           Fast Ethernet
CPGE-1 (Rocky)       Pentium II Xeon     400               8x2 = 16       256 MB           Fast Ethernet
CPGE-1 (Fuji)        Pentium II          300               16x1 = 16      384 MB           Fast Ethernet
Parallel Simulators Tested
GPAS
VIP (2003r4)
CPGE Simulator (GPAS)
EOS Compositional
Peng-Robinson EOS
Fully Implicit
PETSc Linear Solvers
Parallel (IPARS Framework)
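The slide lists PETSc linear solvers as one of GPAS's building blocks. As a generic illustration of what such a solve looks like, and not GPAS source code, here is a petsc4py sketch that assembles a toy tridiagonal system and solves it with a Krylov method, the kind of solve a fully implicit simulator performs in every Newton iteration:

```python
# Illustrative petsc4py sketch: assemble a toy tridiagonal system and solve it
# with a Krylov method and a block-Jacobi preconditioner.  Requires PETSc/petsc4py.
from petsc4py import PETSc

n = 1000
A = PETSc.Mat().createAIJ([n, n], nnz=3)     # sparse matrix, ~3 nonzeros per row
for i in range(n):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

x, b = A.createVecs()                        # solution and right-hand-side vectors
b.set(1.0)                                   # stand-in for the Newton residual

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType("gmres")                         # Krylov solver
ksp.getPC().setType("bjacobi")               # block-Jacobi preconditioner
ksp.setTolerances(rtol=1e-8)
ksp.solve(b, x)
print("iterations:", ksp.getIterationNumber(), "residual norm:", ksp.getResidualNorm())
```

In a fully implicit compositional model this solve sits inside every Newton iteration of every time step, so solver and preconditioner choice largely determines the parallel performance reported in the following slides.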
Performance Results
Base Benchmark Problem
• Compositional model
• 3-component Peng-Robinson EOS
• Dry-gas cycling process
• Reservoir size: 800 x 11,200 x 160 ft, homogeneous
• 2 wells: 1 injector, 1 producer
• Grid: 16 x 224 x 8 (28,672 cells)
• Unknowns: 229,376
• 100 days of gas injection
• One-dimensional domain decomposition
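A quick arithmetic check of the sizes quoted above (and of the modified benchmark that appears later), using only numbers taken from the slides:

```python
# Arithmetic check of the benchmark sizes quoted on the slides.
base_cells = 16 * 224 * 8           # = 28,672 cells, as stated
base_unknowns = 229_376
print(base_unknowns / base_cells)   # 8.0 -> eight unknowns per grid cell

mod_cells = 77 * 256 * 10           # = 197,120 cells (modified benchmark)
print(mod_cells * 8)                # 1,576,960, i.e. the quoted ~1.57 million unknowns
```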
Single-Processor Execution Times (GPAS), Base Benchmark Problem

Cluster (CPU)                              Execution time [sec]
Dell PE 2650 (Intel Xeon DP, 2.4 GHz)      180.186
TACC Tejas (Pentium III, 1.0 GHz)          306.38
Ararat (Pentium III Xeon, 1.0 GHz)         309.3
PowerEdge 1550 (Pentium III, 1.0 GHz)      313.3
Rocky (Pentium II Xeon, 400 MHz)           615.2
Fuji (Pentium II, 300 MHz)                 1030.3
Multi-Processor Execution Times (GPAS), Base Benchmark Problem

[Figure: execution time (seconds, log scale) versus number of processors (1-16) for Fuji, Rocky, Ararat, PE 1550, PE 2650, Tejas, and Longhorn]
Multi-Processor Speedups (GPAS), Base Benchmark Problem

[Figure: speedup versus number of processors (up to 32) for Fuji (FE), Rocky (FE), Ararat (FE), PE 1550 (FE), PE 2650 (FE), Tejas (Myrinet), and Longhorn, against the ideal linear speedup]
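The speedups in these charts are the standard strong-scaling ratio S(p) = T(1) / T(p), and dividing by p gives parallel efficiency. The helper below only spells out that definition; its timings are invented placeholders, not values read off the figure:

```python
# Speedup and parallel efficiency as plotted in the speedup charts:
#   S(p) = T(1) / T(p),   E(p) = S(p) / p
# The timings below are hypothetical placeholders, not data from the figures.

def speedup(t1, tp):
    return t1 / tp

def efficiency(t1, tp, p):
    return speedup(t1, tp) / p

times = {1: 320.0, 2: 165.0, 4: 88.0, 8: 49.0, 16: 30.0}   # hypothetical seconds
for p, tp in times.items():
    print(f"p={p:2d}  speedup={speedup(times[1], tp):5.2f}  "
          f"efficiency={efficiency(times[1], tp, p):4.2f}")
```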
Comparison of MPI Libraries and Interconnects (GPAS), Base Benchmark Problem
Dell PE 2650 (single processor per node)

[Figure: speedup versus number of processors (up to 32) for MPICH over Gigabit Ethernet, MPICH-GM over Myrinet, MPI/Pro over Gigabit Ethernet, and MPICH over Fast Ethernet, against the ideal linear speedup]
Constant Problem Size per Processor (GPAS)

[Figure: execution time (0-800 s) for scaled problems of 19,200 cells per CPU, from 19,200 cells on 1 CPU to 614,400 cells on 32 CPUs, on Fuji, Rocky, Ararat, and Tejas]
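Holding the problem size per processor constant is a weak-scaling study: ideally the execution time stays flat as processors and cells grow together, and the ratio T(1)/T(p) is the weak-scaling efficiency. A small sketch with placeholder timings (the measured values are in the figure, not reproduced here):

```python
# Weak-scaling efficiency for constant work per processor:
#   E_w(p) = T(1 CPU, N cells) / T(p CPUs, p * N cells)
# Placeholder timings only; the measured values are in the figure above.

cells_per_cpu = 19_200
timings = {1: 210.0, 2: 225.0, 4: 240.0, 8: 265.0, 16: 300.0, 32: 350.0}  # hypothetical

t1 = timings[1]
for p, tp in sorted(timings.items()):
    print(f"{p:2d} CPUs, {p * cells_per_cpu:7,d} cells: efficiency {t1 / tp:.2f}")
```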
Modified Benchmark Problem
• Compositional model
• 3-component Peng-Robinson EOS
• Dry-gas cycling process
• Reservoir size: 7.3 x 24.2 x 0.1 miles
• Grid: 77 x 256 x 10 (197,120 cells)
• Unknowns: 1.57 million
• Anisotropic, layered permeability with Kv/Kh = 0.1
• 88 wells: 54 injectors, 24 producers, staggered line drive
• Injectors and producers are completed fully
• 100 days of gas injection
• One-dimensional domain decomposition
Multi-Processor Execution Times (GPAS), Modified Benchmark Problem
Dell PE 2650

[Figure: execution time (seconds, log scale) versus number of processors (up to 64) for Gigabit Ethernet (single CPU per node), Myrinet (single), Fast Ethernet (single), and Myrinet (dual CPUs per node)]
Multi-Processor Speedups (GPAS), Modified Benchmark Problem
Dell PE 2650

[Figure: speedup versus number of processors (up to 64) for Gigabit Ethernet (single), Myrinet (single), Fast Ethernet (single), and Myrinet (dual), against the ideal linear speedup]
Commercial Parallel Simulator
Remarks

Our goals were to:
• Run the simulators in parallel mode and evaluate their performance for typical cases
• Analyze the different issues involved in using the simulators in parallel, and approaches to improved performance and design

We did not:
• Tune the simulators for optimum performance
• Compare or match the material balance errors of the simulator runs
Benchmark Problem for VIP
• Compositional model – modified SPE3 comparison project
• 9-component Peng-Robinson EOS
• Gas condensate with gas cycling process
• Reservoir size: 10 miles x 4 miles x 160 ft
• Grid: 180 x 72 x 4 (51,840 cells)
• 1 million unknowns
• Flow barriers present (using transmissibility modifiers)
• 20 wells: 10 injectors, 10 producers
• 10 years of cycling followed by 5 years of production
Multi-Processor Performance - VIP
Multi-Processor Execution Times (VIP), Modified SPE3 Comparison Problem

[Figure: elapsed time (0-12,000 s) versus number of processors (up to 16) for Fuji and Rocky]
Multi-Processor Speedups (VIP), Modified SPE3 Comparison Problem

[Figure: speedup versus number of processors (up to 16) for Fuji and Rocky, against the ideal linear speedup]
Constant Problem Size per Processor (VIP), Modified SPE3 Comparison Problem

[Figure: execution time (0-10,000 s) for scaled problems of 25,920 cells per CPU, from 25,920 cells on 1 CPU to 414,720 cells on 16 CPUs, on Fuji and Rocky]
Million Cell Commercial Benchmark Problem for VIP
• IMPES scheme
• 7-component Peng-Robinson EOS
• Grid: 100 x 100 x 100 (1 million cells)
• 16 million unknowns
• Stochastically characterized data field
• 11 wells
• 49-year run
Performance Speedups (VIP), Million-Gridblock Problem
Dell PE 2650

[Figure: speedup versus number of processors (up to 64) for VIP, against the ideal linear speedup]
Summary
• Tested GPAS and analyzed its performance on new hardware
• Benchmarked the performance of new clusters
• Compared the performance of different interconnects and MPI libraries
• Tested the commercial reservoir simulator VIP in parallel mode
Acknowledgements
US Department of Energy
Reservoir Simulation Joint Industry Project Members
Dell Computer Corporation