21
e Building the Best Computer Clusters for Mechanical Design Environments Daniel Chan and Ed Huott GE Corporate R&D Center e

Compute Cluster

Embed Size (px)

Citation preview

Page 1: Compute Cluster

e

Building the Best Computer Clusters for Mechanical Design Environments

Daniel Chan and Ed Huott

GE Corporate R&D Center e

Page 2: Compute Cluster

e

Key Challenges

•  A Highly Diverse Environment –  Many Different Businesses –  Engineering, Chemistry, Financial, Services

•  Short-Term Technical Support –  Quick Turnaround

•  Costs –  Downtime, System Admin., Software

Licenses, Productivity

Page 3: Compute Cluster

e

Design Requirements

• Web-Centric –  Simplify Job Submission in a Heterogeneous

Environment –  Access from Anywhere and Anytime

• Minimize Complexity and Cost •  Stable and Highly-Available

–  Isolate Network, NFS and Insufficient Disk Space Problems, Software License Management

Page 4: Compute Cluster

e

Reasons for Having a Heterogeneous Environment

•  Performance

•  Costs

•  Legacy Applications

Page 5: Compute Cluster

e

Let’s Do An Experiment

GE Corporate Standard PIII 450 MHz 256 MB RAM

$1400

Sun Ultra 10 UltraSPARC 440 MHz

256 MB RAM $5000

NT 4.0 Unix/Solaris 2.6

CFX 5 23,862 Nodes

102 MB Required

Use Perl script to launch 1, 2 and 3 concurrent jobs, respectively, for 30 times

Extract wall clock times from output files then use Minitab to analyze data

Test stability, multitasking and paging capabilities

Page 6: Compute Cluster

e

Results For Sun Ultra 10

4000 5000 6000

0

10

20

30

40

50

C4

Freq

uenc

y

Histogram of C4, with Normal Curve

1700 1710 1720 1730 1740 1750 1760

0

1

2

3

4

5

6

7

8

C4

Freq

uenc

y

Histogram of C4, with Normal Curve

3320 3340 3360 3380 3400 3420 3440 3460 3480 3500

0

10

20

30

C4

Freq

uenc

y

Histogram of C4, with Normal Curve1 job 2 jobs

3 jobs

Page 7: Compute Cluster

e

Results for Dell OptiPlex G1x

1924 1925 1926 1927 1928

0

10

20

TOTAL_CLOCK_TIME

Freq

uenc

y

Histogram of TOTAL_CLOCK_TIME, with Normal Curve

3900 3950 4000

0

10

20

30

40

C4Fr

eque

ncy

Histogram of C4, with Normal Curve

5800 5850 5900

0

10

20

30

40

50

TOTAL_CLOCK_TIME

Freq

uenc

y

Histogram of TOTAL_CLOCK_TIME, with Normal Curve

1 job 2 jobs

3 jobs

Page 8: Compute Cluster

e

Scorecard for Sun Ultra 10

Mean Standard Deviation

Z

1 job 1724 15.9 10.8 2 jobs 3442

(2X) 26.1 13.2

3 jobs 5144 (3X)

350.2 1.5

ZUSL

=−

σµσ01.

Paging

Page 9: Compute Cluster

e

Scorecard For Dell Optiplex GX1

Mean Standard Deviation

Z

1 job 1926 1.2 161 2 jobs 3909

(2.1X) 15.6 25

3 jobs 5828 (3X)

26.4 22

ZUSL

=−

σµσ01.

Page 10: Compute Cluster

e

Summary of Results

•  The SUN workstation is about 10% faster, but nearly 5 times as expensive

•  Both systems are just as stable for a period of one week

•  NT can multitask

•  NT can page and appears to have “better” memory management scheme

Page 11: Compute Cluster

e

APNASA Parallel Performance

-2 -1 0 1 2 3 4 5 6 7 8 9 10 11

200

400

600

800

1000

1200

1400

PIII 550 MHz Xeon2 MB Cache, 8-Way SMP

SGI Origin 2000R10K 195 MHz

PII 333 MHz Cluster

TIM

E (S

EC)

No. of Processors

308,115 Grid Points

Page 12: Compute Cluster

e

APNASA Parallel Performance

0 1 2 3 4 5 6 7 8 9 10 110

1

2

3

4

5

6

7

8

9IDEAL

PII 333 MHz Cluster

PIII 550 MHz Xeon2 MB Cache, 8-Way SMP

SGI Origin 2000R10K 195 MHz

Spee

d U

p

No. of Processors

308,115 Grid Points

Page 13: Compute Cluster

e 0.80.911 2 3 4 5 6 7 8 91010 20

103

104

81118 VERTICESPII 333 MHz NT

81118 VERTICESWORKSTATIONS

23862 VERTICESWORKSTATIONS

23862 VERTICESSGI SMP

81118 VERTICESSGI SMP

556323 VERTICESSGI SMP

CP

U T

IME

(SE

CS

)

NUMBER OF PROCESSORS

CFX PERFORMANCE ON A VARIETY OF COMPUTER

PLATFORMS

Page 14: Compute Cluster

e

ANSYS Performance on a Variety of Computers

0 1 2 3 4 5 6 7 8 9 10 11 12

Sun Enterprise 450Ultra II 400 MHz

HP J5000, PA8500 400 MHz

Compaq SP700, PIII 500 MHz

SGI 320, PIII 500 MHz

Dell Optiplex GX1, PII 450 MHz

Dell 6300, PIII 500 MHz Xeon

Compaq DS20, 500 MHz EV6

SGI O2K R10K 250 MHz

CPU TIME (HRS)

ELAPSED TIME CPU TIME

160,000 ELEMENTS WITH 2D CONTACT 2.1 GB OF OUTPUT AND 615 MB PAGE FILE

Page 15: Compute Cluster

e

eComputing Architecture

Quality Tools

Remote Access from Anywhere Anytime

(Globalization)

Field and Customer Data (Services and eCommerce) W

eb-E

nabl

ed E

nviro

nmen

t

SGI SC

HP Cluster

NT Cluster

DM Cluster Load

Sha

ring

Faci

lity user-transparent work

load management

Page 16: Compute Cluster

e

Key LSF Challenges

•  Implicit Assumptions of A Homogeneous Environment –  Uniform View of NFS –  Uniform Mounting of User Home Directory –  Unix Bias

• Wholesale Migration of User Environment Variables from Submit to Execution Hosts

Page 17: Compute Cluster

e

A Web-Centric Job Management System

Web Clients Web Server

https NFS Job Spec. Files

Execution Hosts

Job Directories LSF Cluster

ftp NFS or SMB

LSF Proxy

Page 18: Compute Cluster

e

Screen Shot of Web Interface

Job Source Files

Job Output Destination

Command to Run on Execution Host

Files Sent to Output

Destination

Page 19: Compute Cluster

e

Process, Data Flow and Security for Web-Based Jobs

Web Client Apache (invoke_bsub.pl)

https Job Spec

Exec Host (lsf_exec.pl) Job Dir

LSF Cluster

ftp bsub

Authenticated access to setuid script File system

permissions

Page 20: Compute Cluster

e

Normalized Job Execution Wrapper/Environment

•  Standard job execution wrapper that runs on all execution hosts. •  Key "ingredients": Perl (with Net::FTP package), wget, subset of

"standard" Unix utilities (e.g. /bin/sh, cp, mv, etc.) –  (Note: Makes use of Cygwin tools on Windows NT. See http://

sources.redhat.com/cygwin/)

•  Reads Job Specification File to determine: –  Source of remote job directory –  Command to run o Destination for job output file(s)

•  Pre-determined top level (local) directory for jobs on each execution host.

•  Remote jobs are migrated by FTP (wget) to unique sub-directories under top level.

•  Specified command is run in migrated job directory. •  Output files are transferred by FTP to specified destination.

Page 21: Compute Cluster

e

Summary

•  Leveraging Low-Cost NT Solution •  Shield Users from Complexity

–  productivity gain –  better centralized resource management

•  Enterprise Resource Planning –  compute cycles –  software licenses