Yellowstone Capability and capacity for the atmospheric and related sciences David Hart [email protected] NCAR/CISL User Services SC13 — Denver, CO — Nov. 19-20, 2013


Page 1: Yellowstone Capability and capacity for the atmospheric and related sciences

David Hart
[email protected]
NCAR/CISL User Services
SC13, Denver, CO, Nov. 19-20, 2013

Page 2: Outline

• Yellowstone environment
• Metrics for management and planning
  – Users and fields of science
  – Availability, utilization, and efficiency
  – Job mix: Capacity v. Capability
  – Wait times and user satisfaction
  – Disk and archive growth

Page 3: Yellowstone environment

• Yellowstone: HPC resource, 1.50 PFLOPS peak. 4,536 nodes: 2 x 8-core Intel Sandy Bridge, 32 GB
• GLADE: Central disk resource. 11 PB, 90 GB/s bandwidth
• Geyser: 16 nodes: 4 x 10-core Westmere, 1 TB
• Caldera: 16 nodes: 2 x 8-core SNB, 64 GB, 2 NVIDIA GPUs
• NCAR HPSS Archive: 100 PB capacity, ~15 PB/yr growth
• High Bandwidth Low Latency HPC and I/O Networks: Mellanox FDR InfiniBand fat tree and 10Gb Ethernet
• 1Gb/10Gb Ethernet (40Gb+ future)
• Connected services and sites: Data Transfer Services; Science Gateways (RDA, ESG); Remote Vis; Partner Sites; XSEDE Sites
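As a rough cross-check of the 1.50 PFLOPS peak figure, the node and core counts above can be multiplied out. This is only a back-of-the-envelope sketch: the 2.6 GHz clock and the 8 double-precision FLOPs per cycle per core are assumptions (typical for Sandy Bridge with AVX), not values stated on the slide.

```python
# Back-of-the-envelope peak estimate for the Yellowstone HPC resource.
NODES = 4536             # from the slide
SOCKETS_PER_NODE = 2     # from the slide
CORES_PER_SOCKET = 8     # from the slide
CLOCK_HZ = 2.6e9         # ASSUMPTION: clock rate is not given on the slide
FLOPS_PER_CYCLE = 8      # ASSUMPTION: AVX double precision, per core


def total_cores(nodes=NODES, sockets=SOCKETS_PER_NODE, cores=CORES_PER_SOCKET):
    """Total core count across all nodes."""
    return nodes * sockets * cores


def peak_pflops(clock_hz=CLOCK_HZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in PFLOPS under the assumptions above."""
    return total_cores() * clock_hz * flops_per_cycle / 1e15


if __name__ == "__main__":
    print(f"cores: {total_cores():,}")
    print(f"estimated peak: {peak_pflops():.2f} PFLOPS")
```

Under these assumptions the estimate lands close to the quoted 1.50 PFLOPS; a different clock or vector width would change the result proportionally.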

Page 4: Yellowstone users

[Figure: monthly user counts; x-axis labeled Sep-08 to Sep-09, y-axis 0 to 3,000. Series: Yellowstone open users, Yellowstone active users, HPSS R/W users, HPSS storage users, Geyser/Caldera active users. Data labels shown on the chart: 2,866; 440; 337; 1,066; 162.]

Page 5: Time to start, duration of use

[Figure, panel 1: histogram of Users (0 to 250) by Days between first job and last job to date (0 to 380).]

[Figure, panel 2: histogram of Users (0 to 200) by Days from account set-up to first job (0 to 380).]

Page 6: Fields of science

[Figure: monthly share of use by field of science; x-axis labeled Nov-08 to Oct-09, y-axis 0% to 100%. Series: Climate, Large-Scale Dynamics; Fluid Dynamics and Turbulence; Geospace Sciences; Ocean Sciences; Weather Prediction; Earth Sciences; Atmospheric Chemistry; Mesoscale Meteorology; All Others.]

Page 7: ASD v. Other use

[Figure: monthly usage; x-axis labeled Sep-08 to Aug-09, y-axis "Core-hours used (millions)" with ticks 0 to 50,000,000. Series: ASD, Others.]

Page 8: Availability, utilization & efficiency

Averages
• Availability: 97.0% ± 0.9%
• Utilization: 81.1% ± 1.6%
• User: 40.9% ± 1.1%
• FP Efficiency: 1.65% ± 0.1%

[Figure: time series from 12/15/12 to 11/10/13, y-axis 0% to 100%. Series: % Availability, % Utilization, % User, % FP Eff (vs. % Util).]

Page 9: Daily utilization pattern

Page 10: Analysis cluster availability & utilization

[Figure: time series from 12/20/12 to 11/5/13, y-axis 0% to 100%. Series: Geyser Availability, Geyser Utilization, Caldera Availability, Caldera Utilization.]

Page 11: Yellowstone job mix

[Figure: x-axis "Nodes used (nearest 100)", 0 to 4,400; logarithmic y-axis, 1 to 1,000,000,000. Series: Core-hours used, # of Jobs.]

David L. Hart. 2011. Measuring TeraGrid: workload characterization for a high-performance computing federation. Int. J. High Perform. Comput. Appl. 25, 4 (November 2011), 451-465. DOI=10.1177/1094342010394382
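The job-mix chart groups jobs by node count rounded to the nearest 100 and reports job counts and core-hours per group. A minimal sketch of that aggregation follows. The job records of (nodes, wall-clock hours) and the function names are hypothetical, not taken from NCAR's accounting system; the 16 cores per node comes from the Yellowstone node description (2 x 8-core).

```python
from collections import defaultdict

CORES_PER_NODE = 16  # 2 x 8-core sockets per Yellowstone node


def nearest_100(nodes):
    """Round a node count to the nearest 100 (halves round up)."""
    return (nodes + 50) // 100 * 100


def job_mix(jobs):
    """Aggregate (nodes, hours) job records into {bin: (job_count, core_hours)}."""
    counts = defaultdict(int)
    core_hours = defaultdict(float)
    for nodes, hours in jobs:
        b = nearest_100(nodes)
        counts[b] += 1
        core_hours[b] += nodes * CORES_PER_NODE * hours
    return {b: (counts[b], core_hours[b]) for b in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical sample records: (nodes, wall-clock hours)
    sample = [(1, 2.0), (32, 1.0), (128, 0.5), (1024, 12.0)]
    for b, (n, ch) in job_mix(sample).items():
        print(f"{b:>5} nodes: {n} jobs, {ch:,.0f} core-hours")
```

Because job counts and core-hours differ by many orders of magnitude across bins, plotting both on a logarithmic axis, as the slide does, keeps the small-job and large-job ends readable together.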

Page 12: Capability versus capacity

[Figure: x-axis "Maximum node count for project's jobs", in powers of two from 1 to 8,192; y-axis 0% to 100%. Series: Projects, Jobs, Core-hours. Chart regions labeled Capability and Capacity.]

David Hart. 2011. Deep and wide metrics for HPC resource capability and project usage. In State of the Practice Reports (SC '11). ACM, New York, NY, USA, Article 1, 7 pages. DOI=10.1145/2063348.2063350
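The chart's x-axis is the maximum node count among each project's jobs, in power-of-two bins. One way such a curve could be produced is sketched below: find each project's largest job, assign the project to the smallest power of two that covers it, and accumulate the shares of projects, jobs, and core-hours. The record layout (project, nodes, hours), the cumulative presentation, and the function names are assumptions for illustration; the cited paper defines the actual metric.

```python
from collections import defaultdict

CORES_PER_NODE = 16  # from the Yellowstone node description (2 x 8-core)


def pow2_bin(n):
    """Smallest power of two that is >= n, for n >= 1."""
    b = 1
    while b < n:
        b *= 2
    return b


def capability_curve(jobs):
    """jobs: iterable of (project, nodes, hours); must be non-empty.

    Returns a list of (bin, cum_projects, cum_jobs, cum_core_hours),
    where the cumulative values are fractions of the totals.
    """
    max_nodes = defaultdict(int)
    n_jobs = defaultdict(int)
    core_hours = defaultdict(float)
    for project, nodes, hours in jobs:
        max_nodes[project] = max(max_nodes[project], nodes)
        n_jobs[project] += 1
        core_hours[project] += nodes * CORES_PER_NODE * hours

    # Per-bin totals: [projects, jobs, core-hours]
    per_bin = defaultdict(lambda: [0, 0, 0.0])
    for project, m in max_nodes.items():
        row = per_bin[pow2_bin(m)]
        row[0] += 1
        row[1] += n_jobs[project]
        row[2] += core_hours[project]

    totals = [len(max_nodes), sum(n_jobs.values()), sum(core_hours.values())]
    curve, running = [], [0, 0, 0.0]
    for b in sorted(per_bin):
        running = [r + x for r, x in zip(running, per_bin[b])]
        curve.append((b,) + tuple(r / t for r, t in zip(running, totals)))
    return curve
```

Note that all of a project's jobs and core-hours are attributed to the bin of its largest job, so a project that occasionally runs at large scale counts toward the capability end even if most of its jobs are small.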

Page 13: Analysis cluster activity

[Figure: monthly activity; x-axis labeled Nov-08 to Sep-09, y-axis 0 to 90,000. Series: Usage, Jobs.]

Page 14: Job wait times

[Figure, panel 1: Avg wait (hrs), 0 to 10, by Job size in nodes (nearest 100), 0 to 4,400.]

[Figure, panel 2: Jobs (logarithmic axis, 1 to 10,000,000) by Wait hours, 0 to 560.]

Page 15: User satisfaction metrics

Job attribute                Yellowstone averages    XSEDE averages
Unweighted
  Run time (hrs)             0.6                     2.0
  Wait time (hrs)            0.6                     4.0
  Response time (hrs)        1.2                     6.8
  Slow down                  22.7                    320.3
Weighted
  Wtd run time (hrs)         6.7                     19.6
  Wtd wait time (hrs)        8.0                     16.8
  Wtd response time (hrs)    14.7                    36.4
  Wtd slow down              3.1                     2.6

Ernemann, C., Hamscher, V., and Yahyapour, R. (2004). Benefits of Global Grid Computing for Job Scheduling. In Proceedings of the Fifth IEEE/ACM International Workshop on Grid Computing (GRID'04). Washington, DC: IEEE Computer Society, pp. 374-379.
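The quantities in the table are related in the usual scheduling sense: response time is wait time plus run time, and slowdown is response time divided by run time, averaged over jobs. A minimal sketch of computing both unweighted and weighted averages follows. Weighting each job by its resource consumption (nodes times run time) is an assumption here, as are the record layout and function names; the cited paper gives the exact definitions.

```python
def satisfaction_metrics(jobs):
    """jobs: non-empty list of (nodes, run_hrs, wait_hrs), with run_hrs > 0.

    Returns (unweighted, weighted) dicts with keys
    'run', 'wait', 'response', and 'slowdown'.
    """
    run = [r for _, r, _ in jobs]
    wait = [w for _, _, w in jobs]
    response = [r + w for _, r, w in jobs]          # response = run + wait
    slowdown = [(r + w) / r for _, r, w in jobs]    # per-job ratio

    def averages(weights):
        total = sum(weights)

        def avg(values):
            return sum(w * v for w, v in zip(weights, values)) / total

        return {
            "run": avg(run),
            "wait": avg(wait),
            "response": avg(response),
            "slowdown": avg(slowdown),
        }

    unweighted = averages([1.0] * len(jobs))
    # ASSUMPTION: weight = resource consumption = nodes x run time
    weighted = averages([n * r for n, r, _ in jobs])
    return unweighted, weighted


if __name__ == "__main__":
    # Hypothetical sample: one tiny job and one large, long job
    u, w = satisfaction_metrics([(1, 1.0, 1.0), (100, 10.0, 5.0)])
    print("unweighted:", u)
    print("weighted:  ", w)
```

Because slowdown is averaged per job, very short jobs with any appreciable wait dominate the unweighted figure, while weighting by consumption shifts the emphasis to large, long-running jobs. That is consistent with the large gap between the unweighted and weighted slowdown rows in the table.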

Page 16: GLADE disk growth

[Figure: time series; x-axis labeled 11/2/08 to 10/25/09, y-axis labeled "PB" with ticks 0.0 to 5,500.0. Series: /glade/p/work, /glade/project, /glade/scratch.]

Page 17: GLADE weekly growth

[Figure: x-axis labeled Nov-08 to Oct-09; y-axis "TB weekly increase/decrease", -50 to 350.]

Page 18: HPSS daily activity

[Figure, panel 1: Terabytes written daily, y-axis 0 to 70000; x-axis labeled 10/14/08 to 10/13/09.]

[Figure, panel 2: Terabytes read daily, y-axis 0 to 40000; x-axis labeled 10/14/08 to 10/13/09.]

Page 19: HPSS weekly growth

[Figure: time series from 04/03/11 to 10/09/13. Series: PB (unique), PB (2-copy).]

Page 20: Multi-resource tracking

[Figure, panel 1: logarithmic axis, 10,000 to 1,000,000,000. Series: HPC Core-hours (cumulative), GLADE GB, HPSS GB.]

[Figure, panel 2: axis 0.0 to 4,000,000.0. Labels: HPC use, disk GB, tape GB.]

Page 21: Questions?

Additional information

• David L. Hart, Pam Gillman, and Erich Thanhardt. 2013. NCAR storage accounting and analysis possibilities. In Proceedings of the Conference on Extreme Science and Engineering Discovery Environment (XSEDE '13). ACM, New York, NY, USA, Article 55, 7 pages. DOI=10.1145/2484762.2484802

• David Hart. 2011. Deep and wide metrics for HPC resource capability and project usage. In State of the Practice Reports (SC '11). ACM, New York, NY, USA, Article 1, 7 pages. DOI=10.1145/2063348.2063350

• David L. Hart. 2011. Measuring TeraGrid: workload characterization for a high-performance computing federation. Int. J. High Perform. Comput. Appl. 25, 4 (November 2011), 451-465. DOI=10.1177/1094342010394382

• David L. Hart. 2012. Longitudinal user and usage patterns in the XSEDE user community. In Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment (XSEDE '12). ACM, New York, NY, USA, Article 53, 7 pages. DOI=10.1145/2335755.2335851