23 May 2001 LSCCW A.Manabe 1 System installation & updates A.Manabe (KEK)

System installation & updates


Page 1: System installation  &   updates

System installation & updates

A.Manabe (KEK)

Page 2: System installation  &   updates

Installation & update

System (SW) installation and update is boring, hard work for me.

Question: How do you install or update the system on a cluster of more than 100 nodes?

Question: Have you ever postponed a system upgrade because the work was too much?

Page 3: System installation  &   updates

Installation & update methods

1. Pre-installed, pre-configured system
  • You can postpone your work, but sooner or later ...

2. Manual installation, one PC at a time
  • Many operators work in parallel with many duplicated installation CDs.
  • It requires many CRTs, days, and money (to hire operators).

3. Network installation
  • With an NFS/FTP server and automated ‘batch’ installation.
  • ‘Server too busy’ when installing to many nodes.
  • A lot of work still remains (utility SW installation...).

Page 4: System installation  &   updates

Installation & update methods

4. Duplicate disk image
  • Attach many disks to one PC and duplicate the installed disk, then distribute the duplicated disks to the nodes.
  • Hardware work is hard (attaching/detaching easy disk units).

5. Diskless PC
  • Use local disks only for swap and the /var directory; other directories come from an NFS server.
  • A powerful server is necessary.
  • A node can do nothing alone (troubleshooting may become difficult).

Page 5: System installation  &   updates

An Idea

Make one installed host, then clone its disk image to the nodes via the network.

  • 100 PC installation in 10 min. (objective value)
  • As little operator intervention as possible.

Page 6: System installation  &   updates

Our planning method (1)

Network Disk Cloning Software
  • dolly+: for cloning the disk image.

Network Booting
  • PXE (Preboot Execution Environment) with an Intel NIC: for starting an installer.

Batch Installer
  • Modified RedHat kickstart: for disk formatting, network setup, and starting the cloning software.
  • It also makes the private (per-node) /etc/fstab, /etc/sysconfig/network..
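The ‘private’ per-node files mentioned above could be produced by a small script called from the installer. The following is a minimal sketch, not the modified kickstart itself: the function name, the node names, and the gateway address are hypothetical, and only the standard NETWORKING, HOSTNAME, and GATEWAY keys of Red Hat's /etc/sysconfig/network are assumed.

```python
def render_sysconfig_network(hostname, gateway):
    """Return the text of a per-node /etc/sysconfig/network file."""
    lines = [
        "NETWORKING=yes",
        f"HOSTNAME={hostname}",
        f"GATEWAY={gateway}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical node names and gateway, for illustration only.
    for node in ["n001", "n002"]:
        print(f"--- {node} ---")
        print(render_sysconfig_network(node, "192.168.0.1"), end="")
```

Only the host name differs between nodes here; a real installer would also have to write the per-node /etc/fstab.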

Page 7: System installation  &   updates

Our method (2)

Remote Power Controller
  • Network-controlled power tap (hardware): for remote system reset (replaces ‘pushing the reset button’ one by one).

Console server
  • Uses the serial console feature of Linux: for checking that everything went well.

Page 8: System installation  &   updates

Dolly+

100 PC installation in 10 min.

  • Software to copy/clone files and/or disk images among many PCs through a network.
  • Runs on Linux as a user program.
  • Free software.
  • Dolly was developed by the CoPs project at ETH (Switzerland).

Page 9: System installation  &   updates

Dolly+

  • Sequential file & block file transfer.
  • RING network connection topology.
  • Pipeline mechanism.
  • Fail recovery mechanism.

Page 10: System installation  &   updates

Config file

Needed only on the server host.
Server = host having the original images or files

    iofiles 3
    /data/image_hda1 > /dev/hda1
    /data/image_hda5 > /dev/hda5
    /dev/hda6 > /dev/hda6
    server dcpcf001
    clients 10
    n001
    n002
    (listing of all nodes)
    endconfig
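For a large cluster the client list is tedious to type by hand, so the config file can be generated. This is a minimal sketch that reproduces only the layout shown on the slide; the function name is hypothetical, and the node names beyond n001 and n002 are an assumed continuation of the naming pattern.

```python
def build_dolly_config(iofiles, server, clients):
    """Build config text in the layout shown on the slide.

    iofiles: list of (source, destination) pairs
    server:  name of the host holding the original images or files
    clients: list of client host names
    """
    lines = [f"iofiles {len(iofiles)}"]
    lines += [f"{src} > {dst}" for src, dst in iofiles]
    lines.append(f"server {server}")
    lines.append(f"clients {len(clients)}")
    lines += clients
    lines.append("endconfig")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    iofiles = [
        ("/data/image_hda1", "/dev/hda1"),
        ("/data/image_hda5", "/dev/hda5"),
        ("/dev/hda6", "/dev/hda6"),
    ]
    # Assumed naming: n001 ... n010 (the slide lists only n001 and n002).
    clients = [f"n{i:03d}" for i in range(1, 11)]
    print(build_dolly_config(iofiles, "dcpcf001", clients), end="")
```

Deriving both counts from the lists keeps the `iofiles` and `clients` numbers in step with the entries that follow them.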

Page 11: System installation  &   updates

Ring Topology

  • Utilizes the maximum performance of full-duplex switch ports.
  • Good for networks built from a complex of switches (because a connection is only needed between adjacent nodes).

[Figure: nodes connected in a ring starting at S. S = Server = host having the original image.]
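The point about adjacent nodes can be made concrete by listing the connections. This is a sketch under assumptions: the function names and host names are hypothetical, and only the data path from the server along the list of clients is modeled (whether the last node connects back to the server is not stated on the slide).

```python
from collections import Counter


def ring_links(server, clients):
    """Return (sender, receiver) pairs along the chain of adjacent nodes."""
    order = [server] + list(clients)
    return list(zip(order, order[1:]))


def links_per_host(links):
    """Count how many connections each host takes part in."""
    counts = Counter()
    for sender, receiver in links:
        counts[sender] += 1
        counts[receiver] += 1
    return counts


if __name__ == "__main__":
    links = ring_links("S", [f"n{i:03d}" for i in range(1, 6)])
    print(links)
    # No host holds more than two connections, however long the chain is.
    print(max(links_per_host(links).values()))
```

Each host receives on one connection and sends on another, which is the traffic pattern a full-duplex switch port handles at full speed in both directions.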

Page 12: System installation  &   updates

Server bottleneck in the one-server, many-clients topology

  • The server is a bottleneck, both in the network and in the server itself.

Broadcast or Multicast
  • UDP
  • It is difficult to make the transfer reliable over multicast.

[Figure: one server S connected to many clients. S = Server = host having the original image.]

Page 13: System installation  &   updates

PIPELINING & multi threading

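[Figure: the file, split into chunks 1 2 3 4 5 6 7 8 9 ... between BOF and EOF, moves from the Server over the network to Node 1, then over the network to Node 2 and on to the next node; different chunks are at different stages at the same time.]

  • File chunk = 4MB
  • 3 threads in parallel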
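The pipelining idea can be imitated in memory with queues. This is a toy sketch, not Dolly+ code: the split of the three threads into receive, write, and send is an assumption (the slide only says three threads run in parallel), queues stand in for network connections, a byte buffer stands in for the local disk, and the chunk size is a few bytes instead of 4MB.

```python
import queue
import threading

CHUNK_SIZE = 4  # bytes in this toy example; the slide uses 4MB chunks


def split_chunks(data, size=CHUNK_SIZE):
    """Cut the image into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Node:
    """One relay node: receiving, storing, and forwarding run as three threads."""

    def __init__(self, name, next_node=None):
        self.name = name
        self.next_node = next_node
        self.inbox = queue.Queue()     # chunks arriving from the previous host
        self._to_disk = queue.Queue()  # chunks waiting to be stored locally
        self._to_next = queue.Queue()  # chunks waiting to be forwarded
        self.stored = bytearray()      # stands in for the local disk
        self._threads = [
            threading.Thread(target=self._receive),
            threading.Thread(target=self._write),
            threading.Thread(target=self._send),
        ]

    def start(self):
        for thread in self._threads:
            thread.start()

    def join(self):
        for thread in self._threads:
            thread.join()

    def _receive(self):
        while True:
            chunk = self.inbox.get()
            self._to_disk.put(chunk)
            self._to_next.put(chunk)
            if chunk is None:  # None marks the end of the file
                break

    def _write(self):
        while True:
            chunk = self._to_disk.get()
            if chunk is None:
                break
            self.stored.extend(chunk)

    def _send(self):
        while True:
            chunk = self._to_next.get()
            if self.next_node is not None:
                self.next_node.inbox.put(chunk)
            if chunk is None:
                break


def clone(data, node_names):
    """Push data through a chain of nodes; return what each node stored."""
    nodes = []
    next_node = None
    for name in reversed(node_names):  # build the chain from the tail
        next_node = Node(name, next_node)
        nodes.append(next_node)
    nodes.reverse()
    for node in nodes:
        node.start()
    for chunk in split_chunks(data):   # the server feeds the first node
        nodes[0].inbox.put(chunk)
    nodes[0].inbox.put(None)
    for node in nodes:
        node.join()
    return {node.name: bytes(node.stored) for node in nodes}
```

Because a node forwards each chunk as soon as it arrives, the later nodes start receiving long before the first node has the whole file.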

Page 14: System installation  &   updates

Performance (measured)

1 server - 1 node (Pentium III 500MHz)
  • IDE disk / 100BaseT network: ~4MB/s
  • SCSI U2W / 100BaseT network: ~9MB/s
  • 4GB image copy -> 17 min. (IDE), 8 min. (SCSI)

1 server - 7 nodes
  • IDE / 100BaseT
  • 4GB image copy -> 17 min. (IDE) (+8 sec.)

+ Time for the booting process.
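The single-node copy times follow directly from the measured transfer speeds. A quick check, assuming 1GB = 1024MB (the function name is only for illustration):

```python
def copy_minutes(image_gb, rate_mb_per_s):
    """Time to copy an image at a constant transfer speed, in minutes."""
    image_mb = image_gb * 1024
    return image_mb / rate_mb_per_s / 60


if __name__ == "__main__":
    print(f"IDE:  {copy_minutes(4, 4):.1f} min")   # 4GB at ~4MB/s
    print(f"SCSI: {copy_minutes(4, 9):.1f} min")   # 4GB at ~9MB/s
```

Rounded to whole minutes this gives 17 min. for IDE and 8 min. for SCSI, in line with the figures above.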

Page 15: System installation  &   updates

Expected performance

1 server - 100 nodes
  • IDE/100: ~19 min. (+2 min. overhead)
  • SCSI/100: ~9 min. (+1 min. overhead)
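One plausible reading of the overhead is pipeline fill time: every extra node in the chain delays the end of the transfer by the time needed to pass one chunk along. This is a simple model assumed here for illustration, not a formula from the slides; it uses the 4MB chunk size and the measured 4MB/s (IDE) and 9MB/s (SCSI) speeds.

```python
def pipeline_overhead_seconds(nodes, chunk_mb, rate_mb_per_s):
    """Extra seconds until the last of `nodes` chained hosts has the file.

    Assumes each additional node adds one chunk transfer time.
    """
    return (nodes - 1) * chunk_mb / rate_mb_per_s


if __name__ == "__main__":
    print(pipeline_overhead_seconds(7, 4, 4))     # 7 nodes, IDE
    print(pipeline_overhead_seconds(100, 4, 4))   # 100 nodes, IDE
    print(pipeline_overhead_seconds(100, 4, 9))   # 100 nodes, SCSI
```

The model gives 6 s for 7 nodes and 99 s and 44 s for 100 nodes, the same order as the measured +8 sec. and the expected +2 min. and +1 min.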

Page 16: System installation  &   updates

[Figure: Time for cloning. Elapsed time (min), 0 to 30, versus number of hosts, 0 to 1000, for a 4GB disk image and an 8GB disk image; 4MB chunk size, 8MB/s transfer speed. Annotations on the plot: +100%, +50%.]

How many minutes to install to 1000 nodes?
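The +100% and +50% annotations are consistent with a simple pipeline model. The model is an assumption made here for illustration, not a formula from the slides: with image size S, chunk size c, per-link transfer speed r, and N hosts in the chain, the last host finishes after S/c + N - 1 chunk times.

```latex
T(N) = \left(\frac{S}{c} + N - 1\right)\frac{c}{r},
\qquad
\frac{T(N) - T(1)}{T(1)} = \frac{(N-1)\,c}{S}

% N = 1000, c = 4\,\mathrm{MB}, r = 8\,\mathrm{MB/s}, 1\,\mathrm{GB} = 1024\,\mathrm{MB}:
S = 4\,\mathrm{GB}:\quad \frac{999 \times 4}{4096} \approx 0.98,
\qquad T(1000) \approx 1012\,\mathrm{s} \approx 17\,\mathrm{min}

S = 8\,\mathrm{GB}:\quad \frac{999 \times 4}{8192} \approx 0.49,
\qquad T(1000) \approx 1524\,\mathrm{s} \approx 25\,\mathrm{min}
```

Under this model the relative overhead depends only on the number of hosts and the number of chunks in the image, so doubling the image size halves it.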

Page 17: System installation  &   updates

Fail recovery mechanism

  • In my experience, ~2% of machines have an initial HW problem.
  • Dolly+ provides an automatic ‘short cut’ mechanism for a node problem.
  • The RING topology makes its implementation easy.

[Figure: ring starting at server S; after a time out, the failed node is bypassed by short cutting.]
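The short cut can be pictured as each sender skipping ahead to the next node that still responds. The sketch below shows only that selection logic, with hypothetical names; in it a plain predicate stands in for the time out, and how Dolly+ actually detects a failure and reconnects is not described on the slide.

```python
def next_alive(ring, index, is_alive):
    """Return the index of the first responsive node after `index`, or None."""
    for candidate in range(index + 1, len(ring)):
        if is_alive(ring[candidate]):
            return candidate
    return None


def effective_chain(ring, is_alive):
    """Return the hosts the data actually passes through, starting at the server."""
    chain = [ring[0]]  # ring[0] is the server, assumed to be up
    index = next_alive(ring, 0, is_alive)
    while index is not None:
        chain.append(ring[index])
        index = next_alive(ring, index, is_alive)
    return chain


if __name__ == "__main__":
    ring = ["S", "n001", "n002", "n003", "n004"]
    failed = {"n002"}  # pretend this node timed out
    print(effective_chain(ring, lambda host: host not in failed))
```

Because every node knows the whole ordered list, a sender needs no help from the server to find its next working neighbour.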

Page 18: System installation  &   updates

Cascade Topology

  • The server bottleneck could be overcome.
  • Weak against a node failure: a failure also spreads in a cascade and is difficult to recover from.

Page 19: System installation  &   updates

  • A beta version will be available from corvus.kek.jp/~manabe/pcf/dolly after this workshop.


Page 21: System installation  &   updates

[Figure: Aggregate transfer speed. Transfer speed (MB/s), axis ticks from 10 to 10000, versus number of hosts, axis ticks from 5 to 1000, for a 4GB disk image and an 8GB disk image; 4MB chunk, 8MB/s transfer speed on each link.]
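Aggregate speed here presumably means the total amount of data delivered to all hosts divided by the elapsed time. The sketch below computes it under an assumed pipeline model (one extra chunk time per additional host); the function names are hypothetical and the numbers are model output, not values read from the plot.

```python
def elapsed_seconds(hosts, image_mb, chunk_mb, rate_mb_per_s):
    """Time until the last of `hosts` chained hosts has the whole image."""
    chunks = image_mb / chunk_mb
    return (chunks + hosts - 1) * chunk_mb / rate_mb_per_s


def aggregate_speed(hosts, image_mb, chunk_mb=4, rate_mb_per_s=8):
    """Total MB delivered to all hosts per second of elapsed time."""
    total_mb = hosts * image_mb
    return total_mb / elapsed_seconds(hosts, image_mb, chunk_mb, rate_mb_per_s)


if __name__ == "__main__":
    for hosts in (5, 10, 50, 100, 500, 1000):
        speed_4gb = aggregate_speed(hosts, 4 * 1024)
        speed_8gb = aggregate_speed(hosts, 8 * 1024)
        print(f"{hosts:5d} hosts: {speed_4gb:7.0f} MB/s (4GB) {speed_8gb:7.0f} MB/s (8GB)")
```

In this model the aggregate speed grows almost linearly with the number of hosts while the chain is short compared with the number of chunks, and the larger image keeps that advantage for longer.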
