17th Sep 2007 Andrey [email protected]
Computing cluster at NCG
– Introduction
– Past upgrades
– Current state of the cluster
– Problems with cluster
– Where to find out information about the cluster
– Conclusion
Introduction
The cluster came into operation at the end of 1999.
– People who set up and tuned the cluster: Jerome Lauret and Andrey Shevel.
– Initially there were 33 machines, each with a 500 MHz CPU and 256 MB of main memory, and around 1 TB of disk space (all disks were connected to one RAID controller).
– The main machine was a Digital Alpha server.
– About 25 users were registered in the first year of operation (2000).
Past upgrades
Over time the disk storage has grown fivefold.
– Computing power has increased at least threefold.
– The Alpha server has been retired; the main computer is now an Intel-based server (ram11).
– All file systems are on separate disk controllers.
– Many other improvements.
All of the above has allowed us to work for many years almost without support. I am proud to report this.
Current state of the cluster
              Nominal   Reality
Machines      34        31
RAID arrays   7         5
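The gap between the nominal and actual figures in the table above can be put as a percentage. The snippet below is only an illustrative calculation, assuming that the "Reality" column counts the units currently in working order; the names `inventory` and `operational_share` are made up for this sketch.

```python
# Nominal and actual counts, copied from the table above.
inventory = {
    "Machines": (34, 31),
    "RAID arrays": (7, 5),
}


def operational_share(nominal: int, actual: int) -> float:
    """Return the percentage of nominal units that are actually working."""
    return 100.0 * actual / nominal


for name, (nominal, actual) in inventory.items():
    share = operational_share(nominal, actual)
    print(f"{name}: {actual} of {nominal} working ({share:.1f}%)")
```

Under that assumption, roughly nine machines in ten and five RAID arrays in seven are available.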
Computing cluster problems
– Liquid leaking from the floor above.
– The batteries in both UPSs have expired.
– The UPS automatic shutdown procedure is out of order.
– No redundancy for the central machine (this machine has been affected by water several times in past years).
– The cluster needs to be checked almost every day (power, water, temperature, etc.).
– No remote access to the consoles of the machines.
– No remote control of electrical power.
– No policies (rules) on how to use the resources of the cluster.
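Part of the daily watching listed above could in principle be scripted. The following is a minimal sketch, not the procedure used on the cluster: it only checks whether each machine accepts a TCP connection, assuming the nodes run an SSH server on port 22. Power, water and temperature would need sensors that are not described here. All function names are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable, List


def tcp_probe(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_hosts(hosts: Iterable[str],
                probe: Callable[[str], bool] = tcp_probe) -> Dict[str, bool]:
    """Probe every host and map its name to True (reachable) or False."""
    return {host: probe(host) for host in hosts}


def unreachable(status: Dict[str, bool]) -> List[str]:
    """Return the hosts that failed the probe, sorted by name."""
    return sorted(host for host, ok in status.items() if not ok)
```

Calling `unreachable(check_hosts(node_names))` with the list of cluster node names would give the machines that need attention. The `probe` argument is kept replaceable so that a different check (for example a ping) can be swapped in.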
Upcoming upgrades
First we needed to move the cluster physically to another place in the same room. - DONE
We need to install all the new machines (9 machines). - in progress
– Prepare an automatic procedure to install the software. - in progress
– Upgrade the version of Scientific Linux (SL) to follow RACF (BNL). - in progress
Where to find information about the cluster
General info about the cluster: http://ram3.chem.sunysb.edu/ramdata/news.shtml
User mailing archive: https://ram3.chem.sunysb.edu/ramdata-news
System mailing archive: https://ram3.chem.sunysb.edu/ramdata-system
The role of the cluster
I think the cluster now plays an even larger role than at the beginning (more people are interested in how to use it).
For those who need a relatively small amount of computing power, the cluster is sufficient. For others who need huge computing power on the largest remote clusters, the local cluster is a good gateway to those remote clusters.
Conclusion
Several steps must be taken to improve the situation:
– Find one or two volunteers who would watch over the cluster.
– Find a funding agency to which a new request for financial support for the cluster upgrade can be submitted.
– Perhaps we need to discuss how to use the cluster as the department computing facility.