Configuration Life-Cycle Management on the TeraGrid Ti Leggett

Configuration Life-Cycle Management on the TeraGrid

Ti Leggett

Challenges of Managing Computational Resources

• Software, hardware, and user needs change rapidly

• Maintaining uniform resources

• Handling one-offs

• Staying current with patches and security updates

• Documenting how and what machines run

Managing Configurations

• Unattended OS deployment
  – Jumpstart, Kickstart, Yast

• Cluster distributions
  – OSCAR, ROCKS

• Configuration management systems
  – Cfengine, LCFG, Bcfg2
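As a rough illustration of what the configuration management category does, the sketch below compares a declared specification with the observed state of a machine and lists the corrections needed. It is not how Cfengine, LCFG, or Bcfg2 are implemented; the `reconcile` function, the package-to-version data model, and the sample values are all hypothetical.

```python
def reconcile(spec, observed):
    """Return the actions needed to bring `observed` in line with `spec`.

    Both arguments map a package name to a version string
    (a hypothetical data model chosen for this sketch).
    """
    actions = []
    for name, version in sorted(spec.items()):
        if name not in observed:
            actions.append(("install", name, version))
        elif observed[name] != version:
            actions.append(("update", name, version))
    # Anything present on the machine but absent from the spec is unmanaged.
    for name in sorted(set(observed) - set(spec)):
        actions.append(("remove", name, observed[name]))
    return actions


# Made-up example data, for illustration only.
spec = {"openssh": "4.2", "ntp": "4.2.0"}
observed = {"openssh": "4.1", "telnet": "0.17"}

for action in reconcile(spec, observed):
    print(action)
```

The point of the sketch is that the specification, not the machine, is the source of truth: the same comparison can be run on any host at any time.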

UC/ANL Cluster Configuration Management

• A microcosm of machine classes

• Cluster goals are to maximize availability, predictability and reliability

• Originally used SystemImager to duplicate similar classes

• Switched to Bcfg2 in early 2005

Cluster Uniformity

• Necessary for the user

• Necessary for the administrator

• UC/ANL has two compute classes and many management classes running two different OS versions
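One way to picture uniformity within a machine class is to group hosts by class and flag any host whose configuration differs from its peers. The sketch below is a minimal illustration under stated assumptions: the `find_outliers` function, the host and class names, and the idea of a per-host configuration fingerprint are all hypothetical, and the most common fingerprint in a class is treated as the reference, whereas a real system would compare against the specification.

```python
from collections import Counter, defaultdict


def find_outliers(machines):
    """Group hosts by class and report hosts that deviate from their peers.

    `machines` is an iterable of (hostname, class_name, fingerprint) tuples,
    where the fingerprint stands in for a hash of the host's configuration.
    """
    by_class = defaultdict(list)
    for host, cls, fingerprint in machines:
        by_class[cls].append((host, fingerprint))

    outliers = {}
    for cls, members in by_class.items():
        # Assumption: the most common fingerprint in a class is the reference.
        reference, _ = Counter(fp for _, fp in members).most_common(1)[0]
        deviating = sorted(host for host, fp in members if fp != reference)
        if deviating:
            outliers[cls] = deviating
    return outliers


# Made-up inventory, for illustration only.
machines = [
    ("c001", "compute-a", "aa"),
    ("c002", "compute-a", "aa"),
    ("c003", "compute-a", "ab"),
    ("m001", "mgmt", "zz"),
]
print(find_outliers(machines))
```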

Security

• Performing security patches

• Auditing cluster status

• Updating machines after extended downtime or maintenance

• Aiding intrusion detection
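The auditing and post-downtime points can be sketched as a check of each host's installed versions against a required minimum. This is an illustration only: the `audit` and `parse_version` helpers, the host names, and the version numbers are hypothetical, and versions are assumed to be simple dotted integers.

```python
def parse_version(text):
    """Turn a dotted version such as '2.6.12' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def audit(required, inventory):
    """Return {host: [packages installed below the required version]}.

    `required` maps package name to minimum version; `inventory` maps
    host name to that host's {package: installed version} mapping.
    """
    report = {}
    for host, packages in sorted(inventory.items()):
        stale = sorted(
            name
            for name, minimum in required.items()
            if name in packages
            and parse_version(packages[name]) < parse_version(minimum)
        )
        if stale:
            report[host] = stale
    return report


# Made-up data: c002 stands for a node that missed updates while it was down.
required = {"openssl": "0.9.8", "kernel": "2.6.12"}
inventory = {
    "c001": {"openssl": "0.9.8", "kernel": "2.6.12"},
    "c002": {"openssl": "0.9.7", "kernel": "2.6.9"},
}
print(audit(required, inventory))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "2.6.9" would sort after "2.6.12".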

Reusability

• Machine failures
  – Disk failures
  – Non-disk failures

• Machine replication

• New machines

Specification as Documentation

• Dealing with administrator absences

• Using version control

• Teaching new administrators

• Dealing with already running and working machines
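When the specification lives in version control, the difference between two revisions documents what changed on the machines. The sketch below uses Python's standard `difflib` module to show the idea; the `spec_changes` helper, the revision labels, and the one-package-per-line specification format are hypothetical.

```python
import difflib


def spec_changes(old_text, new_text, old_label="spec@r1", new_label="spec@r2"):
    """Return a unified diff between two revisions of a specification."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Made-up specification revisions, one "package version" entry per line.
old_spec = "openssh 4.1\nntp 4.2.0\n"
new_spec = "openssh 4.2\nntp 4.2.0\n"

for line in spec_changes(old_spec, new_spec):
    print(line)
```

A new administrator can read such a diff alongside the commit message to see both what changed and why.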

Future Work

• Reduce dependency on tape backups

• Integrate with tools such as Nagios, Nessus, and iptables

• Integrate with LDAP

Questions?