Grid Computing at The Hartford Condor Week 2008 Robert Nordlund robert.nordlund@hartfordlife.com


Grid Computing at The Hartford

Condor Week 2008

Robert Nordlund
robert.nordlund@hartfordlife.com

About The Hartford…

• Headquartered in Hartford, CT
• Founded in 1810
• Fortune 100
• 31,000 Employees Worldwide
• $26.5 Billion Revenues
• $2.9 Billion Core Earnings
• $377.6 Billion Assets Under Management

The Hartford’s Businesses

• Property & Casualty
  • Auto, home, marine, workers compensation, etc.
• Retail Investment Products
  • Variable and fixed annuities, mutual funds, 529 college savings plans
• Retirement Plans
  • 401(k), 403(b), 457
• Institutional Financial Solutions
• Individual Life Insurance
• Group Benefits
• International

A Brief History (2003)…

• Exponential growth in risk modeling activity exceeded our existing computing capabilities.

• Grid technology was identified as a possible solution.

• Condor was selected over commercial solutions:
  • Mature
  • Windows Support
  • Simple, Scalable, and Flexible
  • Active Community
  • Free

Our Grid Environment…

• In Production Since 2004
• Two Pools (Production, Test)
• Dedicated and Non-dedicated Execute Nodes
  • ~1000 Two-socket, multi-core x86 servers
  • ~1000 desktops, notebooks
• Linux Central Managers
• Linux and Windows Job Schedulers
• Windows Execute Nodes
• Web-based Administration and User Console

Our Workload…

• Hedging
• Risk Management
• Portfolio Pricing
• Product Development
• Off-the-shelf Software
• In-house Software
• Embarrassingly Parallel
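To illustrate what "embarrassingly parallel" means for this kind of workload, here is a minimal sketch, not taken from the talk, that splits a batch of independent scenarios into chunks, processes each chunk on its own, and combines the partial results. The names `split_scenarios`, `run_chunk`, and `combine` are hypothetical, and the per-scenario calculation is a placeholder for a real model run.

```python
from typing import List, Tuple


def split_scenarios(n_scenarios: int, n_jobs: int) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) scenario ranges, at most one per job."""
    if n_scenarios < 0 or n_jobs <= 0:
        raise ValueError("need n_scenarios >= 0 and n_jobs > 0")
    base, extra = divmod(n_scenarios, n_jobs)
    ranges = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs take one additional scenario each.
        size = base + (1 if job < extra else 0)
        if size == 0:
            break  # more jobs than scenarios: skip the empty chunks
        ranges.append((start, start + size))
        start += size
    return ranges


def run_chunk(start: int, stop: int) -> int:
    """Placeholder for one grid job; a real job would run the risk model."""
    return sum(i * i for i in range(start, stop))


def combine(partials: List[int]) -> int:
    """Merge per-job results; the jobs never need to talk to each other."""
    return sum(partials)


if __name__ == "__main__":
    chunks = split_scenarios(10, 3)
    print(chunks)
    print(combine([run_chunk(a, b) for a, b in chunks]))
```

Because no chunk depends on another, each one could be submitted as a separate grid job and run on whichever execute node is free.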


Typical Utilization


Technical Challenges

• Scaling – Rapid expansion of grid computing puts tremendous strain on operations (power, cooling, networking, floor space, etc.).

• DR/BCP – A “cold spare” is not an option when the system is over 1000 servers.

• Testing – An isolated, equivalent test environment is not an option (see above). Predictive modeling is necessary to simulate the environment at scale.

• Storage – Traditional storage options are limited in both capacity and throughput.

• Application Development – Developers need to be educated on writing “grid-friendly”, high-performance applications.


Non-Technical Challenges

• Policies – Effective and fair resource management policies need to be developed in cooperation with the users. Transparency is key in maintaining good relationships between user groups and between the users and IT.

• Expectation Management – Users need to know what to expect in a shared grid environment.
  • Variable Capacity
  • Allocations vs. Named Servers

• Procurement – Vendors and internal purchasing departments aren’t typically accustomed to ordering hundreds of servers at a time.

• Finance – Traditional charge-back mechanisms ($/Server) don’t translate well to a grid environment.
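One alternative to $/Server, sketched here purely as an illustration and not as The Hartford's actual method, is to split the cost of the grid across user groups in proportion to the core-hours each consumed. The function name `charge_back`, the group names, and the figures are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def charge_back(usage: Iterable[Tuple[str, float]],
                total_cost: float) -> Dict[str, float]:
    """Split total_cost across groups in proportion to core-hours used."""
    hours: Dict[str, float] = defaultdict(float)
    for group, core_hours in usage:
        if core_hours < 0:
            raise ValueError("core-hours cannot be negative")
        hours[group] += core_hours
    total = sum(hours.values())
    if total == 0:
        # Nothing ran in this period, so nobody is charged.
        return {group: 0.0 for group in hours}
    return {group: total_cost * h / total for group, h in hours.items()}


if __name__ == "__main__":
    records = [("hedging", 300.0), ("pricing", 100.0), ("hedging", 100.0)]
    print(charge_back(records, 1000.0))
```

A usage-based split like this follows what each group actually ran rather than which servers it nominally owns, which suits a pool whose capacity varies.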


Growth Opportunities

• Non-HTC (High Throughput Computing) Workloads – Use grid resources to dynamically provision capacity for web services or other transactional business applications.

• Virtualization – Leverage grid resource management capabilities to orchestrate virtualized resources.

• More Scavenging – Continue to exploit underutilized resources throughout the enterprise to increase compute capacity.

• Incorporate external resources, e.g. cloud computing, utility computing, etc., to handle planned/unplanned peaks.


What’s new with Condor…

• De-coupled Job Submission
  • Users submit jobs to database
  • Middleware feeds jobs to schedulers
• Dynamic Preemption Policies
  • Need to prevent long running jobs from being preempted
  • Jobs should update class ads to indicate progress
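The de-coupled submission pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the middleware described in the talk: the table layout, the function names `open_queue`, `submit`, and `feed`, and the scheduler names are hypothetical, and an in-memory SQLite database stands in for the real job database. The step where a real feeder would hand the job to a Condor scheduler is deliberately left out.

```python
import sqlite3
from typing import Dict, List, Tuple


def open_queue() -> sqlite3.Connection:
    """Create an in-memory job database (a stand-in for the real one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " owner TEXT NOT NULL,"
        " command TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'queued',"
        " scheduler TEXT)"
    )
    return conn


def submit(conn: sqlite3.Connection, owner: str, command: str) -> int:
    """User side: record the job in the database and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (owner, command) VALUES (?, ?)", (owner, command)
    )
    conn.commit()
    return cur.lastrowid


def feed(conn: sqlite3.Connection,
         free_slots: Dict[str, int]) -> List[Tuple[int, str]]:
    """Middleware side: assign queued jobs, oldest first, to the scheduler
    with the most free slots, until no scheduler has room left."""
    slots = dict(free_slots)  # work on a copy; leave the caller's dict alone
    assigned = []
    rows = conn.execute(
        "SELECT id FROM jobs WHERE state = 'queued' ORDER BY id"
    ).fetchall()
    for (job_id,) in rows:
        name = max(slots, key=slots.get, default=None)
        if name is None or slots[name] <= 0:
            break
        conn.execute(
            "UPDATE jobs SET state = 'dispatched', scheduler = ? WHERE id = ?",
            (name, job_id),
        )
        slots[name] -= 1
        assigned.append((job_id, name))
    conn.commit()
    return assigned


if __name__ == "__main__":
    db = open_queue()
    for cmd in ("price.exe a", "price.exe b", "price.exe c"):
        submit(db, "alice", cmd)
    print(feed(db, {"sched-a": 1, "sched-b": 1}))
```

Keeping the queue in a database means users never talk to a scheduler directly, so the middleware can throttle, reorder, or reroute jobs without changing how jobs are submitted.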


What’s new with our infrastructure…

• Multiple Data Centers
  • One or two pools?
  • If two pools, how do we optimize utilization?
  • Clustered accountant?
• More cores per socket
• Increased server counts

Conclusion

• Grid has been a transformational technology, giving users access to capabilities they wouldn’t have envisioned and now can’t live without.

• Grid computing is an integral part of our business and gives the company a stable, scalable platform to model uncertainty.

• Condor has proven to be an invaluable asset and has time and again handled whatever challenge we’ve thrown at it.

• Grid isn’t dead – it’s just middle-aged.
