Risk and Security Management for Distributed Supercomputing with Grids
Urpo Kaila <[email protected]> Funet CERT & CSC
2006-09-22 19th TF-CSIRT Meeting, Espoo, Finland




Agenda

• Grids and supercomputing
  • Some definitions
  • How do they work?
  • Examples of Grids
• Grids and Security
  • Risk management and Security domains
  • Creating baselines for Security
• Case M-grid revisited
  • Organisation and setup
  • Security Working Group
  • Risk analysis, Security Policy & Acceptable Use Policy
  • User Security Guide, Administrator Security Guide
• Grid Security and CSIRTs
  • Making Grid Security compatible
  • Incident handling


Some definitions

Supercomputers: the most efficient systems world-wide at a given time for massively parallel processing of advanced research tasks

Distributed computing: several inter-connected computers share the computing tasks assigned to the system [IEEE]

Cluster: similar efficient computers coupled closely together

Grid computing: affordable high-performance distributed computing with interconnected clusters

Moore’s law as seen on the Top500 list

Pentium 4 = ~ 2-4 GFlops


What is the Grid?

Grid according to Ian Foster (2002) in "What is the Grid? A Three Point Checklist":
• Computing resources are not administered centrally.
• Open standards are used.
• Non-trivial quality of service is achieved.

Different types of grids
• Info-grid - WWW
• Data-grid - Databases
• Compu-grid - Computing

Evolved from computational needs of "big science"

Grid must have:
• Virtual organisations
• Middleware
• Truly distributed


How do they work?

$ grid-proxy-init

Your identity: /O=Grid/O=NorduGrid/OU=csc.fi/CN=Urpo Kaila

Enter GRID pass phrase for this identity:

$ ngsub -d 1 -f mygridjob.xrsl
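The slide does not show the contents of mygridjob.xrsl. As a rough illustration only, the sketch below renders a small job description in xRSL-style syntax, a conjunction of (attribute="value") pairs. The helper render_xrsl and the job contents are hypothetical, and the attribute names (executable, arguments, stdout, jobName) are assumed from the NorduGrid xRSL job description language rather than taken from this talk.

```python
# Minimal sketch: build an xRSL-style job description as text.
# The job below is a made-up example, not the author's actual job file.

def render_xrsl(attributes):
    """Render a dict of attribute -> value(s) as one xRSL conjunction."""
    parts = []
    for name, value in attributes.items():
        # A list or tuple becomes several space-separated quoted strings.
        values = value if isinstance(value, (list, tuple)) else [value]
        quoted = " ".join('"{}"'.format(v) for v in values)
        parts.append("({}={})".format(name, quoted))
    return "&" + "".join(parts)


job = {
    "executable": "/bin/echo",       # hypothetical program to run
    "arguments": ["hello", "grid"],  # hypothetical arguments
    "stdout": "out.txt",
    "jobName": "mygridjob",
}

xrsl_text = render_xrsl(job)
print(xrsl_text)
```

Saved to a file, text of this shape is what a submission command such as the ngsub call above would be given with -f.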


The Role of Grid Middleware

NorduGrid ARC Tutorial / Arto Teräs and Juha Lento 2005-09-20


Examples of Grids and Grid resources

• TeraGrid - Open scientific discovery infrastructure financed by the US National Science Foundation
• DEISA - Distributed European Infrastructure for Supercomputing Applications
• EGEE - The Enabling Grids for E-sciencE
• LHCG - Large Hadron Collider Grid (CERN)
• e-IRG - The e-Infrastructure Reflection Group
• NorduGrid - a Grid Research and Development collaboration
• The Globus Alliance - an international collaboration that conducts research and development to create fundamental Grid technologies


Grids and Security


Threats

What excites hackers? (A. Cormack, 2002)

• High profile targets – to enhance their reputation
• Powerful CPU – for password cracking etc.
• Large disk – to distribute illegal material
• High bandwidth – for denial of service attacks

WARNING! When working on the Grid, you must accept that some information on your jobs and on your Grid identity is made public. This includes your name, your affiliation, IP address of your client computer, job names and duration, used runtime environment names and other less sensitive information (see the Grid monitor for example). (Nordugrid)


Security matrix

Rows: Security Management, Technical Security. Columns: Proactive Security, Reactive Security.

• Security Management: security policies and guides; training and awareness building; incident handling
• Technical Security: patching vulnerabilities; IPS; forensics; firewalls; cryptography


Risk and (proactive) security

Risk management (à la Wikipedia)

1.1 Establish the context
1.2 Identification
1.3 Assessment
1.4 Potential Risk Treatments
  1.4.1 Risk avoidance
  1.4.2 Risk reduction
  1.4.3 Risk retention
  1.4.4 Risk transfer
1.5 Create the plan
1.6 Implementation
1.7 Review of the plan

Security Domains [à la (ISC)2 CISSP CBK]

1. Access Control
2. Application Security
3. Business Continuity and Disaster Recovery Planning
4. Cryptography
5. Information Security and Risk Management
6. Legal, Regulations, Compliance and Investigations
7. Operations Security
8. Physical (Environmental) Security
9. Security Architecture and Design
10. Telecommunications and Network Security

Residual risks
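To make the four risk treatments and the idea of residual risk concrete, here is a minimal sketch. The 1 to 5 scales, the reduction factor and the example entries are all assumptions made for illustration; the scoring of retained and transferred risks as unchanged is a deliberate simplification, not something the slide states.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    AVOID = "risk avoidance"
    REDUCE = "risk reduction"
    RETAIN = "risk retention"
    TRANSFER = "risk transfer"


@dataclass
class Risk:
    name: str
    likelihood: int          # assumed scale: 1 (rare) to 5 (frequent)
    impact: int              # assumed scale: 1 (minor) to 5 (severe)
    treatment: Treatment
    reduction: float = 0.0   # assumed fraction of the score removed by controls


def residual_score(risk):
    """Score that remains after the chosen treatment (simplified model)."""
    score = risk.likelihood * risk.impact
    if risk.treatment is Treatment.AVOID:
        return 0.0  # the activity is dropped, so the risk disappears
    if risk.treatment is Treatment.REDUCE:
        return score * (1.0 - risk.reduction)
    # Retained or transferred: the event can still happen; in this sketch
    # only who bears the cost differs, so the score is left unchanged.
    return float(score)


# Hypothetical register entries, for illustration only.
register = [
    Risk("Weak user passwords", 3, 4, Treatment.REDUCE, reduction=0.5),
    Risk("Unneeded public service", 2, 3, Treatment.AVOID),
    Risk("Short network outage", 2, 1, Treatment.RETAIN),
]

residuals = {r.name: residual_score(r) for r in register}
print(residuals)
```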


What has already been done (examples)

Joint Security Policy Group LCG/EGEE:
• The LCG Security and Availability Policy
• The Grid Acceptable Usage Policy
• The Virtual Organisation Security Policy

PlanetLab
• Acceptable Use Policy (AUP)

E-Infrastructure Reflection Group (e-IRG)
• Authentication and authorisation policies
• Usage policies
• Etc.


Case M-grid revisited


M-Grid - Material Sciences National Grid Infrastructure in Finland

• Joint project between CSC, seven universities and the Helsinki Institute of Physics (HIP)
• Connected to the Nordic NorduGrid network, but access is currently limited to M-grid partners and CSC customers
• The systems are particularly suitable for high-throughput running of sequential and easily parallelised programs
• The theoretical computing capacity of the system is approximately 2.5 Tflops.
• M-grid is based on HP ProLiant DL145, DL385 and DL585 servers equipped with 64-bit AMD Opteron processors (642 altogether)
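As a rough sanity check on these figures, the sketch below divides the stated theoretical capacity by the stated processor count. It assumes each of the 642 processors contributes equally and does not account for how many cores a processor has, which the slide does not say.

```python
# Figures taken from the slide; the equal-share assumption is ours.
THEORETICAL_PEAK_FLOPS = 2.5e12   # approximately 2.5 Tflops
PROCESSOR_COUNT = 642

per_processor_gflops = THEORETICAL_PEAK_FLOPS / PROCESSOR_COUNT / 1e9
print("about %.1f GFlops per processor" % per_processor_gflops)
```

The result, about 3.9 GFlops per processor, is in the same range as the "Pentium 4 = ~ 2-4 GFlops" figure quoted on the definitions slide.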


The M-Grid Security Working Group

• Organisation
  • Started January 2006, meetings once a month, except in summertime
  • Members: CSC staff, visiting experts and M-Grid administrators:
    • Juha Jäykkä (UTU)
    • Michael Gindonis, Kalle Happonen (HIP)
    • Ivan Degtyarenko (HUT)
    • Vera Hansper (JYU)
  • Reports to the M-Grid Administrators meeting
  • Collaborating with the HIP Wiki
• Task
  • Risk analysis
  • To create a set of security policies and guidelines
  • Technical planning, implementation and supervision
  • Incident handling


The M-Grid Risk analysis 2006

[Figure: risk matrix with axes Likelihood and Impact; labels in the picture: Low, Problematic, Medium, High, Disaster, "Mitigate", Residual. Picture by Vera Hansper]

Threat sources: Internal - Intentional; Internal - Accidental; External - Intentional; External - Accidental

Over 50 threats identified and analysed!

Risk = likelihood x impact
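The formula Risk = likelihood x impact and the four threat sources can be sketched as a small ranking exercise. The 1 to 3 scales, the band thresholds and the three example threats below are assumptions for illustration; they are not taken from the M-Grid analysis itself.

```python
from collections import namedtuple

Threat = namedtuple("Threat", ["name", "source", "likelihood", "impact"])

# The four threat sources named on the slide.
SOURCES = (
    "Internal - Intentional",
    "Internal - Accidental",
    "External - Intentional",
    "External - Accidental",
)


def risk_score(threat):
    """Risk = likelihood x impact, both on an assumed 1..3 scale."""
    return threat.likelihood * threat.impact


def risk_band(score):
    """Map a score to a band; the thresholds are illustrative assumptions."""
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"


# Hypothetical threats, not entries from the real analysis.
threats = [
    Threat("Power failure at one site", "External - Accidental", 1, 2),
    Threat("Stolen proxy certificate", "External - Intentional", 2, 3),
    Threat("Accidental deletion of user data", "Internal - Accidental", 2, 2),
]
assert all(t.source in SOURCES for t in threats)

# Rank the threats so that the highest risks are handled first.
ranked = sorted(threats, key=risk_score, reverse=True)
report = [(t.name, risk_score(t), risk_band(risk_score(t))) for t in ranked]
for row in report:
    print(row)
```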


M-grid Security Policy (Reviewed)

1. Introduction (scope, objectives)
2. Participants, roles and responsibilities
3. Physical security
4. User accounts and access control
   • Local accounts
   • Grid accounts
   • Virtual Organization management
   • Certificate Authorities
5. Network security
   • Network access and services
   • Additional services
   • Firewalls
4. Network security (contd.)
   • Firewalls
5. Operational security
   • Patches
   • Monitoring
6. Confidentiality and privacy
   • Grid users
   • Local users and administrators
7. Incident response
8. Compliance
   • Exceptions
9. Approval and review
10. Comments


M-grid Security Policy (examples)

• Accounts must be protected by a good password or other method providing equivalent security
• Sites are allowed to create time-limited accounts for persons working in documented collaboration projects outside the site's organization
• Sites may offer additional services which are open to a large user base, but these must be approved by the M-grid administration
• Sites must not offer any additional services running on the administration server without approval of the M-grid administration.



M-grid Acceptable Use Policy

• Short, intended for the user; the security policy is to be read when needed
• Examples of content:
  • "By using the M-grid resources you automatically agree to comply with this Acceptable Use Policy"
  • You must act in a responsible manner and must not cause harm to other users, to M-grid or to other systems.
  • You may not use M-grid for illegal activities.
  • The M-grid services and systems are intended for professional, academic research or education.
  • Your account is personal and may not be shared with other people


Security Guides

• M-grid User Security Guide
  • A short technical howto
  • Example:
    • Your proxy certificate is … not protected by a password therefore it should not be valid for longer than necessary as proxy certificates can be easily renewed
• M-grid Administrator Security Guide
  • A longer howto
  • Under construction
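The User Security Guide's advice on proxy certificates, that they should not be valid for longer than necessary, can be pictured as a simple lifetime check. This is a minimal sketch: the 12 hour limit, the function names and the example timestamps are assumptions, and the guide excerpt itself names no specific limit.

```python
from datetime import datetime, timedelta

# Assumed site limit; the guide only says "not longer than necessary".
MAX_PROXY_LIFETIME = timedelta(hours=12)


def lifetime_is_acceptable(not_before, not_after, limit=MAX_PROXY_LIFETIME):
    """True if the total validity period is positive and within the limit."""
    lifetime = not_after - not_before
    return timedelta(0) < lifetime <= limit


def time_left(not_after, now):
    """Remaining validity at time `now`, never negative."""
    return max(not_after - now, timedelta(0))


# Hypothetical validity periods for two proxies issued at the same time.
issued = datetime(2006, 9, 22, 9, 0)
short_proxy_ok = lifetime_is_acceptable(issued, issued + timedelta(hours=8))
week_proxy_ok = lifetime_is_acceptable(issued, issued + timedelta(days=7))
remaining = time_left(issued + timedelta(hours=8), datetime(2006, 9, 22, 15, 30))

print(short_proxy_ok, week_proxy_ok, remaining)
```

Because an unprotected proxy is only as safe as the file it lives in, a short lifetime bounds how long a copied proxy stays useful to someone else.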


Examples of Technical security tasks

Implemented and on the wish list

• Firewall-rpm
• Log management and monitoring
• Integrity check
• Package signing
• Availability monitoring
• Automatic alerting
• Backup of frontend
• ssh key management
• Security audits
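Of the tasks listed, the integrity check is the easiest to illustrate. The sketch below records SHA-256 digests of a set of files and later reports which ones have changed or disappeared. It shows the general technique only; the slides do not say which tool M-grid uses, and the file name and contents here are made up.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """SHA-256 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record the current digest of every monitored file."""
    return {path: file_digest(path) for path in paths}


def changed_files(baseline):
    """Return monitored files that are missing or differ from the baseline."""
    changed = []
    for path, expected in sorted(baseline.items()):
        if not os.path.exists(path) or file_digest(path) != expected:
            changed.append(path)
    return changed


# Demonstration on a throwaway file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "sshd_config")
    with open(sample, "w") as handle:
        handle.write("PermitRootLogin no\n")
    baseline = build_baseline([sample])
    unchanged_report = changed_files(baseline)

    with open(sample, "a") as handle:
        handle.write("PermitRootLogin yes\n")
    tampered_report = changed_files(baseline)

print(unchanged_report, [os.path.basename(p) for p in tampered_report])
```

In practice the baseline would need to be stored somewhere an intruder on the monitored host cannot rewrite it, otherwise the check proves little.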


Grid Security and CSIRTs


Making Grid Security compatible

• Grids tend to interconnect – we need compatible security
• Complex new technologies and "fuzzy" virtual organisations in "our hosts and networks"
• International cooperation needed
  • Technical level – Management level
  • Reactive security – Proactive security
• The risks haven't materialized yet


Grid Incident handling

• Existing CSIRTs should be used as professional incident handling hubs
• Constant and proactive knowledge transfer needed between Grid administration, CSIRTs and site administrators
• The M-Grid Security Policy already contains a paragraph:
  • The administrator, in consultation with CSC, should also inform Funet CERT ([email protected], tel. +358-9-4572038) if the incident affects other M-grid sites


Finally - Finnish security terminology :)

Information – Tieto
Security – turvallisuus
Incident – poikkeama
Many incidents – poikkeamia
The interrogative form – ~ko
Also – ~kin
Have there been – oliko

Have there also been any security incidents?
Oliko tietoturvapoikkeamiakinko?