
Experiences with NMI at Michigan



Shawn McKee
October 1, 2004

NMI/SURA Testbed Workshop


Outline

A little history: NMI at Michigan
About our environment and motivations
Comments on some middleware components
Issues for Middleware at Michigan
Outlook and Summary


History: Michigan as an “Honorary” SURA Member!

Michigan proposed to join the NMI/SURA testbed as soon as we heard about the opportunity

Michigan has a long history of work in this area: LDAP, NSFNet, AFS/IFS, KX509, CoSign, CHEF/OGCE/Sakai, NFS V4, …

We were beginning to start up a campus-wide initiative called “MGRID” (Michigan Grid Research and Infrastructure Development)… NMI fit perfectly into our plans and interests

We were accepted into the testbed as its northern-most member…


Campus Research and Grid Motivation

Michigan is a major research institution with a large, varied mix of researchers.

Many of our departments make extensive use of computing/storage/network resources and always require more, for the same (or less) cost…

Many of our researchers are part of larger national or international collaborations.

Grid computing and NMI middleware help us to optimize our existing resources and plug us in to developing national and international efforts.

This is likely the case for most Universities around the country…


Research Funding at Michigan…


Some More Context

Michigan, through our MGRID initiative, has been adapting and adopting Middleware to enable our distributed resources

NMI has been a key component of our work
Portals seem to be the key to enabling transparent access to various resources
We are building out for our future needs: tools like KX.509 and our XML Grid Accounting are being augmented with additional components like Walden and new applications like NTAP…


MGRID – www.mgrid.umich.edu

A center to develop, deploy, and sustain an institutional grid at Michigan

Many groups across the University participate in compute/data/network-intensive research grants – increasingly Grid is the solution
• ATLAS, NPACI, NEESGrid, Visible Human, NFSv4, NMI

MGRID allows work on common infrastructure instead of custom solutions

Middleware, like NMI, makes it possible


NMI Components

The NMI package consists of many components
Michigan used many of the components in our work on MGRID and with various application domains

KX.509 was central to much of our work bridging our Kerberos users to X509 (PKI) space

Grid components (Globus, Condor, etc.) were the primary means to make resources accessible

Many NMI components were included in VDT, Grid3 and NPACI Rocks distributions


One Application Domain Perspective

I would also like to comment a bit on my biased application perspective

As a high-energy physicist I need to worry about accessing and processing LOTS of data, globally

In less than 3 years the LHC will begin to run and our ATLAS experiment will need to make ~10 Petabytes/year of data available to ~2000 physicists worldwide.

To handle this we need all the resources we can get…middleware is the basis for making these resources accessible and usable.


ATLAS (www.usatlas.bnl.gov)

A Toroidal LHC Apparatus

Collaboration
• 150 institutes
• 1850 physicists

Detector
• Inner tracker
• Calorimeter
• Magnet
• Muon

United States ATLAS
• 29 universities, 3 national labs
• 20% of ATLAS


Data Grids for High Energy Physics

[Diagram: the tiered LHC computing model. The Online System and the Offline Farm / CERN Computer Center form Tier 0+1 (~25 TIPS); Tier 1 centers (BNL Center, France, Italy, UK) and Tier 2 centers sit below it, with institutes (~0.25 TIPS, Tier 3) and workstations (Tier 4) at the edge. Link rates range from ~PByte/sec out of the detector and ~100-400 MBytes/sec into the offline farm, through ~10-40 Gbits/sec and ~10 Gbps between tiers, down to 100-10000 Mbits/sec toward institutes and workstations. CERN/Outside Resource Ratio ~1:4; Tier0 : (Tier1) : (Tier2) ~ 1:2:2. Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels, served by physics data caches.]


The Problem

Abort, retry, fail?


The Solution


Building Upon NMI

Middleware is the glue that gives applications easy access to resources, data and instruments

Portals organize the middleware while hiding complexity


Grid Portal Work

[Architecture diagram: a browser on the user workstation (kinit, kx509, libpkcs11) connects over SSL (client certificate required) to the MGRID Portal, where Apache (mod_ssl, mod_kx509, mod_kct, mod_jk) fronts Tomcat running CHEF; Kerberos V5 services (KDC, KCA, KCT) supply the credentials; the portal uses GSI to reach the GateKeeper / Resource Manager service and Grid Services, and consults Walden LDAP via SASL for authorization.]

We would like to propose these for NMI R5+!


MGRID Accounting

Step 1: Grid scheduling software (e.g. PBSPro, Condor) generates usage log files in various formats

Step 2: MGRID Accounting translates usage log files into common XML format (http://www.psc.edu/~lfm/Grid/UR-WG/)

Step 3: MGRID Accounting ingests data into a MySQL database for report generation and review
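A minimal sketch of steps 2 and 3, assuming a hypothetical record layout, XML tags, and table name (the actual UR-WG schema and MGRID database differ in detail):

```python
# Minimal sketch of the MGRID Accounting flow (steps 2 and 3).
# The field names, XML tags, and table layout are illustrative only,
# not the actual UR-WG schema or the MGRID database definitions.
import xml.etree.ElementTree as ET

def usage_to_xml(rec):
    """Translate one scheduler usage record (already parsed from a
    PBSPro/Condor log) into a common XML usage record."""
    ur = ET.Element("UsageRecord")
    for tag in ("user", "project", "host", "wall_seconds", "cpu_seconds"):
        ET.SubElement(ur, tag).text = str(rec[tag])
    return ET.tostring(ur, encoding="unicode")

def xml_to_insert(xml_text, table="usage_records"):
    """Compose the SQL that would ingest one XML record into MySQL.
    A real importer would use a MySQL client library and bind parameters."""
    root = ET.fromstring(xml_text)
    cols = ", ".join(child.tag for child in root)
    vals = ", ".join("'%s'" % child.text for child in root)
    return "INSERT INTO %s (%s) VALUES (%s);" % (table, cols, vals)

record = {"user": "smckee", "project": "atlas", "host": "linat01",
          "wall_seconds": 3600, "cpu_seconds": 3400}
xml_text = usage_to_xml(record)
print(xml_text)
print(xml_to_insert(xml_text))
```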


Accounting Example on MGRID

Through our portal we can easily select and display accounting information for MGRID resources


MGRID Walden Authorization

Fine-grained authorization module based on the XACML standard (an XACML-based policy engine)

Cluster owners have complete administrative control over who uses their resources

Policy files define rules based on group membership, time of day, resource load, etc.

Local account management is unnecessary
Group membership can be assigned from one or several secure LDAP servers
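A toy illustration of the kind of rule described above: authorization driven by group membership (as supplied by LDAP) plus conditions such as time of day. The groups, attributes, and rule structure are invented for this sketch; Walden itself evaluates XACML policies.

```python
# Toy illustration of fine-grained, attribute-based authorization of the
# kind Walden performs. The groups, resources, and rule structure are
# invented for this sketch; Walden evaluates XACML policies fed by group
# membership from secure LDAP servers.
from datetime import datetime

def lookup_groups(user):
    # Stand-in for a secure LDAP group-membership lookup.
    directory = {"smckee": {"mgrid-users", "atlas"}}
    return directory.get(user, set())

def authorize(user, resource, now=None):
    now = now or datetime.now()
    groups = lookup_groups(user)
    rules = [
        # (resource, required group, allowed hours of day)
        ("atlas-cluster", "atlas", range(0, 24)),
        ("teaching-cluster", "mgrid-users", range(8, 18)),
    ]
    for res, group, hours in rules:
        if res == resource and group in groups and now.hour in hours:
            return "Permit"
    return "Deny"

print(authorize("smckee", "atlas-cluster"))    # Permit
print(authorize("guest", "teaching-cluster"))  # Deny
```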


Flowchart for Walden


NTAP: Network Testing and Performance

Purpose: provide a secure and extensible network testing and performance tool invocation service at U-M
Service based on Globus
Has modular, fine-grained authorization:
• Added signed group membership(s) to reservation data
• Now provides two authorization methods (sketched below):
  - Keynote policy engine / AFS PTS group service
  - PERMIS policy engine / LDAP group service
Runs on dedicated nodes attached to routers in a VLAN environment

MGRID NTAP Project: http://www.citi.umich.edu/projects/ntap/
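A rough sketch of what the modular authorization above amounts to: two pluggable backends behind one interface. The class and method names are invented for illustration; NTAP's real backends wrap the Keynote and PERMIS policy engines and their group services.

```python
# Rough sketch of NTAP-style modular authorization: one interface, two
# pluggable backends. All names and logic here are placeholders; the real
# NTAP backends use the Keynote and PERMIS policy engines with AFS PTS and
# LDAP group services respectively.

class Authorizer:
    def permit(self, user, groups, request):
        raise NotImplementedError

class KeynoteAFSAuthorizer(Authorizer):
    """Keynote policy engine + AFS PTS group service (placeholder logic)."""
    def permit(self, user, groups, request):
        return "ntap-testers" in groups and request["tool"] in {"iperf", "ping"}

class PermisLDAPAuthorizer(Authorizer):
    """PERMIS policy engine + LDAP group service (placeholder logic)."""
    def permit(self, user, groups, request):
        return "network-admins" in groups

def authorize(backend, user, groups, request):
    # Reservation data carries signed group membership; here the group list
    # is assumed to be already verified before it reaches the policy check.
    return backend.permit(user, groups, request)

request = {"tool": "iperf", "path": ("router-a", "router-b")}
print(authorize(KeynoteAFSAuthorizer(), "smckee", {"ntap-testers"}, request))
```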


NTAP Architecture

[Architecture diagram: the user workstation (browser, kinit, kx509, libpkcs11) connects to the Portal Host, which runs Apache (mod_ssl, mod_kx509, mod_kct, mod_php, …) together with LDAP, the pilot service and a network-topology output store; Kerberos V5 services (KDC, KCA, KCT) issue credentials; each PMP Host runs a GateKeeper and Resource Manager that invoke iperf and other tools.]

1. The user authenticates to the portal host via kx.509 and submits a network test request.
2. The portal host constructs a path between the specified endpoints, issues test reservations, and updates the output database.
3. PMPs (Performance Monitoring Platforms) on the test path run performance tests between pairs of routers.
4. The portal host displays the results.


GridNFS (NMI Development) – http://www.citi.umich.edu/projects/

Michigan has been funded to develop GridNFS, a middleware solution that extends distributed file system technology and flexible identity management techniques to meet the needs of grid-based virtual organizations.
The foundation for data sharing in GridNFS is NFS version 4
The challenges of authentication and authorization in GridNFS are met with X.509 credentials
In tying these middleware technologies together in the way we propose, we fill the gap for two vital, missing capabilities:
• Transparent and secure data management integrated with existing grid authentication and authorization tools.
• Scalable and agile name space management for establishing and controlling identity in virtual organizations and for specifying their data resources.
GridNFS is a new approach that extends “best of breed” Internet technologies with established Grid architectures and protocols to meet these immediate needs


Some Comments about Select NMI Components

In the next few slides I want to discuss our experiences with a few specific components

Overall the NMI components have been indispensable for our activities at Michigan

There are numerous EDIT components regarding information management and organization that I won’t cover in detail, though these are required to make progress on inter-institutional collaboration and resource sharing


Globus Experiences

We had already been using Globus since V1.1.3 for our work on the US ATLAS testbed

The NMI release was nice because of the GPT packaging which made installation trivial.

There were some issues with configuration and coexistence:
• Had to create a separate NMI gatekeeper to not impact our production grid users
• No major issues found… Globus just worked

Our primary Globus installation was via the Grid3 package for ATLAS


Condor-G

Condor was already in use at our site and in our testbed.

Condor-G installed over existing Condor installations produced some problems:
• Part of the difficulty was not understanding the details of the difference between Condor and Condor-G
• A file ($LOG/.schedd_address) was owned by root rather than the condor user and this “broke” Condor-G. Resolved via the testbed support list (a sketch of the fix appears below)

Condor-G has evolved over the life of the testbed and is an integral part of our ATLAS Data Challenge infrastructure
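A minimal sketch of the kind of ownership fix described above, assuming the problem file simply needed to be handed back to the condor account. Only the file name and the condor account come from the slide; the log-directory path is illustrative ($LOG stands for Condor's log directory on the host).

```python
# Minimal sketch of the ownership fix: give the .schedd_address file in
# Condor's log directory back to the condor user. The log-directory path
# is illustrative, not the value used on the Michigan testbed hosts.
import os
import pwd
import grp

log_dir = "/opt/condor/local/log"            # stand-in for $LOG on this host
target = os.path.join(log_dir, ".schedd_address")

condor_uid = pwd.getpwnam("condor").pw_uid
condor_gid = grp.getgrnam("condor").gr_gid

if os.path.exists(target):
    os.chown(target, condor_uid, condor_gid)  # must be run as root
```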


Network Weather Service (NWS)

Installation was trivial via GPT (server/client bundles)
Interesting product for us. We have done significant work with monitoring.
NWS advantages:

• Easy to automate network testing, once you understand the config details

• Prediction of future value of resources is fairly unique and potentially useful for grid scheduling

NWS disadvantages:
• Difficult user interface (relatively obscure syntax to access measured/predicted data)

Our REU student may take up an NWS-related project


KX509 for Enabling Access

The University of Michigan has around 200,000 active “uniqnames” in its Kerberos authentication system. It is not feasible to replicate this into other systems and so we have developed KX509 for translation to PKI space.

Our MGRID portal and gatekeepers are all configured to use KX509-generated credentials derived from our users' normal Kerberos identities.

This makes authentication trivial for our installed user base.
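A minimal sketch of the user-side flow, assuming the usual pattern of obtaining a Kerberos ticket and then running kx509 to place a short-lived certificate where Grid tools expect a proxy. The principal name and credential path below are illustrative, not fixed values.

```python
# Minimal sketch of the user-side KX509 flow: obtain a Kerberos ticket,
# then let kx509 translate it into a short-lived X.509 credential that
# Grid tools can pick up like a proxy. Principal and path are illustrative.
import os
import subprocess

principal = "smckee@UMICH.EDU"                    # illustrative principal
proxy_path = "/tmp/x509up_u%d" % os.getuid()      # conventional Globus proxy location

subprocess.run(["kinit", principal], check=True)  # Kerberos authentication (prompts for password)
subprocess.run(["kx509"], check=True)             # Kerberos -> X.509 translation

if os.path.exists(proxy_path):
    print("Short-lived certificate available at", proxy_path)
```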


GSI OpenSSH

Useful program to extend functionality of PKI to OpenSSH.

Allows “automatic” interactive login for proxy holders based upon Globus mapfile entries (see the mapfile sketch below)

Simple to install; in principle a superset of OpenSSH on the server end

We had a problem with a conflict among the dynamic libraries it installs on a non-NMI host

Very convenient in conjunction with KX509
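A small sketch of the mapfile-based decision mentioned above: GSI-enabled services consult a grid-mapfile that maps certificate subject DNs to local accounts. The DN and account below are made up; /etc/grid-security/grid-mapfile is the conventional location.

```python
# Small sketch of a Globus grid-mapfile lookup of the kind GSI-OpenSSH
# relies on: a certificate subject DN maps to a local account.
# The DN/account pair below is made up for illustration.
import shlex

def parse_grid_mapfile(text):
    """Return {subject_dn: local_account} from grid-mapfile text.
    Each entry is a quoted DN followed by a local user name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)      # handles the quoted DN
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping

example = '"/C=US/O=University of Michigan/CN=Example User" exampleuser\n'
mapping = parse_grid_mapfile(example)
print(mapping.get("/C=US/O=University of Michigan/CN=Example User"))  # exampleuser
```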


Campus Grid Implementation

Our MGRID challenge has been how to develop and enable a usable, deployable grid infrastructure across different academic/administrative divisions within the University

A key aspect of the challenge is the NMI components which are intended to “standardize” much of the needed functionality around information flow, authentication, authorization, monitoring and resource delivery

Delivering something which is as easy to use and deploy as possible is very important…


Distribution and Installation

As we started to integrate NMI components and extend and develop our own concepts we ran into a major issue: others want to use/take advantage of what we are delivering.

Many of you likely realize the complexity which can surround the installation/configuration of even a single grid component, let alone a complete system involving many components.

Our plans are to provide Pacman distributions of our software as well as CDs for “bare metal” installs. This is a critically important (and just beginning) effort for us, especially as more users on campus start asking “How can I participate/take advantage of MGRID?”


Ease of Use and Adoption

One thing we realized early was a requirement that any grid solution we developed be easy to adopt and use.

MGRID choices have been strongly influenced by this overriding concept:
• Using a portal to provide client capability
• Leveraging existing authentication and information services as much as possible
• Providing tools and an environment for our “virtual grid computer” similar to what a single workstation provides for its users
Thus “Ease of Use and Adoption” is not just for Users but for Administrators and Managers as well!


Authorization

Some of the hardest issues MGRID is facing are related to authorization.

We are tracking packages like PERMIS and Shibboleth to help provide solutions

Secure LDAP (Walden) can help provide a campus-wide resource building upon existing attributes to help “feed” authorization policy engines which are being developed

This is an area of intense interest for us, especially because of our work at Michigan on NFS V4, GridNFS and NTAP


Ongoing and Future Efforts

GridNFS has been funded for 3 years by the NSF NMI Development program

Development of MARS, a “Meta-scheduling” package, is now funded by NSF.

Planning how to merge NTAP into GNMI/Internet2
Easy-to-use installation and upgrade packages are under development and are critical to our success on campus.
Continue to emphasize standards and ease of use and adoption as our guidelines for delivering functionality
Continue efforts in Authorization and Accounting to produce grid systems which deliver a range of capabilities similar to what individual systems currently provide.


Points to Conclude With…

NMI has been a key component of our efforts
Use of a portal can make access to various distributed resources safe and easy
Making it easy to distribute, deploy and configure middleware has to be a priority if we are to make a real impact.
Working with others is very important:
• Learning from their experiences
• Input on our directions
• Collaborating for common solutions
Michigan plans to continue working with NMI and developing needed infrastructure for successful, effective grids and networks.

LUNCH!