June 12, 2022 XSEDE Operations Patricia Kovatch, Victor Hazlewood, Justin Whitt Randy Butler, Chris Jordan, Stephen McNally, Steve Quinn, Troy Baer, Linda Winkler


Page 1: XSEDE Operations

April 21, 2023

XSEDE Operations

Patricia Kovatch, Victor Hazlewood, Justin Whitt

Randy Butler, Chris Jordan, Stephen McNally, Steve Quinn, Troy Baer, Linda Winkler

Page 2: XSEDE Operations

XSEDE Operations

• Improve user productivity through enhanced
  – Ease of use
  – Reliability
  – Quality assurance

• Track metrics to gauge our success and continually improve


Page 3: XSEDE Operations

Operations (1.5 FTE), NICS
  • Patricia Kovatch (.5)
  • Victor Hazlewood (.5)
  • Justin Whitt (.5)

Networking – 3.25 FTEs
  • Linda Winkler, UChicago (.25)
  • Paul Wefel, NCSA (.25)
  • Matt Ezell, NICS (1)
  • Kathy Benninger, PSC (.5)
  • Chris Rapier (.25)
  • Joe Lappa, PSC (.5)
  • William Jones, TACC (.5)

Security – 4.25 FTEs
  • Randy Butler, NCSA (.25)
  • Jim Marsteller, PSC (.5)
  • Adam Fest, PSC (.5)
  • Nathaniel Mendoza, TACC (.75)
  • Victor Hazlewood, NICS (.5)
  • Ryan Braby, NICS (.5)
  • James Barlow, NCSA (1)
  • Jim Basney, NCSA (.25)

Data Services – 2.25 FTEs
  • Chris Jordan, TACC (.25)
  • Jack Kordas, UChicago (.5)
  • Chad Kerner, NCSA (.25)
  • Rick Mohr, NICS (.5)
  • Josephine Palencia, PSC (.5)
  • Tomislav Urban, TACC (.25)

Systems Operational Support – 12 FTEs
  • Stephen McNally, NICS (.5)
  • Mike Lowe, IU (1)
  • Justin Miller, IU (1)
  • Nada Cagle, NCSA (1)
  • Mark Fredericksen, NCSA (1)
  • Mike Pingleton, NCSA (1)
  • Frank Wells, NCSA (1)
  • Rolf Wilson, NCSA (1)
  • Tom Johnson, IU (.5)
  • Dave Lifka, Cornell (.25)
  • Tim Bouvet, NCSA (.25)
  • Wayne Louis Hoyenga, NCSA (.25)
  • Rick Mohr, NICS (.5)
  • Dave Carver, TACC (.75)
  • Leo Carson, SDSC (.5)
  • Shava Smallen, SDSC (.5)
  • Tom Howe, IaaS/SaaS, UChicago (.5)
  • Byron Gill, PSC (.1)
  • Anjana Kar, PSC (.2)
  • Kevin Sullivan, PSC (.1)
  • Jared Yanovich, PSC (.1)

Accounting and Account Management – 1.5 FTEs

  • Steve Quinn, NCSA (.5)
  • Ester Soriano, NCSA (.75)
  • Ed Hanna, PSC (.25)

Software Support – 3.25 FTEs
  • Troy Baer, NICS (.5)
  • Stuart Martin, GRAM, UChicago (.25)
  • Raj Kettimuthu, GridFTP, UChicago (.25)
  • Tom Howe, Registry, UChicago (.5)
  • PSC (1.25)
  • TACC (.5)
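
As a quick cross-check of the staffing slide, the headline FTE figures for each area can be tallied. The sketch below uses only the area totals stated above; the dictionary and function names are illustrative and not part of the original deck.

```python
# Headline FTE allocation per Operations area, as stated on the staffing slide.
# The variable and function names are illustrative assumptions.
AREA_FTES = {
    "Operations": 1.5,
    "Networking": 3.25,
    "Security": 4.25,
    "Data Services": 2.25,
    "Systems Operational Support": 12.0,
    "Accounting and Account Management": 1.5,
    "Software Support": 3.25,
}


def total_ftes(areas):
    """Return the sum of FTEs across all areas."""
    return sum(areas.values())


def largest_area(areas):
    """Return the name of the area with the largest FTE allocation."""
    return max(areas, key=areas.get)


if __name__ == "__main__":
    print(f"Total: {total_ftes(AREA_FTES)} FTEs")
    print(f"Largest area: {largest_area(AREA_FTES)}")
```

Systems Operational Support accounts for 12 of the FTEs listed, by far the largest share of the areas on the slide.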

Page 4: XSEDE Operations

Deliverables and Goals

1. Security: Deploy the XSEDE Certificate Authority, deploy a two-factor authentication service, federate two-factor authentication with BW, perform campus bridging with InCommon, provide security auditing services for XSEDE-connected hosts, and coordinate resource intrusion events;

2. Data Services: Deploy an XSEDE-wide parallel file system, coordinate data movement and management services, and develop a framework for distributed archival replication;

3. Networking: Facilitate end-to-end performance for users, transition to XSEDEnet, and peer with R&E network;

Page 5: XSEDE Operations

Deliverables and Goals

4. Software Support: Deploy and perform acceptance testing of new capabilities and services into the production XSEDE environment, and provide feedback to developers;

5. Accounting and Account Management: Maintain the current TeraGrid (TG) automatic distributed accounting and account management service, streamline the account creation process, and improve user access to stats;

6. Systems Operational Support: Provide frontline user support, systems administration for all centralized XSEDE services, and monitoring through the 24x7 XSEDE Operations Center

Page 6: XSEDE Operations

XSEDE Services

Service | Primary Location | Replication Location
Account and allocation management, usage reporting database servers, and the XD Central Database (XDCDB) | SDSC | PSC
User allocation online request web and database servers (POPS) | NCSA | PSC
XD user portal, collaboration and social networking servers | TACC | NICS
User ticket system database and servers | TACC | NICS
24x7 computing and networking operations servers and displays for monitoring | NCSA | IU
Website, documentation and document repository servers | TACC | NICS
User news mailing list and email servers | TACC | NICS
Online tutorials with CI-tutor and Virtual Workshop servers | NCSA | PSC
xsede.org DNS | NCSA | TACC


Page 7: XSEDE Operations

XSEDE Services

Service | Primary Location | Replication Location
Grid identity management including Certificate Authority, Public Key Infrastructure, MyProxy servers | NCSA | PSC
Two factor authentication servers | NICS | PSC
Two factor authentication token | NICS | PSC
Inter-SP area parallel file system servers and disk | NICS | Each SP as appropriate
Initial archive replication service | NICS | TACC
XD cross site security logging aggregation service | PSC | NICS
Grid Interface Unit and Resource Namespace Servers (RNS) | Every SP | N/A
Grid services monitoring servers | PSC | TACC
Knowledge Base | IU | N/A
VM hosting | IU | N/A


Page 8: XSEDE Operations

Operational Metrics

• Cybersecurity
  – Security events, logins and login types, security items deployed, security awareness training events

• Data management and coordination
  – Wide area parallel file system usage and uptime

• Networking
  – Network uptime and usage

• Software maintenance and coordination
  – Software deployment issues and resolution

Page 9: XSEDE Operations

Operational Metrics – cont’d

• Accounting and account management
  – Account creation time for PI and non-PI (Goal: Decrease account creation time to within five business days)

• System operational support
  – Deliver 95% uptime on critical centralized services
  – Respond meaningfully to all tickets within 24 hours
  – Close 80% of all tickets within two business days
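
The support targets above (95% uptime, a response within 24 hours, 80% of tickets closed within two business days) could be tracked with a short script. The following is a minimal sketch, not XSEDE's actual reporting tooling: the `Ticket` record, the function names, and the choice to count business days as Monday through Friday with no holiday calendar are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative."""
    opened: datetime
    first_response: Optional[datetime]  # None if nobody has responded yet
    closed: Optional[datetime]          # None if the ticket is still open


def business_days_between(start, end):
    """Count weekdays (Mon-Fri) after start's date, up to and including end's date.

    Holidays are ignored; that is a simplifying assumption.
    """
    days = 0
    day = start.date()
    while day < end.date():
        day += timedelta(days=1)
        if day.weekday() < 5:  # 0-4 are Monday through Friday
            days += 1
    return days


def fraction_responded_within(tickets, hours=24):
    """Fraction of tickets whose first response came within `hours` of opening."""
    if not tickets:
        return 0.0
    limit = timedelta(hours=hours)
    ok = sum(
        1 for t in tickets
        if t.first_response is not None and t.first_response - t.opened <= limit
    )
    return ok / len(tickets)


def fraction_closed_within(tickets, business_days=2):
    """Fraction of tickets closed within `business_days` business days of opening."""
    if not tickets:
        return 0.0
    ok = sum(
        1 for t in tickets
        if t.closed is not None
        and business_days_between(t.opened, t.closed) <= business_days
    )
    return ok / len(tickets)


def uptime_fraction(total_hours, outage_hours):
    """Fraction of the reporting period during which a service was up."""
    return (total_hours - outage_hours) / total_hours
```

Under these definitions, a service with 36 hours of outage in a 720-hour month sits exactly at the 95% uptime target, and a ticket opened on a Friday and closed the following Tuesday counts as closed within two business days.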

Page 10: XSEDE Operations

Review of activities to July 1

1.1.3.1 Deploy grid middleware infrastructure
1.1.3.3 Deploy account management software
1.1.3.4 Deploy information services infrastructure
1.1.3.5 Deploy common user environment
1.1.3.6 Deploy system of systems test environment
1.1.4.2 Deploy XSEDE website servers
1.2.1.1 Coordinate XSEDE security incident response
1.2.4.1 Test XSEDE software
1.2.6.1 Setup XSEDE Operations Center
1.2.3.1 Transition to XSEDEnet
1.3.2.1 Setup and populate XSEDE.ORG DNS

Page 11: XSEDE Operations

Review of activities to July 1 (continued)

1.2.6.5 Migrate AMIE to standalone server off of XDCDB at both primary and secondary
1.2.6.5 Upgrade XDCDB hardware at SDSC
1.3.2.1 Deploy XSEDE User Portal (XUP) servers

Page 12: XSEDE Operations

Preview of year 1 activities

1.1.3.2 Deploy data management software
1.2.1.1 Deploy XSEDE Certificate Authority (CA)
1.2.1.2 Develop security awareness program
1.2.1.3 Deploy security authentication program
1.2.1.4 Deploy security tools
1.2.1.5 Deploy security infrastructure
1.2.1.6 Deploy InCommon authentication service
1.2.2.1 Deploy global parallel file system
1.2.2.2 Design archival replication framework

Page 13: XSEDE Operations

Ongoing

1.2.3.1 Maintain and monitor XSEDEnet
1.2.3.2 Tune end-to-end performance
1.2.4.1/2 Test and deploy XSEDE software
1.2.5.1 Maintain accounting and account management databases
1.2.5.2 Provide usage reports
1.2.6.1 Provide frontline user support 24x7 XSEDE Operations Center (XOC)
1.2.6.2 Deploy and support XSEDE system infrastructure
1.2.6.3 Support deploy security tools/infrastructure
1.2.6.4 Report operational metrics (yearly)

Page 14: XSEDE Operations

DNS transition plan

• Ops Networking leading the DNS transition
• xsede.org primary service moving to NCSA, backup at TACC
• Delegation of {site}.xsede.org to sites
• XSEDE staff should review DNS needs
  – Determine teragrid.org entries to duplicate
  – Determine new xsede.org entries
  – Review and coordinate with XSEDE L3 manager
• XSEDE L3 Manager or delegate submits DNS requests in TG help ticket
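
When reviewing which teragrid.org entries to duplicate, a small helper can list the candidate xsede.org names and show which ones would fall under a delegated {site}.xsede.org zone. This is only a sketch under stated assumptions: it assumes duplicated entries keep the same labels under the new domain, and the function names and any site labels passed in are hypothetical rather than taken from the transition plan.

```python
def to_xsede_name(hostname):
    """Map a name under teragrid.org to the same name under xsede.org.

    Assumes a duplicated entry keeps its labels unchanged; the actual
    naming decisions belong to the staff review described above.
    """
    suffix = ".teragrid.org"
    name = hostname.rstrip(".").lower()
    if name == "teragrid.org":
        return "xsede.org"
    if not name.endswith(suffix):
        raise ValueError(f"{hostname!r} is not under teragrid.org")
    return name[: -len(suffix)] + ".xsede.org"


def delegated_zone(hostname, sites):
    """Return '{site}.xsede.org' if the name falls in a site's zone, else None.

    `sites` is a collection of site labels; the caller supplies them, and
    this sketch does not assume any particular list.
    """
    labels = hostname.rstrip(".").lower().split(".")
    if len(labels) >= 3 and labels[-2:] == ["xsede", "org"] and labels[-3] in sites:
        return f"{labels[-3]}.xsede.org"
    return None


if __name__ == "__main__":
    # Hypothetical entries, used purely to exercise the helpers.
    example_sites = {"ncsa", "tacc"}
    for old in ["portal.teragrid.org", "login.ncsa.teragrid.org"]:
        new = to_xsede_name(old)
        print(old, "->", new, "| delegated zone:", delegated_zone(new, example_sites))
```

Names that return no delegated zone would be handled in the central xsede.org zone, so those are the ones to include in the DNS request ticket.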