19
Server Base Manageability Guide for SBSA Compliant Arm (AARCH64) Servers Supreeth Venkatesh, Staff System Architect Arm Open System Firmware

Server Base Manageability Guide for SBSA Compliant Arm

  • Upload
    others

  • View
    10

  • Download
    2

Embed Size (px)

Citation preview

Server Base Manageability Guide for SBSA Compliant Arm (AARCH64) Servers

Supreeth Venkatesh, Staff System ArchitectArm

Open System Firmware

Agenda

• Rationale• History• Introduction• Proof of Concept• Roadmap/Plan• Call to Action

Rationale• Arm Ecosystem Partners value standardized common server manageability

capabilities with scope for flexible customizations which add value to the end user.

• Standardization is key to ensure that Arm Ecosystem does not get fragmented by point solutions that plague the industry today.

• Leverage industry standard system management specifications including but not limited to Redfish, Platform Level Data Model (PLDM), Management Component Transport Protocol (MCTP) as defined by the Distributed Management Task Force (DMTF).

• Leverage Hardware Management Specifications and designs as defined by Open Compute Group (OCP).

History

• Server Base System Architecture (SBSA)-Hardware system architecture for servers based on 64-bit ARM processors.-Standardizes processor element features and key aspects of system architecture.-http://infocenter.arm.com/help/topic/com.arm.doc.den0029b/Server_Base_System_Architecture_v5_0_ARM_DEN_0029B.pdf

• Server Base Boot Requirements (SBBR)-Defines the Boot and Runtime Services expected by an enterprise platform Operating System or hypervisor, for an SBSA-compliant Arm AArch64 server.

-Based on UEFI, SMBIOS and ACPI specifications.-http://infocenter.arm.com/help/topic/com.arm.doc.den0044c/Server_Base_Boot_Requirements_v1_1_Arm_DEN_0044C.pdf

History

• ARM Enterprise ACS -Architecture Compliance Suite tests SBSA and SBBR specifications.https://github.com/ARM-software/arm-enterprise-acs

FVP

FWTS

Arm Partner OSS

SBBR SBSA

SBSA

SCTSBBR

PAL

TF-A

UEFI

LuvOS

Introduction

• Server Base Manageability is a specification that is under development in conjunction with partners across the industry.

• Together with SBSA and SBBR, the SBMG provides a standard based approach to building Arm servers, their firmware and their server management capabilities.

• SBMG is developed within the Arm ServerAC community.• Engineering change request process similar to other standards bodies.• Anybody in the community can raise a request for a change. Public to

community by sending a mail to [email protected] Or raising a public ticket on the mantis DB https://atg-mantis.arm.com/

Introduction – Compliance Levels• Defines several levels of manageability compliance (e.g., M0, M1, M2)

SoC-BMC Interfaces

BMC-Platform Elements Interfaces

BMC Management Services Interfaces Host Interface

BMC-IO Device Interface

LEVEL M0IMPLEMENTATION DEFINED

IMPLEMENTATION DEFINED

IMPLEMENTATION DEFINED

IMPLEMENTATION DEFINED

IMPLEMENTATION DEFINED

LEVEL M1IMPLEMENTATION DEFINED

IMPLEMENTATION DEFINED Redfish and IPMI

IPMI based Host Interface and Redfish Host Interface/MCTP Host Interface

IMPLEMENTATION DEFINED

LEVEL M2IMPLEMENTATION DEFINED Redfish/PLDM/MCTP Redfish and IPMI

IPMI based Host Interface and Redfish Host Interface/MCTP Host Interface NC-SI

Introduction – Sub teams

• To help guide the Arm server designers to provide common manageability functions to the end users.

• To accelerate development of Server Base Manageability Guide (SBMG), three different teams within Arm ServerAC community with participation from several different Arm Ecosystem Partners have been formed.

-Reliability, Availability, Serviceability (RAS) team.-Platform Monitoring and Control team.-Remote Debug team.

Introduction – RAS

• Define error record formats for RAS errors leveraging existing common platform error record (CPER) as specified in unified extensible firmware interface specification (UEFI).

• Define the SOC - BMC manageability interface requirements for RAS errors including in-band and out-of-band interfaces.

• Define fault notification signal.• Define the interface and mechanism for injecting RAS hardware errors.

Introduction – RAS

BMC Satellite Management

Controller

Application Processor (SOC)

EventMessageSupported(Version, TID)

EventMessageBufferSize(CperSize)

Generate/Store CPER

RAS Errror EventCPER/Async Event

return(EventClass)

RAS Event Supported? yes

return(MaxBufferSize)

MaxBufferSize < CperSize?Yes

PlatformEventMessage(RasPollEvent)

PollForPlatformEventMessage (GetFirstPart, 0x0000)

return(PLDM_BASE_CODE)

return(nextDataHandle, Start=0x01,

RasPollEvent, EventDataSize,

EventData )

PollForPlatformEventMessage (GetNextPart, 0xFFFF)

return(nextDataHandle,Middle=0x02, RasPollEvent,

EventDataSize, EventData )

PollForPlatformEventMessage (GetNextPart, 0xFFFF)return(nextDataHandle,

End=0x04, RasPollEvent,

EventDataSize, EventData,

EventDataIntegrityChecksum )

PollForPlatformEventMessage (AcknowledgementOnly, 0xFFFF)return(nextDataHandle,

StartAndEnd=0x05, EventID=0x0000 )

Introduction – Monitoring & Control

• Define interfaces and protocol needed for BMC – SOC communication in the scope of Platform Monitoring leveraging Open Hardware Management Specification for Remote Machine Management defined by OCP.

• List of use cases analyzed for Standardizing-BMC to Multiple SOC communication.-BMC assisted SOC power actions.-BMC to monitor critical health of SOC.-BMC to monitor SOC boot progress.-BMC watchdog use cases.

Introduction – Monitor &

Control

BMC Satellite Management

Controller

RunInitAgent(PLDMTerminusOnline)

GetPDRRepositoryInfo()

return(PDRInfo)

GetPDR(0x0000,..)

return(nextRecordHandle)

GetPDR(recordHandle,..)

return(0x0000)

Create Central PDR Repo

RunInitAgent(SystemHardReset)

GetSensorReading(SensorID, ...)

return(presentReading, ...)

Calculate Reading

Reading Conversion formula: Y = (m * X + B)

Where:

Y = converted reading in Units

X = reading from sensor

m = resolution from PDR in Units

B = offset from PDR in Units

Units = sensor/effecter Units, based on the Units and auxUnits fields from the PDR for the numeric sensor

Presenter
Presentation Notes
Runinitagent Scan the PDR Repository for Initialization PDRs (for example, numeric sensor/effecter initialization PDRs or state sensor/effecter initialization PDRs) that are associated for the terminus. For each PDR that is found, perform the following actions: Set the sensor type, sensor thresholds, and hysteresis as directed by the PDR using the SetSensorThresholds and SetSensorHysteresis commands. Use the appropriate enabling command (for example, SetNumericSensor Enables if the sensor is a numeric sensor) to enable scanning and event generation per the PDR. Calculation Example For example, a sensor with the following units, resolution, offset, and reading: Reading = 0xBF Units = Volts Resolution: 26.17801 mV Offset = -1.00 V would have the following the converted reading: Y = (26.17801 * 10-3 V * 0xBF + (-1.00 V)) = [(.02617801 * 191) - 1.00 ] V = 4.00 V For example, if the PDR indicates the following: Accuracy: ± 4% Tolerance: ± 1 count (given) combined with the previous example, the full interpretation of the reading would be: (4.00 V ± 26.17801 mV) ± 4% where ± 26.17801 mV corresponds to the effect of a Tolerance of ± 1 count.

Introduction – Remote Debug

• Server Remote Debug is the act of gaining visibility of the hardware and software behaviors of an SoC, using a debugger which is not directly connected to the Server SoC.

• Define protocols for communicating between the debugger and the BMC.• Define physical interfaces between BMC and SoC.• Define protocols for communicating between the BMC and the SoC.• Define mechanisms for ensuring only suitable debuggers can access the

SoC.

Presenter
Presentation Notes
Typically, the debugger is connected to the Server board via a network connection. A Board Management Controller (BMC) is the endpoint of the network connection.

Introduction – Remote

Debug

Proof of Concept

• OpenBMC is a Linux foundation project. It is a highly extensible framework for BMC software and implement for data-center computer systems.

• OpenBMC implementation will be used as a medium to realize Server Base Manageability Guide.

• Arm and Arm SiPs are participating in the OpenBMC development so that there will be an open source implementation to the SBMG requirements.

• Arm has joined OpenBMC Technical Steering Committee (Arm, IBM, Intel, Facebook, Google & Microsoft).https://github.com/openbmc/docs/blob/master/README.md#technical-steering-committee

Implementation

Proof of Concept –

RAS

RAS App(BMC) libPLDM (BMC) libPLDM (Satellite MC)

Init(Blocking/Non-Blocking)

RAS Driver (Satellite MC)

Driver Starts

return (pldm_response)

Success

libMCTP (Satellite MC)

libMCTP(BMC)

mctp_init

return (InitCode)

mctp_binding_int()

return (BindInitCode)

Daemon Starts

Init(Blocking/Non-Blocking) mctp_init

return (InitCode)

mctp_binding_int()

return (BindInitCode)return (InitCode)

encode_pldm_cmd(PlatformEventMessage,

RASPollEvent)return (pldm_msg)

send_msg(mctp_eids, pldm_msg)

mctp_message_tx(msg, len)

Binding Physical LayerPldmRequestNotification

PldmRequestNotification

GeneratePLDM

Response

send_msg(mctp_eids, pldm_msg)

mctp_message_tx(msg, len)

Binding Physical Layer PldmResponseNotification

decode_pldm_cmd(pldm_response,

)encode_pldm_cmd(PollforPlatformEventMessage,

GetFirstPart)return (pldm_msg)

send_msg(mctp_eids, pldm_msg)

mctp_message_tx(msg, len)

Binding Physical Layer PldmRequestNotification

PldmRequestNotification

GeneratePLDM

Response

Presenter
Presentation Notes
libPLDM – Deepak (IBM and Team) in OpenBMC. libMCTP – Jeremy (IBM and Team) as part of OpenBMC.

Proof of Concept – Remote Debug• Proposed Design proposal is to integrate OpenOCD within OpenBMC stack.

https://lists.ozlabs.org/pipermail/openbmc/2019-July/017122.html• OpenOCD is an open source on-chip debugging solution for JTAG connected

processors. It enables source level debugging with GNU gdb. It can also integrate with and GDB aware IDE, such as eclipse.

• Completed first phase of implementation which includes the demonstration of GDB connection on port 3333 to OpenOCD debug server daemon running on OpenBMC.

• Next phase of implementation to utilize a wrapper which adds JTAG master core infrastructure by defining new JTAG class and provide generic JTAG interface to allow hardware specific drivers to connect this interface.https://patchwork.ozlabs.org/cover/848652/

• This will enable all JTAG drivers to use the common interface part and will have separate drivers for hardware implementation.

Roadmap/PlanH

ost F

irm

war

e

Management Console Transport Protocol Enablement.

Reference Implementation to demo Control and Monitor on board and on SOC sensors of a chosen reference platform.Reference PLDM

enablement to send sensor data when requested by MC.

Reference Implementation to demo Remote Firmware upgrade Capability on a chosen reference platform.Leverage existing UEFI

capsule update and arm-tfupdate methods.

Reference Implementation to demo BMC RAS Errors on a chosen reference platform.Leverage existing CPER

records.

Reference Implementation to demo Remote Debug Capability on a chosen reference platform.JTAG Interface.

Security requirements/Root of Trust.

Man

agem

ent C

ontr

olle

r Pick/Choose Reference Hardware Platform for OpenBMC Enablement.

Enable OpenBMC. Demonstrate

Reference IPMI/REDFISH Commands.

Reference Implementation to demo Control and Monitor on board and on SOC sensors of a chosen reference platform.IPMI/REDFISH Command

implementation to gather and control on board and on SOC sensor.

Reference Implementation to demo Remote Firmware upgrade Capability on a chosen reference platform from BMC.Leverage PLDM for Firmware

Update standard.

Reference Implementation to demo transfer of Binary encoded Json (BEJ) RAS errors from SOC to BMC.IPMI/REDFISH

implementation to transfer RAS errors

Reference Implementation to demo Remote Debug Capability from BMC.Integrate OpenOCD in BMC

stack.

Sync up with Security team.

Stan

dard

s

PMCI WG Participation –MCTP, PLDM, Redfish

PMCI WG Participation –MCTP, PLDM, RedfishWhitepaper/Reference

Solution/ECR - Control and Monitor on board and on SOC sensors on arm servers.

Arm Specific Requirement Updates to MCTP, PLDM and Redfish.Whitepaper/Reference

Solution/ECR - Remote Firmware upgrade Capability

Arm Specific Requirement Updates to MCTP, PLDM and Redfish.Whitepaper/Reference

Solution/ECR – BMC RAS.

https://www.dmtf.org/standards/pmci (Target PMCI documents.)Whitepaper/Reference

Solution/ECR - Remote Debug from BMC.

PMCI Security Updates. Interface

Standardization.

Call to Action• Participate in OpenBMC to enable reference implementation and open

source delivery option.

• Participate in OCP to influence Hardware Management Specifications and designs.

• Participate in ServerAC to help define SBMG.

• Send an email to Arm ([email protected]).

• Participate in Redfish, PMCI and other DMTF Workgroups.