Fabric Interfaces Architecture Sean Hefty - Intel Corporation

Changes
  • v2
      - Remove interface object
      - Add open interface as base object
      - Add SRQ object
      - Add EQ group object
  • v3
      - Modified SRQ
      - Enhanced architecture semantics

www.openfabrics.org

Overview
  • Object Model
      - Do we have the right types of objects defined?
      - Do we have the correct object relationships?
  • Interface Synopsis
      - High-level description of object operations
      - Is functionality missing?
      - Are interfaces associated with the right object?
  • Architectural Semantics
      - Do the semantics match well with the apps?
      - What semantics are missing?

Object Class Model
  • Objects represent a collection of attributes and interfaces
      - I.e. object-oriented programming model
  • Consider architectural model only at this point
  • Objects do not necessarily map directly to hardware or software objects

Conceptual Object Hierarchy
  • Object inheritance

Object Relationships
  • Object scope

Fabric
  • Represents a communication domain or boundary
      - Single IB or RoCE subnet, IP (iWarp) network, Ethernet subnet
  • Multiple local NICs / ports
  • Topology data, network time stamps
  • Determines native addressing
      - Mapped addressing possible
      - GID/LID versus IP

Passive (Fabric) EP
  • Listening endpoint
      - Connection-oriented protocols
  • Wildcard listen across multiple NICs / ports
  • Bind to address to restrict listen
  • Listen may migrate with address

Fabric EQ
  • Associated with passive endpoint(s)
  • Reports connection requests
  • Could be used to report fabric events

Resource Domain
  • Boundary for resource sharing
      - Physical or logical NIC
      - Command queue
  • Container for data transfer resources
  • A provider may define multiple domains for a single NIC
      - Dependent on resource sharing

Domain Address Vectors
  • Maintains list of remote endpoint addresses
      - Map native addressing
      - Index rank-based addressing
  • Resolves higher-level addresses into fabric addresses
      - Native addressing abstracted from user
  • Handles address and route
    changes

Domain Endpoints
  • Data transfer portal
      - Send / receive queues
      - Command queues
      - Ring buffers
      - Buffer dispatching
  • Multiple types defined
      - Connection-oriented / connectionless
      - Reliable / unreliable
      - Message / stream

Domain Event Queues
  • Reports asynchronous events
  • Unexpected errors reported out of band
  • Events separated into EQ domains
      - CM, AV, completions
      - 1 EQ domain per EQ
      - Future support for merged EQ domains

EQ Groups
  • Collection of EQs
  • Conceptually shares same wait object
  • Grouping for progress and wait operations

Domain Counters
  • Provides a count of successful completions of asynchronous operations
      - Conceptual HW counter
  • Count is independent from an actual event reported to the user through an EQ

Domain Memory Regions
  • Memory ranges accessible by fabric resources
      - Local and/or remote access
  • Defines permissions for remote access

Interface Synopsis
  • Operations associated with identified classes
  • General functionality, versus detailed methods
      - The full set of methods is not defined here
      - Detailed behavior (e.g.
        blocking) is not defined
  • Identify missing and unneeded functionality
      - Mapping of functionality to objects
  • Use timeboxing to limit scope of interfaces to refine by a target date

Base Class
  • Close: Destroy / free object
  • Bind: Create an association between two object instances
  • Sync: Fencing operation that completes only after previously issued asynchronous operations have completed
  • Control: (~fcntl) set/get low-level object behavior
  • I/F Open: Open provider extended interfaces

Fabric
  • Domain: Open a resource domain
  • Endpoint: Create a listening EP for connection-oriented protocols
  • EQ Open: Open an event queue for listening EP or reporting fabric events

Resource Domain
  • Query: Obtain domain specific attributes
  • Open AV, EQ, EP, SRQ, EQ Group: Create an address vector, event or completion counter, event queue, endpoint, shared receive queue, or EQ group
  • MR Ops: Register data buffers for access by fabric resources

Address Vector
  • Insert: Insert one or more addresses into the vector
  • Remove: Remove one or more addresses from the vector
  • Lookup: Return a stored address
  • Straddr: Convert an address into a printable string

Base EP
  • Enable: Enables an active EP for data transfers
  • Cancel: Cancel a pending asynchronous operation
  • Getopt: (~getsockopt) get protocol specific EP options
  • Setopt: (~setsockopt) set protocol specific EP options

Passive EP
  • Getname: (~getsockname) return EP address
  • Listen: Start listening for connection requests
  • Reject: Reject a connection request

Active EP
  • CM: Connection establishment ops, usable by connection-oriented and connectionless endpoints
  • MSG: 2-sided message queue ops, to send and receive messages
  • RMA: 1-sided RDMA read and write ops
  • Tagged: 2-sided matched message ops, to send and receive messages (conceptual merge of messages and RMA writes)
  • Atomic: 1-sided atomic ops
  • Triggered: Deferred operations initiated on a condition being met

Event Queue
  • Read: Retrieve a
    completion event, and optional source endpoint address data for received data transfers
  • Read Err: Retrieve event data about an operation that completed with an unexpected error
  • Write: Insert an event into the queue
  • Reset: Directs the EQ to signal its wait object when a specified condition is met
  • Strerror: Converts error data associated with a completion into a printable string

EQ Group
  • Poll: Check EQs for events
  • Wait: Wait for an event on the EQ group

Completion Counter
  • Read: Retrieve a counter's value
  • Add: Increment a counter
  • Set: Set / clear a counter's value
  • Wait: Wait until a counter reaches a desired threshold

Memory Region
  • Desc: (~lkey) Optional local memory descriptor associated with a data buffer
  • Key: (~rkey) Protection key against access from remote data transfers

Architectural Semantics
  • Progress
  • Ordering - completions and data delivery
  • Multi-threading and locking model
  • Buffering
  • Function signatures and semantics
  • Once defined, object and interface semantics cannot change; semantic changes require new objects and interfaces
  • Need refining

Progress
  • Ability of the underlying implementation to complete processing of an asynchronous request
  • Need to consider ALL asynchronous requests
      - Connections, address resolution, data transfers, event processing, completions, etc.
  • HW/SW mix
  • All(?)
    current solutions require significant software components

Progress
  • Support two progress models
      - Automatic and implicit
  • Separate operations as belonging to one of two progress domains
      - Data or control
  • Report progress model for each domain

  (SAMPLE)   Implicit    Automatic
  Data       Software    Hardware offload
  Control    Software    Kernel services

Automatic Progress
  • Implies hardware offload model
      - Or standard kernel services / threads for control operations
  • Once an operation is initiated, it will complete without further user intervention or calls into the API
  • Automatic progress meets implicit model by definition

Implicit Progress
  • Implies significant software component
  • Occurs when reading or waiting on EQ(s)
      - Application can use separate EQs for control and data
  • Progress limited to objects associated with selected EQ(s)
  • App can request automatic progress
      - E.g. app wants to wait on native wait object
      - Implies provider allocated threading

Ordering
  • Applies to a single initiator endpoint performing data transfers to one target endpoint over the same data flow
      - Data flow may be a conceptual QoS level or path through the network
  • Separate ordering domains
      - Completions, message, data
  • Fenced ordering may be obtained using fi_sync operation

Completion Ordering
  • Order in which operation completions are reported relative to their submission
  • Unordered or ordered
      - No defined requirement for ordered completions
  • Default: unordered

Message Ordering
  • Order in which message (transport) headers are processed
      - I.e. whether transport messages are received in or out of order
  • Determined by selection of ordering bits
      - [Read | Write | Send] After [Read | Write | Send]
      - RAR, RAW, RAS, WAR, WAW, WAS, SAR, SAW, SAS
  • Example:
      fi_order = 0 // unordered
      fi_order = RAR | RAW | RAS | WAW | WAS | SAW | SAS // IB/iWarp ordering

Data Ordering
  • Delivery order of transport data into target memory
      - Ordering per byte-addressable location
      - I.e.
        access to the same byte in memory
  • Ordering constrained by message ordering rules
      - Must at least have message ordering first

Data Ordering
  • Ordering limited to message order size
      - E.g. MTU
  • In order data delivery if transfer
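
The ordering bits listed on the Message Ordering slide could be modelled as a plain C bitmask. The following is only a sketch under stated assumptions: the slides name the nine bits but give no values, so the ORD_* constants, their bit positions, and the order_required() helper are hypothetical and are not taken from any published header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; the slides only name the bits.
 * Each bit reads "<later op> After <earlier op>". */
enum {
    ORD_RAR = 1 << 0, /* Read  After Read  */
    ORD_RAW = 1 << 1, /* Read  After Write */
    ORD_RAS = 1 << 2, /* Read  After Send  */
    ORD_WAR = 1 << 3, /* Write After Read  */
    ORD_WAW = 1 << 4, /* Write After Write */
    ORD_WAS = 1 << 5, /* Write After Send  */
    ORD_SAR = 1 << 6, /* Send  After Read  */
    ORD_SAW = 1 << 7, /* Send  After Write */
    ORD_SAS = 1 << 8, /* Send  After Send  */
    ORD_ALL = (1 << 9) - 1
};

/* The two example settings from the slide. */
enum {
    ORD_NONE     = 0, /* unordered */
    ORD_IB_IWARP = ORD_RAR | ORD_RAW | ORD_RAS |
                   ORD_WAW | ORD_WAS | ORD_SAW | ORD_SAS
};

/* True when `mask` requests that the ordering named by `bit` be kept. */
static bool order_required(uint32_t mask, uint32_t bit)
{
    assert((bit & ~(uint32_t)ORD_ALL) == 0); /* reject undefined bits */
    return (mask & bit) != 0;
}
```

With these definitions, the IB/iWarp mask from the slide leaves WAR and SAR clear, so order_required(ORD_IB_IWARP, ORD_WAR) is false while order_required(ORD_IB_IWARP, ORD_RAW) is true.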