Architectural Patterns for the Design of Federated Learning Systems
Sin Kit Lo a,b,*, Qinghua Lu a,b, Liming Zhu a,b, Hye-Young Paik b, Xiwei Xu a,b and Chen Wang a
a Data61, CSIRO, Australia
b University of New South Wales, Australia
arXiv:2101.02373v3 [cs.LG] 18 Jun 2021

ARTICLE INFO

Keywords: Federated Learning, Pattern, Software Architecture, Machine Learning, Artificial Intelligence

ABSTRACT
Federated learning has received fast-growing interest from academia and industry as a way to tackle the challenges of data hungriness and privacy in machine learning. A federated learning system can be viewed as a large-scale distributed system with different components and stakeholders, as numerous client devices participate in federated learning. Designing a federated learning system requires software system design thinking in addition to machine learning knowledge. Although much effort has been put into federated learning from the machine learning perspective, the software architecture design concerns in building federated learning systems have been largely ignored. Therefore, in this paper, we present a collection of architectural patterns to deal with the design challenges of federated learning systems. Architectural patterns present reusable solutions to problems that commonly occur within a given context during software architecture design. The presented patterns are based on the results of a systematic literature review and include three client management patterns, four model management patterns, three model training patterns, and four model aggregation patterns. The patterns are associated with particular state transitions in a federated learning model lifecycle, serving as guidance for the effective use of the patterns in the design of federated learning systems.

1. Introduction
Federated Learning Overview
The ever-growing use of big data systems, industrial-scale IoT platforms, and smart devices contributes to the exponential growth in data dimensions [30]. This exponential growth of data has accelerated the adoption of machine learning in many areas, such as natural language processing and computer vision. However, many machine learning systems suffer from insufficient training data. The reason is mainly the increasing concern about data privacy, e.g., restrictions on data sharing with external systems for machine learning purposes. For instance, the General Data Protection Regulation (GDPR) [1] stipulates a range of data protection measures, and data privacy is now one of the most important ethical principles expected of machine learning systems [22]. Furthermore, raw data usually cannot be used directly for model training: it needs to be studied and pre-processed first, and data-sharing restrictions increase the difficulty of obtaining training data. Moreover, concept drift [27] occurs when new data is constantly collected and replaces the outdated data, which makes a model trained on previous data degrade at a much faster rate. Hence, a technique that can swiftly produce models that adapt to concept drift as different data is discovered on clients is essential.

[email protected] (S.K. Lo); [email protected] (Q. Lu); [email protected] (L. Zhu); [email protected] (H. Paik); [email protected] (X. Xu); [email protected] (C. Wang)

ORCID(s): 0000-0002-9156-3225 (S.K. Lo); 0000-0002-7783-5183 (Q. Lu); 0000-0003-4425-7388 (H. Paik); 0000-0002-2273-1862 (X. Xu); 0000-0002-3119-4763 (C. Wang)

Figure 1: Federated Learning Overview. (Client devices perform local model training; the central server performs model aggregation; the task script, local models, and the global model are exchanged between the system owner, the central server, and the client devices.)

To effectively address the lack of training data, concept drift, and the data-sharing restrictions while still enabling effective data inference by data-hungry machine learning models, Google introduced the concept of federated learning [33] in 2016. Fig. 1 illustrates the overview of federated learning. There are three stakeholders in a federated learning system: (1) the learning coordinator (i.e., system owner), (2) the contributor client - data contributor and local model trainer, and (3) the user client - model user. Note that a contributor client can also be a user client. There are two types of system nodes (i.e., hardware components): (1) the central server and (2) the client device.

A Motivation Example
Imagine we use federated learning to train the next-word prediction model of a mobile phone keyboard application. The learning coordinator is the provider of the keyboard application, while the contributor clients are the mobile phone users. The user clients are the new or existing mobile phone users of the keyboard application. The differences in ownership, geolocation, and usage patterns cause the local data to possess non-IID1 characteristics, which is a design challenge of federated learning systems.


The federated learning process starts when a training task is created by the learning coordinator. For instance, the keyboard application provider produces and embeds an initial global model into the keyboard applications. The initial global model (including the task script and training hyperparameters) is broadcast to the participating client devices. After receiving the initial model, model training is performed locally across the client devices, without centralised collection of raw client data. Here, the smartphones that have the keyboard application installed receive the model to be trained. The client devices typically run on different operating systems and have diverse communication and computation resources, which is defined as the system heterogeneity challenge.

Each training round takes one step of gradient descent on the current model using each client's local data. In this case, the smartphones optimise the model using the keyboard typing data. After each round, the model update is submitted by each participating client device to the central server. The central server collects all the model updates and performs model aggregation to form a new version of the global model. The new global model is re-distributed to the client devices for the next training round. This entire process iterates until the global model converges. As a result, communication and computation efficiency are crucial, as many local computation and communication rounds are required for the global model to converge. Moreover, due to the limited resources available on each device, the device owners might be reluctant to participate in the federated learning process. Therefore, client motivatability becomes a design challenge. Furthermore, the central server communicates with multiple devices for the model exchange, which makes it vulnerable to single-point-of-failure. The trustworthiness of the central server and the possibility of adversarial nodes participating in the training process also create system reliability and security challenges. After the completion of training, the learning coordinator stores the converged model and deploys it to the user clients. For instance, the keyboard application provider stores the converged model and embeds it into the latest version of the application for existing or new application users. Here, the different versions of the local models associated with the global model need to be maintained.

1 Non-Identically and Independently Distributed: highly skewed and personalised data distributions that vary heavily between different clients and affect the model performance and generalisation [37].
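To make the aggregation step described above concrete, the sketch below shows one round of federated averaging in the style of FedAvg [33]. It is a minimal illustration only, assuming a list-of-arrays model representation; the function names (e.g., client_update, federated_averaging_round) are our own and not taken from the paper or any specific framework.

```python
import numpy as np

def client_update(global_weights, local_data, lr=0.01):
    """Hypothetical local training step: the client copies the current global
    model and would take gradient steps on its own data only (omitted here)."""
    local_weights = [w.copy() for w in global_weights]
    # ... compute gradients on local_data and update local_weights in place ...
    return local_weights, len(local_data)

def federated_averaging_round(global_weights, clients):
    """One aggregation round: collect local updates and average them,
    weighted by each client's data volume (the FedAvg aggregation rule)."""
    results = [client_update(global_weights, data) for data in clients]
    total = sum(n for _, n in results)
    new_weights = []
    for layer_idx in range(len(global_weights)):
        layer = sum((n / total) * weights[layer_idx] for weights, n in results)
        new_weights.append(layer)
    return new_weights

# Example: three clients with different amounts of local data.
global_model = [np.zeros((4, 4)), np.zeros(4)]
clients = [np.random.rand(50, 4), np.random.rand(200, 4), np.random.rand(10, 4)]
global_model = federated_averaging_round(global_model, clients)
```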

Design Challenges
Compared to centralised machine learning, federated learning is more advantageous from the data privacy perspective and in dealing with the lack of training data. However, a federated learning system, as a large-scale distributed system, presents more architectural design challenges [31], especially when dealing with the interactions between the central server and client devices and managing trade-offs of software quality attributes. The main design challenges are summarised as follows.

• Global models might have low accuracy and lack generality when client devices generate non-IID data. Centralising and randomising the data is the approach adopted by conventional machine learning to deal with data heterogeneity, but the inherent privacy-preserving nature of federated learning renders such techniques inappropriate.

• To generate high-quality global models that are adaptive to concept drift, multiple rounds of communication are required to exchange local model updates, which could incur high communication costs.

• Client devices have limited resources to perform the multiple rounds of model training and communication required by the system, which may affect the model quality.

• As numerous client devices participate in federated learning, it is challenging to coordinate the learning process and ensure model provenance, system reliability, and security.

How to select appropriate designs to fulfil different software quality requirements and design constraints is non-trivial for such a complex distributed system. Although much effort has been put into federated learning from the machine learning side, there is still a gap in the architectural design considerations of federated learning systems. Systematic guidance on the architecture design of federated learning systems is required to better leverage the existing solutions and promote federated learning to enterprise-level adoption.

Research Contribution
In this paper, we present a collection of patterns for the design of federated learning systems. In software engineering, an architectural pattern is a reusable solution to a problem that occurs commonly within a given context in software design [6]. Our pattern collection includes three client management patterns, four model management patterns, three model training patterns, and four model aggregation patterns. We define the lifecycle of a model in a federated learning system and associate each identified pattern with a particular state transition in the lifecycle.

The main contributions of this paper include:

• The collection of architectural patterns provides a design solution pool for practitioners to select from for real-world federated learning system development.

• The federated learning model lifecycle with architectural pattern annotations serves as a systematic guide for practitioners during the design and development of a federated learning system.


Figure 2: Pattern Collection Process. (SLR process: initial search, paper screening, snowballing process, quality assessment, data extraction/synthesis, and reporting; followed by an extra literature review, a real-world application review, pattern definition, and reporting.)

Table 1
Sources of Patterns.

Category                      Pattern                                    SLR papers   ML/FL papers   Real-world applications
Client management patterns    Pattern 1: Client registry                 0            2              3
                              Pattern 2: Client selector                 4            2              1
                              Pattern 3: Client cluster                  2            2              2
Model management patterns     Pattern 4: Message compressor              8            6              0
                              Pattern 5: Model co-versioning registry    0            0              4
                              Pattern 6: Model replacement trigger       0            1              3
                              Pattern 7: Deployment selector             0            0              3
Model training patterns       Pattern 8: Multi-task model trainer        2            1              3
                              Pattern 9: Heterogeneous data handler      1            2              0
                              Pattern 10: Incentive registry             18           1              0
Model aggregation patterns    Pattern 11: Asynchronous aggregator        4            1              0
                              Pattern 12: Decentralised aggregator       5            2              0
                              Pattern 13: Hierarchical aggregator        4            2              0
                              Pattern 14: Secure aggregator              31           0              3

Paper Structure
The remainder of the paper is organised as follows. Section 2 introduces the research methodology. Section 3 provides an overview of the patterns in the federated learning model lifecycle, followed by detailed discussions of each pattern. Section 4 summarises and discusses some recurring challenges of federated learning systems, and Section 5 presents the related work. Finally, Section 6 concludes the paper.

2. Methodology
Fig. 2 illustrates the federated learning pattern extraction and collection process. Firstly, the patterns are collected based on the results of our previous systematic literature review (SLR) on federated learning [31]. SLR and situational method engineering (SME) [9] are among the well-known systematic methodologies for the derivation of pattern languages. For instance, several pattern derivations on cloud migration and software architecture have used SLR (e.g., Zdun et al. [48], Aakash Ahmad et al. [2], and Jamshidi et al. [18]). Moreover, Balalaie et al. [4] have derived pattern languages in the context of cloud-native and microservices using situational method engineering.

For this work, we adopted the SLR method because the currently available materials and research works on federated learning are still highly academic. Secondly, we intend to propose design patterns for the software architectural design aspects of building federated learning systems rather than for their development/engineering processes. This is because, during the SLR work, we identified many architectural design challenges and a lack of systematic design approaches to federated learning. Furthermore, while SME has the benefit of offering a systematic methodology for selecting appropriate method components from a repository of reusable method components, it is more suitable for pattern extraction of an information system development (ISD) process [35].

The SLR was performed according to Kitchenham's SLR guideline [23], and the number of finally studied papers is 231. During the SLR, we developed a comprehensive mapping between federated learning challenges and approaches. Additionally, we reviewed 22 machine learning and federated learning papers published after the SLR search cut-off date (31st Jan 2020) and 22 real-world applications. The additional literature review on machine learning and federated learning, and the review of the real-world applications, are conducted based on our past real-world project implementation experience. Table 1 shows a mapping between each pattern and its respective number of source papers and real-world applications. Twelve patterns were initially collected from the SLR or the additional literature review, whereas the remaining two patterns were identified through real-world applications.

We discuss the proposed patterns according to the pattern form presented in [34].


Figure 3: A Model's Lifecycle in Federated Learning. (The figure shows the model lifecycle states — initiated, broadcast, trained, transmitted, aggregated, evaluated, converged or not converged, deployed, monitored, and abandoned when performance degrades — across the central server and client devices, and annotates each state transition with the patterns that enable it.)

The form comprehensively describes the patterns by discussing the Context, Problem, Forces, Solution, Consequences, Related patterns, and Known uses of each pattern. The Context is the description of the situation where a problem occurs, in which the proposed solution is applicable, or the problem is solvable by the pattern. Problem comprehensively elicits the challenges and limitations that occur under the defined context. Forces describe the reasons and causes for a specific design or pattern decision to be made to resolve the problem. Solution describes how the problem and the conflict of forces can be resolved by a specific pattern. Consequences reason about the impact of applying a solution, specifically the contradictions among the benefits, costs, drawbacks, trade-offs, and liabilities of the solution. Related patterns record the other patterns from this paper that are related to the current pattern. Known uses refer to empirical evidence that the solution has been used in the real world.

3. Federated Learning Patterns
Fig. 3 illustrates the lifecycle of a model in a federated learning system. The lifecycle covers the deployment of the completely trained global model to the client devices (i.e., model users) for data inference. The deployment process involves the communication between the central server and the client devices. We categorise the federated learning patterns as shown in Table 2 to provide an overview. There are four main groups: (1) client management patterns, (2) model management patterns, (3) model training patterns, and (4) model aggregation patterns.

3.1. Client Management Patterns
Client management patterns describe the patterns that manage the client devices' information and their interaction with the central server. A client registry manages the information of all the participating client devices. A client selector selects client devices for a specific training task, while a client cluster increases the model performance and training efficiency by grouping client devices based on the similarity of certain characteristics (e.g., available resources, data distribution).

3.1.1. Pattern 1: Client Registry

Figure 4: Client Registry. (The client registry is maintained in the central server, which exchanges client device information with the connected client devices.)

Summary: A client registry maintains the information of all the participating client devices for client management. According to Fig. 4, the client registry is maintained in the central server. The central server sends the request for information to the client devices. The client devices then send the requested information together with the first local model updates.
Context: Client management is centralised, and global and local models are exchanged between the central server and a massive number of distributed client devices with dynamic connectivity and diverse resources.


Table 2
Overview of architectural patterns for federated learning

Client management patterns:
• Client registry — Maintains the information of all the participating client devices for client management.
• Client selector — Actively selects the client devices for a certain round of training according to predefined criteria to increase model performance and system efficiency.
• Client cluster — Groups the client devices (i.e., model trainers) based on their similarity in certain characteristics (e.g., available resources, data distribution, features, geolocation) to increase the model performance and training efficiency.

Model management patterns:
• Message compressor — Compresses and reduces the message data size before every round of model exchange to increase communication efficiency.
• Model co-versioning registry — Stores and aligns the local models from each client with the corresponding global model versions for model provenance and model performance tracking.
• Model replacement trigger — Triggers model replacement when degradation in model performance is detected.
• Deployment selector — Selects and matches the converged global models to suitable client devices to maximise the global models' performance for different applications and tasks.

Model training patterns:
• Multi-task model trainer — Utilises data from separate but related models on local client devices to improve learning efficiency and model performance.
• Heterogeneous data handler — Solves the non-IID and skewed data distribution issues through data volume and data class addition while maintaining the local data privacy.
• Incentive registry — Measures and records the performance and contributions of each client and provides incentives to motivate clients' participation.

Model aggregation patterns:
• Asynchronous aggregator — Performs aggregation asynchronously whenever a model update arrives, without waiting for all the model updates every round, to reduce aggregation latency.
• Decentralised aggregator — Removes the central server from the system and decentralises its role to prevent single-point-of-failure and increase system reliability.
• Hierarchical aggregator — Adds an edge layer to perform partial aggregation of local models from closely related client devices to improve model quality and system efficiency.
• Secure aggregator — Adopts secure multiparty computation (MPC) protocols that manage the model exchange and aggregation security to protect model security.

Problem: It is challenging for a federated learning system to track any dishonest, failed, or dropout node. This is crucial to secure the central server and client devices from adversarial threats. Moreover, to effectively align the model training process of each client device for each aggregation round, a record of the connection and training information of each client device that has interacted with the central server is required.
Forces: The problem requires the following forces to be balanced:

• Updatability. The ability to keep track of the participating devices is necessary to ensure the information recorded is up-to-date.

• Data privacy. The records of client information expose the clients to data privacy issues. For instance, the device usage pattern of users may be inferred from the device connection up-time, device information, resources, etc.

Solution: A client registry records the information of every client device that has connected to the system from the first time. The information includes the device ID, connection up and downtime, and device resource information (computation, communication, power, and storage). Access to the client registry can be restricted according to the agreement between the central server and the participating client devices.
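A minimal sketch of such a registry is shown below; the record fields and method names (e.g., register, heartbeat) are illustrative assumptions rather than an interface prescribed by the pattern.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict

@dataclass
class ClientRecord:
    device_id: str
    resources: Dict[str, str]          # e.g., {"cpu": "4 cores", "network": "wifi"}
    connected_at: datetime = field(default_factory=datetime.utcnow)
    last_seen: datetime = field(default_factory=datetime.utcnow)
    rounds_participated: int = 0

class ClientRegistry:
    """Central-server-side registry of every client device that has ever connected."""

    def __init__(self):
        self._records: Dict[str, ClientRecord] = {}

    def register(self, device_id, resources):
        # Record a device the first time it connects to the system.
        self._records.setdefault(device_id, ClientRecord(device_id, resources))

    def heartbeat(self, device_id):
        # Keep connection status up-to-date so dropouts can be detected.
        self._records[device_id].last_seen = datetime.utcnow()

    def mark_participation(self, device_id):
        # Track how many aggregation rounds the device has contributed to.
        self._records[device_id].rounds_participated += 1
```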

Consequences:

Benefits:
• Maintainability. The client registry enables the system to effectively manage the dynamically connecting and disconnecting clients.
• Reliability. The client registry provides status tracking for all the devices, which is essential for problematic node identification.
Drawbacks:
• Data privacy. The recording of device information on the central server leads to client data privacy issues.
• Cost. The maintenance of client device information requires extra communication and storage cost, which further surges as the number of client devices increases.

Related patterns: Model Co-Versioning Registry, Client Selector, Client Cluster, Asynchronous Aggregator, Hierarchical Aggregator

Known uses:

• IBM Federated Learning (https://github.com/IBM/federated-learning-lib): the Party Stack component manages the client parties of the IBM federated learning framework and contains sub-components such as protocol handler, connection, model, local training, and data handler for client device registration and management.


• doc.ai (https://doc.ai/): a client registry is designed for medical research applications to ensure that the updates received apply to a current version of the global model, and not a deprecated global model.

• SIEMENS Mindsphere Asset Manager (https://documentation.mindsphere.io/resources/html/asset-manager/en-US/index.html): to support the collaboration of federated learning clients in an industrial IoT environment, Industrial Metadata Management is introduced as a device metadata and asset data manager.

3.1.2. Pattern 2: Client Selector

Figure 5: Client Selector. (The client selector on the central server uses client device information to determine the selected clients and the excluded clients for a training round.)

Summary: A client selector actively selects the client devices for a certain round of training according to predefined criteria to increase model performance and system efficiency. As shown in Fig. 5, the central server assesses the performance of each client according to the information received. Based on the assessment results, the second client is excluded from receiving the global model.
Context: Multiple rounds of model exchanges are performed, and the communication cost becomes a bottleneck. Furthermore, multiple iterations of aggregation are performed and consume high computation resources.
Problem: It is burdensome for the central server to accommodate the communication with a massive number of widely-distributed client devices every round.
Forces: The problem requires the following forces to be balanced:
• Latency. Client devices have system heterogeneity (differences in computation, communication, and energy resources) that affects the local model training and global model aggregation time.
• Model quality. Local data are statistically heterogeneous (different data distribution/quality), which produces local models that overfit the local data.

Solution: Selecting client devices with predefined criteria can optimise the formation of the global model. The client selector on the central server performs client selection every round to include the best-fitting client devices for global model aggregation (see the sketch after this list). The selection criteria can be configured as follows:
• Resource-based: The central server assesses the resources available on each client device every training round and selects the client devices with a satisfactory resource status (e.g., WiFi connection, pending status, sleep time).
• Data-based: The central server examines the information about the data collected by each client, specifically the number of data classes, the distribution of data sample volume per class, and the data quality. Based on these assessments, the model training process includes devices with high-quality data and higher data volume per class, and excludes devices with low-quality data, or data that are highly heterogeneous in comparison with other devices.
• Performance-based: Performance-based client selection can be conducted through local model performance assessment (e.g., the performance of the latest local model or the historical records of local model performance).
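The following sketch illustrates one possible resource- and performance-based selection policy; the thresholds and field names are illustrative assumptions, not values prescribed by the pattern.

```python
def select_clients(registry, min_battery=0.3, require_wifi=True,
                   min_local_accuracy=0.6, max_clients=100):
    """Return the subset of registered clients eligible for this round.

    `registry` is assumed to be a list of dicts describing each client, e.g.
    {"id": "c1", "battery": 0.8, "network": "wifi", "last_accuracy": 0.72}.
    """
    eligible = []
    for client in registry:
        # Resource-based criteria: enough battery and an unmetered connection.
        if client["battery"] < min_battery:
            continue
        if require_wifi and client["network"] != "wifi":
            continue
        # Performance-based criterion: skip clients whose latest local model
        # performed poorly, as they are likely to degrade the global model.
        if client.get("last_accuracy", 0.0) < min_local_accuracy:
            continue
        eligible.append(client["id"])
    # Cap the number of participants to bound aggregation latency.
    return eligible[:max_clients]
```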

Consequences:

Benefits:
• Resource optimisation. The client selection optimises the resource usage of the central server to compute and communicate with suitable client devices for each aggregation round, without wasting resources to aggregate low-quality models.
• System performance. Selecting clients with sufficient power and network bandwidth greatly reduces the chances of clients dropping out and lowers the communication latency.
• Model performance. Selecting clients with higher local model performance or lower data heterogeneity increases the global model quality.
Drawbacks:
• Model generality. The exclusion of models from certain client devices may lead to missing essential data features and the loss of global model generality.
• Data privacy. The central server needs to acquire the system and resource information (processor capacity, network availability, bandwidth, online status, etc.) every round to perform device ranking and selection. Access to client devices' information creates data privacy issues.
• Computation cost. Extra resources are spent on transferring the information required for selection decision-making.

Related patterns: Client Registry, Deployment Selector

Known uses:

• Google's FedAvg: the FedAvg [33] algorithm includes client selection that randomly selects a subset of clients for each round based on predefined environment conditions and the device specifications of the client devices.

• In IBM's Helios [45], there is a training consumption profiling function that fully profiles the resource consumption for model training on client devices. Based on that profiling, a resource-aware soft-training scheme is designed to accelerate local model training on heterogeneous devices and prevent stragglers from delaying the collaborative learning process.

• FedCS, suggested by OMRON SINIC X Corporation (https://www.omron.com/sinicx/), sets a certain deadline for clients to upload their model updates.

• Communication-Mitigated Federated Learning (CMFL) [40] excludes irrelevant local updates by identifying the relevance of a client update through comparing its global tendency of model updating with all the other clients.

• CDW_FedAvg [49] takes the centroid distance between the positive and negative classes of each client dataset into account for aggregation.

3.1.3. Pattern 3: Client Cluster

Figure 6: Client Cluster. (The central server groups the client devices into clusters, e.g., client device cluster A and cluster B, and broadcasts to each cluster the global model most related to it.)

Summary: A client cluster groups the client devices (i.e., model trainers) based on their similarity in certain characteristics (e.g., available resources, data distribution, features, geolocation) to increase the model performance and training efficiency. In Fig. 6, the client devices are clustered into two groups by the central server, and the central server broadcasts to each cluster the global model that is more related to it.
Context: The system trains models over client devices that have diverse communication and computation resources, resulting in statistical and system heterogeneity challenges.
Problem: Federated learning models generated under non-IID data properties tend to be less generalised. This is due to the lack of significantly representative data labels from the client devices. Furthermore, local models may drift significantly from each other.
Forces: The problem requires the following forces to be balanced:

• Computation cost and training time. More computation cost and longer training time are required to overcome the non-IID issue of client devices.

• Data privacy. Access to the entirety or parts of the clients' raw data by the learning coordinator would help resolve the non-IID issue, but it creates data privacy risks.

Solution: Client devices are clustered into different groups according to their properties (e.g., data distribution, feature similarities, gradient loss). By creating clusters of clients with similar data patterns, the global model generated will have better performance for a non-IID-severe client network, without accessing the local data.
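As an illustration only, the sketch below clusters clients by the similarity of their label distributions using k-means; the choice of feature (label distribution) and the use of scikit-learn are our assumptions, not part of the pattern description.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_clients(label_distributions, n_clusters=2):
    """Group clients whose local label distributions are similar.

    `label_distributions` is assumed to be an array of shape
    (n_clients, n_classes), where each row is the fraction of local samples
    per class reported (or estimated) for one client.
    Returns a mapping from cluster id to the list of client indices.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    assignments = km.fit_predict(np.asarray(label_distributions))
    clusters = {}
    for client_idx, cluster_id in enumerate(assignments):
        clusters.setdefault(int(cluster_id), []).append(client_idx)
    return clusters

# Example: four clients, three classes; clients 0/1 and 2/3 have similar data.
dists = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
print(cluster_clients(dists))
```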

Consequences:

Benefits:
• Model quality. The global model created via client clusters can have a higher model performance for highly personalised prediction tasks.
• Convergence speed. The subsequently deployed global models can have a faster convergence speed, as the models of the same cluster can identify the gradient's minima much faster when the clients' data distribution and IID-ness are similar.
Drawbacks:
• Computation cost. The central server consumes extra computation cost and time for client clustering and relationship quantification.
• Data privacy. The learning coordinator (i.e., central server) requires extra client device information (e.g., data distribution, feature similarities, gradient loss) to perform the clustering. This exposes the client devices to the possible risk of private data leakage.

Related patterns: Client Registry, Client Selector, Deployment Selector

Known uses:

• Iterative Federated Clustering Algorithm (IFCA) (https://github.com/jichan3751/ifca) is a framework introduced by UC Berkeley and Google to cluster client devices based on the loss values of the clients' gradients.

• Clustered Federated Learning (CFL) (https://github.com/felisat/clustered-federated-learning) uses a cosine-similarity-based clustering method that creates a bi-partitioning to group client devices with the same data-generating distribution into the same cluster.

• TiFL [10] is a tier-based federated learning system that adaptively groups client devices with similar training time per round to mitigate the heterogeneity problem without affecting the model accuracy.


• Patient clustering in a federated learning system is implemented by Massachusetts General Hospital to improve efficiency in predicting mortality and hospital stay time [17].

3.2. Model Management Patterns
Model management patterns include patterns that handle model transmission, deployment, and governance. A message compressor reduces the transmitted message size. A model co-versioning registry records all local model versions from each client and aligns them with their corresponding global model. A model replacement trigger initiates a new model training task when the converged global model's performance degrades. A deployment selector deploys the global model to the selected clients to improve the model quality for personalised tasks.

3.2.1. Pattern 4: Message Compressor

Figure 7: Message Compressor. (A message compressor operates at both ends of the system, on the client device and on the central server.)

Summary: A message compressor reduces the message data size before every round of model exchange to increase communication efficiency. Fig. 7 illustrates the operation of the pattern on both ends of the system (client device and central server).
Context: Multiple rounds of model exchange occur between a central server and many client devices to complete the model training.
Problem: The communication cost of model exchange (e.g., transferring model parameters or gradients) is often a critical bottleneck when the system scales up, especially for bandwidth-limited client devices.
Forces: The problem requires the following forces to be balanced:
• Computation cost. High computation costs are required by the central server to aggregate all the bulky model parameters collected every round.
• Communication cost. Communication resources are scarce for exchanging the model parameters and gradients between resource-limited client devices and the central server.

Solution: The model parameters and the training task script are packaged as one message and compressed before being transferred between the central server and the client devices.
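As one simple illustration, the sketch below serialises model weights at reduced precision and applies lossless compression before transmission; the choice of float16 quantisation plus zlib is our assumption and only one of many possible compression schemes.

```python
import io
import zlib
import numpy as np

def compress_update(weights):
    """Pack a list of weight arrays into a compressed byte payload.
    Casting to float16 halves the payload at the cost of precision."""
    buffer = io.BytesIO()
    np.savez(buffer, *[w.astype(np.float16) for w in weights])
    return zlib.compress(buffer.getvalue())

def decompress_update(payload):
    """Inverse operation performed on the receiving side."""
    buffer = io.BytesIO(zlib.decompress(payload))
    archive = np.load(buffer)
    return [archive[name].astype(np.float32) for name in archive.files]

# Example round trip for a two-layer model.
weights = [np.random.rand(256, 128).astype(np.float32),
           np.random.rand(128).astype(np.float32)]
payload = compress_update(weights)
restored = decompress_update(payload)
print(len(payload), [w.shape for w in restored])
```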

Consequences:

Benefits:
• Communication efficiency. The compression of model parameters reduces the communication cost and the network throughput needed for model exchanges.
Drawbacks:
• Computation cost. Extra computation is required for message compression and decompression every round.
• Loss of information. The downsizing of the model parameters might cause the loss of essential information.
Related patterns: Client Registry, Model Co-Versioning Registry

Known uses:

• Google Sketched Update [24]: Google proposes two communication-efficient update approaches: structured update and sketched update. Structured update directly learns an update from a restricted space that can be parametrised using a smaller number of variables, whereas sketched update compresses the model before sending it to the central server.

• IBM PruneFL [21] adaptively prunes the distributed parameters of the models, including initial pruning at a selected client and further pruning as part of the federated learning process.

• FedCom [15] compresses messages for uplink communication from the client device to the central server. The central server produces a convex combination of the previous global model and the average of the updated local models to retain the essential information of the compressed model parameters.

3.2.2. Pattern 5: Model Co-versioning Registry

Figure 8: Model Co-versioning Registry. (The model co-versioning registry on the central server maps local model versions 1, 2, and 3 from each client device to the corresponding global model versions.)

Summary: A model co-versioning registry records all local model versions from each client and aligns them with their corresponding global model. This enables the tracing of model quality and adversarial client devices at any specific point of the training to improve system accountability. Fig. 8 shows that the registry collects and maps the local model updates to the associated global model versions.
Context: Multiple new versions of local models are generated from different client devices, and one global model is aggregated each round. For instance, a federated learning task that runs for 100 rounds on 100 devices will create 10,000 local models and 100 global models in total.


Problem: With high numbers of local models created each round, it is challenging to keep track of which local models contributed to the global model of a specific round. Furthermore, the system needs to handle the challenges of asynchronous updates, client dropouts, model selection, etc.
Forces: The problem requires the following forces to be balanced:
• Updatability. The system needs to keep track of the local and global models with respect to each client device's updates (application version or device OS/firmware updates) and ensure that the information recorded is up-to-date.
• Immutability. The records and storage of the model co-versions and client IDs need to be immutable.

Solution: A model co-versioning registry records the local model version of each client device and the global model it corresponds to. This enables seamless synchronous and asynchronous model updates and aggregation. Furthermore, the model co-versioning registry enables early stopping of complex model training (stop training when the local model overfits and retrieve the best-performing previous model). This can be done by observing the performance of the aggregated global model. Moreover, to provide accountable model provenance and co-versioning, blockchain is considered as one alternative to the central server due to its immutability and decentralisation properties.
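The sketch below shows one way such a registry could map global model versions to the local updates that produced them; the use of content hashes as model identifiers is our own illustrative assumption.

```python
import hashlib
from collections import defaultdict

class ModelCoVersioningRegistry:
    """Maps each global model version (round) to the local model updates
    (client id + model fingerprint) that were aggregated into it."""

    def __init__(self):
        self._co_versions = defaultdict(list)   # round -> [(client_id, digest)]
        self._global_models = {}                 # round -> global model digest

    @staticmethod
    def fingerprint(model_bytes: bytes) -> str:
        # A content hash acts as an immutable identifier of a model version.
        return hashlib.sha256(model_bytes).hexdigest()

    def record_local_update(self, round_id: int, client_id: str, model_bytes: bytes):
        self._co_versions[round_id].append((client_id, self.fingerprint(model_bytes)))

    def record_global_model(self, round_id: int, model_bytes: bytes):
        self._global_models[round_id] = self.fingerprint(model_bytes)

    def provenance(self, round_id: int):
        # Which local models contributed to the global model of this round?
        return self._global_models.get(round_id), self._co_versions.get(round_id, [])
```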

Consequences:

Benefits:
• Model quality. The mapping of local models with their corresponding version of the global model allows studying the effect of each local model's quality on the global model.
• Accountability. System accountability improves as stakeholders can trace the local models that correspond to the current or previous global model versions.
• System security. It enables the detection of adversarial or dishonest clients and tracks the local models that poison the global model or cause system failure.
Drawbacks:
• Storage cost. Extra storage cost is incurred to store all the local and global models and their mutual relationships. The records also need to be easily retrievable, which is challenging if the central server hosts multiple tasks.

Related patterns: Client Registry

Known uses:

• DVC (https://dvc.org/) is an online machine learning version control platform built to make models shareable and reproducible.

• MLflow Model Registry (https://docs.databricks.com/applications/mlflow/model-registry.html) on Databricks is a centralised model store that provides chronological model lineage, model versioning, stage transitions, and descriptions.

• Replicate.ai (https://replicate.ai/) is an open-source version control platform for machine learning that automatically tracks code, hyperparameters, training data, weights, metrics, and system dependencies.

• Pachyderm (https://www.pachyderm.com/) is an online machine learning pipeline platform that uses containers to execute the different steps of the pipeline and also solves the data versioning and provenance issues.

3.2.3. Pattern 6: Model Replacement Trigger

Figure 9: Model Replacement Trigger. (The model replacement trigger on the central server triggers new model production when performance degradation is detected on the client devices.)

Summary: Fig. 9 illustrates a model replacement trigger that initiates a new model training task when the current global model's performance drops below a threshold value or when a degradation in model prediction accuracy is detected.
Context: The client devices use converged global models for inference or prediction.
Problem: As new data is introduced to the system, the global model accuracy might reduce gradually. Eventually, with the degrading performance, the model is no longer suitable for the application.
Forces: The problem requires the following forces to be balanced:
• Model quality. The deployed global model might experience a performance drop when new data are used for inference and prediction.
• Performance degradation detection. The system needs to effectively determine the reason for the global model's performance degradation before deciding whether to activate a new global model generation.

Solution: A model replacement trigger initiates a new model training task when the acting global model's performance drops below a threshold value. It compares the performance of the deployed global model on a certain number of client devices to determine whether the degradation is a global event. When the global model performance is lower than the preset threshold for more than a fixed number of consecutive evaluations, given that the performance degradation is a global event, a new model training task is triggered.
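A possible realisation of the trigger logic is sketched below; the threshold, window size, and callback name are illustrative assumptions.

```python
class ModelReplacementTrigger:
    """Fires a retraining task when the evaluation accuracy reported by the
    monitored clients stays below a threshold for several consecutive checks."""

    def __init__(self, threshold=0.80, consecutive_limit=3, on_trigger=None):
        self.threshold = threshold
        self.consecutive_limit = consecutive_limit
        self.on_trigger = on_trigger or (lambda: print("start new training task"))
        self._below_count = 0

    def report_evaluation(self, client_accuracies):
        """`client_accuracies` is a list of accuracy values collected from a
        sample of client devices; the mean decides whether this is a global event."""
        global_accuracy = sum(client_accuracies) / len(client_accuracies)
        if global_accuracy < self.threshold:
            self._below_count += 1
        else:
            self._below_count = 0
        if self._below_count >= self.consecutive_limit:
            self._below_count = 0
            self.on_trigger()

# Example: three consecutive degraded evaluations trigger retraining.
trigger = ModelReplacementTrigger()
for accs in [[0.78, 0.75], [0.79, 0.74], [0.77, 0.72]]:
    trigger.report_evaluation(accs)
```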

Consequences:

Benefits:
• Updatability. The consistent updatability of the global model helps to maintain system performance and reduces the non-IID data effect. It is especially effective for clients that generate highly personalised data, which causes the effectiveness of the global model to degrade much faster as new data is generated.
• Model quality. The ongoing model performance monitoring is effective in maintaining the high quality of the global model used by the clients.
Drawbacks:
• Computation cost. The client devices need to perform model evaluation periodically, which imposes extra computational costs.
• Communication cost. The sharing of the evaluation results among clients, to determine whether performance degradation is a global event, is communication costly.

Related patterns: Client Registry, Client Selector, Model Co-versioning Registry

Known uses:

• Microsoft Azure Machine Learning Designer (https://azure.microsoft.com/en-au/services/machine-learning/designer/) provides a platform for machine learning pipeline creation that enables models to be retrained on new data.

• Amazon SageMaker (https://aws.amazon.com/sagemaker/) provides model deployment and monitoring services to maintain the accuracy of the deployed models.

• Alibaba Machine Learning Platform (https://www.alibabacloud.com/product/machine-learning) provides end-to-end machine learning services, including data processing, feature engineering, model training, model prediction, and model evaluation.

3.2.4. Pattern 7: Deployment Selector
Summary: A deployment selector deploys the converged global model to the selected model users to improve the prediction quality for different applications and tasks. As shown in Fig. 10, different versions of converged models are deployed to different groups of clients after evaluation.
Context: Client devices train local models under multi-task federated learning settings (a model is trained using data from multiple applications to perform similar and related tasks). These models need to be deployed to suitable groups of client devices.

Figure 10: Deployment Selector. (The deployment selector on the central server deploys converged global models 1, 2, and 3 to client device groups 1, 2, and 3 after evaluation.)

Problem: Due to the inherent diversity and non-IID distribution among local data, the globally trained model may not be accurate enough for all clients or tasks.
Forces: The problem requires the following forces to be balanced:
• Identification of clients. The central server needs to match and deploy the global models to the different groups of client devices.
• Training and storage of different models. The central server needs to train and store different global models for diverse clients or applications.

Solution: A deployment selector examines and selects clients (i.e., model users) to receive the trained global model specified for them based on their data characteristics or applications. The deployment selector deploys the model to the client devices once the global model is completely trained.
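The sketch below shows one simple way the selector could match trained models to client groups; keying both models and clients by an application/task label is our illustrative assumption.

```python
def select_deployments(trained_models, clients):
    """Match each client to the converged global model trained for its task.

    `trained_models` is assumed to map a task label to a model artefact,
    e.g. {"next_word_en": model_a, "next_word_de": model_b}.
    `clients` is a list of dicts such as {"id": "c7", "task": "next_word_de"}.
    Returns a deployment plan: task label -> list of client ids.
    """
    plan = {task: [] for task in trained_models}
    for client in clients:
        task = client.get("task")
        if task in trained_models:
            plan[task].append(client["id"])
        # Clients with no matching model are left out of this deployment round.
    return plan

clients = [{"id": "c1", "task": "next_word_en"}, {"id": "c2", "task": "next_word_de"}]
print(select_deployments({"next_word_en": "model_a", "next_word_de": "model_b"}, clients))
```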

Consequences:

Benefits:
• Model performance. Deploying converged global models to suitable groups of client devices enhances the model performance for the specific groups of clients or applications.
Drawbacks:
• Cost. There are extra costs for training multiple personalised global models, for deployment selection, and for storing multiple global models.
• Model performance. The statistical heterogeneity of model trainers produces personalised local models, which are then generalised through FedAvg aggregation. We need to consider the performance trade-off of the generalised global model deployed for different model users and applications.


• Data privacy. Data privacy challenges exist when the central server collects the client information to identify suitable models for different clients. Moreover, the global model might be deployed to model users that have never joined the model training process.

Related patterns: Client Registry, Client Selector

Known uses:

• Azure Machine Learning (https://docs.microsoft.com/en-us/azure/machine-learning/concept-model-management-and-deployment) supports mass deployment with a step of compute target selection.

• Amazon SageMaker (https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html) can host multiple models with multi-model endpoints.

• Google Cloud (https://cloud.google.com/ai-platform/prediction/docs/deploying-models) uses model resources to manage different versions of models.

3.3. Model Training Patterns
Patterns about model training and data preprocessing are grouped together as model training patterns, including the multi-task model trainer that tackles non-IID data characteristics, the heterogeneous data handler that deals with data heterogeneity in training datasets, and the incentive registry that increases the clients' motivatability through rewards.

3.3.1. Pattern 8: Multi-Task Model Trainer

Figure 11: Multi-Task Model Trainer. (Client devices train separate but related models, coordinated by the central server.)

Summary: In federated learning, a multi-task model trainer trains separate but related models on local client devices to improve learning efficiency and model performance. As shown in Fig. 11, there are two groups of applications: (i) text-related applications (e.g., messaging, email, etc.) and (ii) image-related applications (camera, video, etc.). The related models are trained on client devices using data of related tasks.
Context: Local data has the statistical heterogeneity property, where the data distribution among different clients is skewed and a global model cannot capture the data pattern of each client.


Problem: Federated learning models trained with non-IID data suffer from low accuracy and are less generalised to the entire dataset. Furthermore, local data that is highly personalised to the device users' usage patterns creates local models that diverge in different directions. Hence, the global model may have relatively low averaged accuracy.
Forces: The problem requires the following forces to be balanced:
• Computation cost. A complex model that solves the non-IID issue consumes more computation and energy resources every round compared to general federated model training. It also takes longer to compute all the training results from the different tasks before submitting them to the central server.
• Data privacy. To address the non-IID issue, more information from the local data needs to be explored to understand the data distribution pattern. This ultimately exposes client devices to local data privacy threats.

Solution: The multi-task model trainer performs similar-but-related machine learning tasks on client devices. This enables the local model to learn from more local data that fits naturally with the related local models for different tasks. For instance, a multi-task model for the next-word prediction task is trained using the on-device text messages, web browser search strings, and emails with similar mobile keyboard usage patterns. MOCHA [38] is a state-of-the-art federated multi-task learning algorithm that realises distributed multi-task learning in federated settings.
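As a simplified illustration of local multi-task training (not MOCHA itself), the PyTorch-style sketch below shares an encoder across two related text tasks and adds task-specific heads; the architecture, dimensions, and task names are our own assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskLocalModel(nn.Module):
    """Shared encoder with task-specific heads, trained on one client's data
    from two related tasks (e.g., keyboard text and email text)."""
    def __init__(self, in_dim=32, hidden=64, vocab=1000):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleDict({
            "keyboard": nn.Linear(hidden, vocab),
            "email": nn.Linear(hidden, vocab),
        })

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))

def local_multitask_step(model, batches, lr=0.01):
    """One local training step that sums the losses of all related tasks."""
    optimiser = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    loss = sum(criterion(model(x, task), y) for task, (x, y) in batches.items())
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return float(loss)

model = MultiTaskLocalModel()
batches = {"keyboard": (torch.randn(8, 32), torch.randint(0, 1000, (8,))),
           "email": (torch.randn(8, 32), torch.randint(0, 1000, (8,)))}
print(local_multitask_step(model, batches))
```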

Consequences:

Benefits:
• Model quality. Multi-task learning improves the model performance by considering local data and loss in optimisation and obtaining a local weight matrix through this process. The local model fits the non-IID data on each node better than a global model.
Drawbacks:
• Model quality. Multi-task training often works only with convex loss functions and performs weakly on non-convex loss functions.
• Model portability. As each client has a different model, model portability becomes a problem that makes it hard to apply multi-task training to cross-device federated learning.

Related patterns: Client Registry, Model Co-versioning Registry, Client Cluster, Deployment Selector

Known uses:

• MultiModel (https://ai.googleblog.com/2017/06/multimodel-multi-task-machine-learning.html) is a neural network architecture by Google that simultaneously solves several problems spanning multiple domains, including image recognition, translation, and speech recognition.


• MT-DNN (https://github.com/microsoft/MT-DNN) is an open-source natural language understanding toolkit by Microsoft to train customised deep learning models.

• Yahoo Multi-Task Learning for Web Ranking is a multi-task learning framework developed by Yahoo! Labs for ranking in web search.

• VIRTUAL [12] is an algorithm for federated multi-task learning with non-convex models. The server and devices are treated as a star-shaped Bayesian network, and model learning is performed on the network using approximated variational inference.

3.3.2. Pattern 9: Heterogeneous Data Handler

Figure 12: Heterogeneous Data Handler. (A heterogeneous data handler operates at both ends of the system, on the central server and on the client devices.)

Summary: A heterogeneous data handler solves the non-IID and skewed data distribution issues through data volume and data class addition (e.g., data augmentation or generative adversarial networks) while maintaining the local data privacy. The pattern is illustrated in Fig. 12, where the heterogeneous data handler operates at both ends of the system.
Context: Client devices possess heterogeneous data characteristics due to the highly personalized data generation pattern. Furthermore, the raw local data cannot be shared, so the data balancing task becomes extremely challenging.
Problem: The imbalanced and skewed data distribution of client devices produces local models that do not generalise to the entire client network. The aggregation of these local models reduces global model accuracy.
Forces: The problem requires the following forces to be balanced:
• Data efficiency. It is challenging to articulate the suitable data volume and classes to be augmented to solve data heterogeneity on local client devices.

• Data accessibility. The heterogeneous data issue that exists within the client devices could be solved by collecting all the data in a centralized location. However, this violates the data privacy of client devices.

Solution: A heterogeneous data handler balances the data distribution and solves the data heterogeneity issue in the client devices through data augmentation and federated distillation. Data augmentation solves data heterogeneity by generating augmented data locally until the data volume is the same across all client devices. Furthermore, the classes in the datasets are also populated equally across all client devices. Federated distillation enables the client devices to obtain knowledge from other devices periodically without directly accessing the data of other client devices. Other methods include taking a quantified data heterogeneity weightage (e.g., Pearson's correlation, centroid averaging distance, etc.) into account for model aggregation.
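A minimal sketch of the on-device balancing step is given below; jitter-based oversampling stands in for the GAN-based augmentation and federated distillation approaches cited above, and the function name, noise level, and synthetic data are assumptions.

```python
# Sketch of local class balancing by augmentation: minority classes are
# oversampled (with small jitter) until every class has the same volume,
# and the augmented samples never leave the device.
import numpy as np

def balance_locally(X, y, rng=np.random.default_rng(0), noise=0.01):
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()                       # equalise class volumes
    X_out, y_out = [X], [y]
    for cls, count in zip(classes, counts):
        idx = np.where(y == cls)[0]
        extra = target - count
        if extra > 0:
            picks = rng.choice(idx, size=extra, replace=True)
            X_aug = X[picks] + rng.normal(scale=noise, size=X[picks].shape)
            X_out.append(X_aug)                 # augmented samples stay on-device
            y_out.append(np.full(extra, cls))
    return np.concatenate(X_out), np.concatenate(y_out)

X = np.random.rand(100, 4)
y = np.array([0] * 80 + [1] * 20)               # skewed local distribution
Xb, yb = balance_locally(X, y)
print(np.unique(yb, return_counts=True))        # both classes now have 80 samples
```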

Consequences:

Benefits:
• Model quality. By solving the non-IID issue of local datasets, the performance and generality of the global model are improved.

Drawbacks:
• Computation cost. It is computationally costly to deal with data heterogeneity together with the local model training.

Related patterns: Client Registry, Client Selector, Client Cluster

Known uses:

• Astraea [13] is a self-balancing federated learning framework that alleviates the imbalances by performing global data distribution-based data augmentation.

• Federated Augmentation (FAug) [19] is a data augmentation scheme that utilises a generative adversarial network (GAN) which is collectively trained under the trade-off between privacy leakage and communication overhead.

• Federated Distillation (FD) [3] is a method that adopts knowledge distillation approaches to tackle the non-IID issue by obtaining knowledge from other devices during the distributed training process, without accessing the raw data.

3.3.3. Pattern 10: Incentive Registry

Figure 13: Incentive Registry.

Summary: An incentive registry maintains the list of participating clients and their rewards that correspond to clients' contributions (e.g., data volume, model performance, computation resources, etc.) to motivate clients' participation. Fig. 13 illustrates a blockchain & smart contract-based incentive mechanism.


Context: The model training participation rate of client devices is low, while a high participation rate is crucial for the global model performance.
Problem: Although the system relies greatly on the participation of clients to produce high-quality global models, clients are not obligated to join the model training and contribute their data/resources.
Forces: The problem requires the following forces to be balanced:
• Incentive scheme. It is challenging to formulate the form of rewards to attract different clients with different participation motives. Furthermore, the incentive scheme needs to be agreed upon by both the learning coordinator and the clients, e.g., performance-based, data-contribution-based, resource-contribution-based, and provision-frequency-based.

• Data privacy. To identify the contribution of each client device, the local data and client information are required to be studied and analysed by the central server. This exposes the client devices' local data to privacy threats.

Solution: An incentive registry records all clients' contributions and incentives based on the agreed rate. There are various ways to formulate the incentive scheme, e.g., deep reinforcement learning, blockchain/smart contracts, and the Stackelberg game model.
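The sketch below shows the bookkeeping role of the registry in its simplest form; a plain in-memory class stands in for the blockchain/smart-contract realisation shown in Fig. 13, and the reward rates, class name, and contribution metrics are assumptions for illustration.

```python
# Sketch of an incentive registry kept by the learning coordinator: it records
# each client's contribution and accumulates rewards at assumed rates.
from collections import defaultdict

class IncentiveRegistry:
    def __init__(self, rate_per_sample=0.01, rate_per_accuracy_gain=10.0):
        self.contributions = defaultdict(list)
        self.rewards = defaultdict(float)
        self.rate_per_sample = rate_per_sample
        self.rate_per_accuracy_gain = rate_per_accuracy_gain

    def record(self, client_id, num_samples, accuracy_gain):
        # Reward combines data contribution and measured model improvement.
        reward = (num_samples * self.rate_per_sample
                  + max(accuracy_gain, 0.0) * self.rate_per_accuracy_gain)
        self.contributions[client_id].append((num_samples, accuracy_gain))
        self.rewards[client_id] += reward
        return reward

registry = IncentiveRegistry()
registry.record("client-A", num_samples=500, accuracy_gain=0.02)
registry.record("client-B", num_samples=120, accuracy_gain=-0.01)  # no gain reward
print(dict(registry.rewards))
```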

Consequences:

Benefits:
• Client motivatability. The incentive mechanism is effective in attracting clients to contribute data and resources to the training process.

Drawbacks:
• System security. There might be dishonest clients that submit fraudulent results to earn rewards illegally and harm the training process.

Related patterns: Client Registry, Client Selector

Known uses:

• FLChain [5] is a federated learning blockchain providing an incentive scheme for collaborative training and a marketplace for model trading.

• DeepChain [42] is a blockchain-based collaborative training framework with an incentive mechanism that encourages parties to participate in the deep learning model training and share the obtained local gradients.

• FedCoin [29] is a blockchain-based peer-to-peer payment system for federated learning to enable Shapley Value (SV)-based reward distribution.

3.4. Model Aggregation Patterns
Model aggregation patterns are design solutions of model aggregation used for different purposes. The asynchronous aggregator aims to reduce aggregation latency and increase system efficiency, whereas the decentralised aggregator aims to increase system reliability and accountability. The hierarchical aggregator is adopted to improve model quality and optimise resources. The secure aggregator is designed to protect model security.

3.4.1. Pattern 11: Asynchronous Aggregator

Figure 14: Asynchronous Aggregator.

Summary: To increase the global model aggregation speed every round, the central server can perform model aggregation asynchronously whenever a model update arrives, without waiting for all the model updates every round. In Fig. 14, the asynchronous aggregator pattern is demonstrated as the first client device asynchronously uploads its local model during the second aggregation round while skipping the first aggregation round.
Context: In conventional federated learning, the central server receives all local model updates from the distributed client devices synchronously and performs model aggregation every round. The central server needs to wait for every model to arrive before performing the model aggregation for that round. Hence, the aggregation time depends on the last model update that reaches the central server.
Problem: Due to the difference in computation resources, the model training lead time differs per device. Furthermore, differences in bandwidth availability and communication efficiency affect the model transfer rate. Therefore, the delay in model training and transfer increases the latency of global model aggregation.
Forces: The problem requires the following forces to be balanced:
• Model quality. There will be possible bias in the global model if not all local model updates are aggregated in every iteration, as important information or features might be left out.

• Aggregation latency. The aggregation of local models can only be performed when all the model updates are collected. Therefore, the latency of the aggregation process is affected by the slowest model update that arrives at the central server.

Solution: The global model aggregation is conducted whenever a model update is received, without being delayed by other clients. The server then starts the next iteration and distributes the new central model to the clients that are ready for training. The delayed model updates that are not included in the current aggregation round will be added in the next round with some reduction in their weightage, proportional to their respective delay.
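A minimal sketch of this behaviour is shown below; the inverse-staleness weighting, the class and parameter names, and the mixing rate are simple assumed choices rather than the rule of any specific algorithm cited here.

```python
# Sketch of asynchronous aggregation: every incoming local model is folded
# into the global model immediately, with stale updates discounted by how
# many aggregation rounds they missed.
import numpy as np

class AsyncAggregator:
    def __init__(self, global_model, base_mix=0.5):
        self.global_model = global_model
        self.version = 0                 # aggregation round counter
        self.base_mix = base_mix

    def on_update(self, local_model, trained_on_version):
        staleness = self.version - trained_on_version
        mix = self.base_mix / (1 + staleness)     # delayed updates weigh less
        self.global_model = (1 - mix) * self.global_model + mix * local_model
        self.version += 1                          # aggregate immediately
        return self.global_model

agg = AsyncAggregator(global_model=np.zeros(3))
agg.on_update(np.array([1.0, 1.0, 1.0]), trained_on_version=0)  # fresh update
agg.on_update(np.array([2.0, 0.0, 2.0]), trained_on_version=0)  # stale by 1 round
print(agg.global_model)
```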

Consequences:

Benefits:
• Low aggregation latency. Faster aggregation time per round is achievable as there is no need to wait for the model updates from other clients for the aggregation round. The bandwidth usage per iteration is reduced as fewer local model updates are transferred and received simultaneously every round.

Drawbacks:
• Communication cost. The number of iterations to collect all local model updates increases for the asynchronous approach. More iterations are required for the entire training process to train the model until convergence compared to synchronous global model aggregation.

• Model bias. The global model of each round does not contain all the features and information of every local model update. Hence, the global model might have a certain level of bias in prediction.

Related patterns: Client Registry, Client Selector, Model Co-versioning Registry, Client Update Scheduler

Known uses:

• Asynchronous Online Federated Learning (ASO-fed) [11] is a framework for federated learning that adopts asynchronous aggregation. The central server updates the global model whenever it receives a local update from one client device (or several client devices if the local updates are received simultaneously). On the client device side, online learning is performed as data continue to arrive during the global iterations.

• Asynchronous federated SGD-Vertical Partitioned (AFSGD-VP) [14] is an algorithm that uses a tree-structured communication scheme to perform asynchronous aggregation. The algorithm does not need to align the iteration number of the model aggregation from different client devices to compute the global model.

• Asynchronous Federated Optimization (FedAsync) [43] is an approach that leverages an asynchronous updating technique and avoids server-side timeouts and abandoned rounds, while requiring no synchronous model broadcast to all the selected client devices.

Figure 15: Decentralised Aggregator.

3.4.2. Pattern 12: Decentralised Aggregator
Summary: A decentralised aggregator improves system reliability and accountability by removing the central server, which is a possible single-point-of-failure. Fig. 15 illustrates a decentralised federated learning system built using blockchain and smart contracts, where the model updates are performed through exchanges between neighbouring devices.
Context: The model training and aggregation are coordinated by a central server, and both the central server and its owner may not be trusted by all the client devices that join the training process.
Problem: In FedAvg, all the chosen devices have to submit their model parameters to one central server every round. This is extremely burdensome for the central server, and network congestion may occur. Furthermore, centralised federated learning possesses a single-point-of-failure. Data privacy threats may also occur if the central server is compromised by any unauthorised entity. Mutual trust between the client devices and the central server may not be specifically established.
Forces: The problem requires the following forces to be balanced:
• Decentralised model management. Federated learning systems face challenges in collecting, storing, examining, and aggregating the local models due to the removal of the central server.
• System ownership. Currently, the central server is owned by the learning coordinator that creates the federated learning jobs. The removal of the central server requires a redefinition of system ownership, including the authority and accessibility of the learning coordinator in the federated learning system.

Solution: A decentralised aggregator replaces the central server's role in a federated learning system. The aggregation and update of the models can be performed through peer-to-peer exchanges between client devices. First, a random client from the system can act as an aggregator by requesting the model updates from the other clients that are close to it. Simultaneously, the client devices conduct local model training in parallel and send the trained local models to the aggregator. The aggregator then produces a new global model and sends it to the client network. Blockchain is an alternative to the central server for model storage that prevents a single-point-of-failure. The ownership of the blockchain belongs to the learning coordinator that creates the new training tasks and maintains the blockchain. Furthermore, the record of models on a blockchain is immutable, which increases the reliability of the system. It also increases the trust in the system as the record is transparent and accessible to all the client devices.
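The snippet below sketches server-less aggregation in the spirit of gossip-style peer exchange; whole-model averaging, the random peer sampling, and the fan-out value are assumptions for illustration and are not the protocol of any specific system cited below.

```python
# Sketch of decentralised, gossip-style aggregation: each client averages its
# model with a few randomly chosen peers every round, so no central server
# is needed and the models gradually converge across the network.
import numpy as np

def gossip_round(models, rng, fanout=2):
    n = len(models)
    new_models = []
    for i in range(n):
        peers = rng.choice([j for j in range(n) if j != i], size=fanout, replace=False)
        group = [models[i]] + [models[j] for j in peers]
        new_models.append(np.mean(group, axis=0))   # aggregate without a server
    return new_models

rng = np.random.default_rng(0)
models = [rng.normal(size=4) for _ in range(5)]     # one local model per client
for _ in range(10):
    models = gossip_round(models, rng)
print(np.round(models[0], 3), np.round(models[-1], 3))  # models converge
```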

Consequences:

Benefits:
• System reliability. The removal of the single-point-of-failure increases system reliability by reducing the security risk of the central server from any adversarial attack, or the failure of the entire training process due to the malfunction of the central server.

• System accountability. The adoption of blockchain promotes accountability as the records on a blockchain are immutable and transparent to all the stakeholders.

Drawbacks:
• Latency. A client device, as a replacement for the central server for model aggregation, is not ideal for direct communication with multiple devices (star topology). This may cause latency in the model aggregation process due to blockchain consensus protocols.
• Computation cost. Client devices have limited computation power and resources to perform model training and aggregation in parallel. Even if the training process and the aggregation are performed sequentially, the energy consumption to perform multiple rounds of aggregation is very high.

• Storage cost. High storage cost is required to store all the local and global models on storage-limited client devices or the blockchain.

• Data privacy. Client devices can access the record of all the models under decentralised aggregation and blockchain settings. This might expose the privacy-sensitive information of the client devices to other parties.

Related patterns: Model Co-versioning Registry, Incentive Registry

Known uses:

• BrainTorrent [36] is a server-less, peer-to-peer approach to perform federated learning where clients communicate directly among themselves, specifically for federated learning in medical centers.

• FedPGA [20] is a decentralised aggregation algorithm developed from FedAvg. The devices in FedPGA exchange partial gradients rather than full model weights. The partial gradient exchange pulls and merges the different slices of the updates from different devices and rebuilds a mixed update for aggregation.

• A fully decentralised framework [26] is an algorithm in which users update their beliefs by aggregating information from their one-hop neighbors to learn a model that best fits the observations over the entire network.

• A segmented gossip approach [16] splits a model into segments that contain the same number of non-overlapping model parameters. Then, the gossip protocol is adopted where each client stochastically selects a few other clients to exchange model segments for each training iteration without the orchestration of a central server.

3.4.3. Pattern 13: Hierarchical Aggregator

Figure 16: Hierarchical Aggregator.

Summary: To reduce non-IID effects on the global model and increase system efficiency, a hierarchical aggregator adds an intermediate layer (e.g., edge servers) to perform partial aggregations using the local model parameters from closely-related client devices before the global aggregation. In Fig. 16, edge servers are added as an intermediate layer between the central server and the client devices to serve the client devices that are closer to them.
Context: The communication between the central server and the client devices is slowed down or frequently disrupted because they are physically distant from each other and wirelessly connected.
Problem: The central server can access and store more data but requires high communication overhead and suffers from latency due to being physically distant from the client devices. Moreover, client devices possess non-IID characteristics that affect global model performance.
Forces: The problem requires the following forces to be balanced:
• System efficiency. The system efficiency of the server-client setting to perform federated learning is low, as it is burdensome for the central server to accommodate the communication and the model aggregations of the widely-distributed client devices.

• Data heterogeneity. In the server-client setting of a federated learning system, the data heterogeneity characteristics of client devices become influential and dominant in the global model production, as the central server deals with all the client devices that generate non-IID data.

Solution: A hierarchical aggregator adds edge servers between the central server and the client devices. The combined server-edge-client architecture can improve both the computation and communication efficiency of the federated model training process. Edge servers collect local models from the nearest client devices, a subset of all the client devices. After every k1 rounds of local training on each client, each edge server aggregates its clients' models. After every k2 edge model aggregations, the cloud server aggregates all the edge servers' models, which means that communication with the central server happens every k1k2 local updates [28].
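The sketch below instantiates this k1/k2 schedule directly; local SGD is mocked with random perturbations, and the grouping of clients under edge servers, function names, and parameter values are illustrative assumptions.

```python
# Sketch of client-edge-cloud aggregation: each edge server averages its
# clients after every k1 local steps, and the cloud averages the edge models
# after every k2 edge aggregations, so the cloud is contacted once per
# k1 * k2 local updates.
import numpy as np

def average(models):
    return np.mean(models, axis=0)

def hierarchical_fl(client_groups, dim=4, k1=2, k2=3, cloud_rounds=5, seed=0):
    rng = np.random.default_rng(seed)
    global_model = np.zeros(dim)
    for _ in range(cloud_rounds):
        edge_models = []
        for group in client_groups:                    # one entry per edge server
            edge_model = global_model.copy()
            for _ in range(k2):                        # k2 edge aggregations
                locals_ = []
                for _client in group:
                    local = edge_model.copy()
                    for _ in range(k1):                # k1 local training steps
                        local += 0.1 * rng.normal(size=dim)  # stand-in for SGD
                    locals_.append(local)
                edge_model = average(locals_)          # partial aggregation
            edge_models.append(edge_model)
        global_model = average(edge_models)            # global aggregation
    return global_model

print(hierarchical_fl(client_groups=[["c1", "c2"], ["c3", "c4", "c5"]]))
```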

Consequences:

Benefits:
• Communication efficiency. The hierarchical aggregators speed up the global model aggregation and improve communication efficiency.

• Scalability. Adding an edge layer helps to scale the system by improving the system's ability to handle more client devices.

• Data heterogeneity and non-IID reduction. The partial aggregation in a hierarchical manner aggregates local models that have similar data heterogeneity and non-IIDness before the global aggregation on the central server. This greatly reduces the effect of data heterogeneity and non-IIDness on the global models.

• Computation and storage efficiency. The edge devices are rich in computation and storage resources to perform partial model aggregation. Furthermore, edge devices are nearer to the client devices, which increases the model aggregation and computation efficiency.

Drawbacks:
• System reliability. The failure of edge devices may cause the disconnection of all the client devices under those edge servers and affect the model training process, model performance, and system reliability.

• System security. Edge servers could become security breach points as they have lower security setups than the central server and the client devices. Hence, they are more prone to network security threats and may become possible points-of-failure of the system.

Related patterns: Client Registry, Client Cluster, Model Co-versioning Registry

Known uses:

• HierFAVG is an algorithm that allows multiple edge servers to perform partial model aggregation incrementally on the updates collected from the client devices.

• Hierarchical Federated Learning (HFL) enables hierarchical model aggregation in large-scale networks of client devices where communication latency is prohibitively large due to limited bandwidth. HFL seeks a consensus on the model and uses edge servers to aggregate model updates from client devices that are geographically near.

• Federated Learning + Hierarchical Clustering (FL+HC) is the addition of a hierarchical clustering algorithm to the federated learning system. The clusters are formed according to the similarity of data distributions based on the following distance metrics: (1) Manhattan, (2) Euclidean, and (3) cosine distance.

• Astraea is a federated learning framework that tackles the non-IID characteristics of federated clients. The framework introduces a mediator between the central server and the client devices to balance the skewed client data distributions. The mediator performs z-score-based data augmentation and downsampling to relieve the global imbalance of the training data.

3.4.4. Pattern 14: Secure Aggregator

Figure 17: Secure Aggregator.

Summary: A secure aggregator manages the model exchange and aggregation security protocols to protect model security. Fig. 17 illustrates the secure aggregator on each component, with the security protocols embedded in them.
Context: The central server sends global models to any existing or unknown device every round with no data privacy and security protocols that protect the communication from unauthorised access. Furthermore, model parameters contain pieces of private user information that can be inferred by data-hungry machine learning algorithms.
Problem: There are no security measures to tackle the honest-but-curious and active adversary security threats which exist in federated learning systems.
Forces: The problem requires the following forces to be balanced:
• Client device security. Client device security issues exist when dishonest and malicious client devices join the training process and poison the overall model performance by disrupting the training process or providing false updates to the central server.


• Data security. The data security of the client devices is challenged when the gradients or model parameters are inferred by unauthorised parties through data-hungry machine learning algorithms.

Solution: A secure aggregator handles the secure multiparty computation (SMC) protocols for model exchanges and aggregations. The protocols provide security proofs to guarantee that each party knows only its own input and output. For instance, homomorphic encryption is a method to encrypt the models and only allow authorised client devices and the central server to decrypt and access the models. Pairwise masking and differential privacy (DP) methods are applied to reduce the interpretability of the model by unauthorised parties [46]. The technique involves adding noise to the parameters or gradients, or uses a generalised method.
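A minimal sketch of the pairwise-masking idea follows; the key agreement, dropout handling, and finite-field arithmetic used by real protocols such as SecAgg [7] are omitted, and the function name and toy update vectors are assumptions.

```python
# Sketch of pairwise masking: every client pair (i, j) shares a random mask
# that client i adds and client j subtracts, so individual updates are hidden
# from the aggregator while the sum of all masked updates equals the true sum.
import numpy as np

def masked_updates(updates, seed=0):
    rng = np.random.default_rng(seed)
    n = len(updates)
    masked = [u.astype(float) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=updates[i].shape)   # shared pairwise mask
            masked[i] += mask                           # i adds the mask
            masked[j] -= mask                           # j subtracts it
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([0.5, 0.5])]
masked = masked_updates(updates)
# The server cannot read any single masked update, but the sum is preserved.
print(np.sum(masked, axis=0), np.sum(updates, axis=0))
```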

Consequences:

Benefits:
• Data security. The secure aggregator protects the model from being accessed by adversarial and unauthorised parties through homomorphic encryption and prevents information leakage due to the data-hungry property of machine learning models.

Drawbacks:
• System efficiency. The extra security processes affect system efficiency if excessive security steps are required every round for every device. They also lower the training and aggregation speed due to encryption and decryption time.

• Model performance-privacy trade-off. The model performance is affected if the model privacy methods aggressively interfere with the model's interpretability due to being excessively obscure.

• Compromised key. For encryption and decryption functions, the possible compromise of the security keys increases the privacy threat.

Related patterns: Client Registry, Model Co-versioning Registry

Known uses:

• SecAgg [7] is a practical protocol by Google for secure aggregation in federated learning settings.

• HybridAlpha [44] is a framework that manages the client devices that join the federated learning process. The security operations include functional encryption, DP, and SMC.

• TensorFlow Privacy Library (https://github.com/tensorflow/privacy/) provides an implementation of DP-SGD machine learning.

• ZamaAI (https://zama.ai/) is an AI service platform that provides encryption technology using a homomorphic compiler to convert a model into end-to-end encrypted parameters.

• Simple Encrypted Arithmetic Library (SEAL, https://www.microsoft.com/en-us/research/project/microsoft-seal/) is a homomorphic encryption API introduced by Microsoft AI to allow computations to be performed directly on encrypted data.

4. Discussion

Various patterns are proposed to address the architectural design challenges of a federated learning system. The main challenges include communication & computation efficiency, data privacy, model performance, system security, and reliability. First, client registry, client selector, and client cluster are proposed for client device management in the job creation stage. These patterns manage client devices to improve model performance, system, and training efficiency.

During the model training stage, a performance trade-off often occurs due to the non-IID nature of the local data. The patterns proposed to address this issue are heterogeneous data handler and incentive registry. Furthermore, the non-IID data that enhances the local personalisation of the model also hurts the generalisation of the global model produced. The patterns proposed to address this issue are client cluster, hierarchical aggregator, multi-task model trainer, and deployment selector. Specifically, multi-task model trainer adopts multi-task or transfer learning techniques to learn different models or personalise a global model on local data to optimise the model performance for clients with different local data characteristics, whereas deployment selector effectively selects the user clients to receive the personalised models that fit their local data.

For the model exchange and aggregation stages, communication and computation efficiency become a system bottleneck. To effectively tackle these issues, client selector, client cluster, deployment selector, asynchronous aggregator, and hierarchical aggregator are embedded in the system to optimise resource consumption. However, these patterns require extra client information (i.e., resource or performance) to perform the selection or scheduling of updates. Intuitively, the collection and analysis of the client information on the central server may lead to another form of data privacy violation. Furthermore, extra computation and communication resources are consumed to collect and analyse the client information, in addition to the model training task and the fundamental tasks of the client devices. Hence, message compressor and hierarchical aggregator are proposed to tackle these issues. Moreover, incentive registry is proposed to encourage more client devices to join the training to improve the model performance.

The system security issue is present due to the distributed ownership of federated learning system components. The client nodes are mostly owned by different parties which are not governed by the system owner. Therefore, unauthorised clients may join the system and obtain model parameters from the system. Furthermore, adversarial clients may harm the model or system performance by uploading dishonest updates. Secure aggregator, model co-versioning registry, and client registry aim to solve these challenges. Lastly, the trustworthiness between the client devices and the central server is also a challenge for gaining the participation of clients. The central server may also be a single-point-of-failure that may affect the reliability of the system. Hence, decentralised aggregator is proposed to solve this issue.

5. Related Work

In many real-world scenarios, machine learning applications are usually embedded as a software component in a larger software system at the enterprise level. Hence, to promote enterprise-level adoption of machine learning-based applications, many researchers view machine learning models as a component of a software system so that the challenges in building machine learning models can be tackled through systematic software engineering approaches.

Wan et al. [39] studied how the adoption of machine learning changes software development practices. The work characterises the differences in various aspects of software engineering and the tasks involved for machine learning system development and traditional software development. Lwakatare et al. [32] propose a taxonomy that depicts maturity stages of the use of machine learning components in industrial software systems and map the challenges to the machine learning pipeline stages.

Building machine learning models is becoming an engineering discipline where practitioners take advantage of tried-and-proven methods to address recurring problems [25]. Washizaki et al. [41] study machine learning design patterns and architectural patterns. The authors also proposed an architectural pattern for machine learning to improve operational stability [47]. The work separates machine learning systems' components into business logic and machine learning components and focuses on machine learning pipeline management, data management, and machine learning model versioning operations.

The research on federated learning system design was first done by Bonawitz et al. [8], focusing on the high-level design of a basic federated learning system and its protocol definition. However, there is currently no study on the definition of architectural patterns or reusable solutions to address federated learning design challenges. Our work addresses this particular gap with respect to software architecture designs for federated learning as a distributed software system. To the best of our knowledge, this is the first comprehensive and systematic collection of federated learning architectural patterns. The outcomes are intended to provide architectural guidance for practitioners to better design and develop federated learning systems.

6. Conclusion

Federated learning is a data privacy-preserving, distributed machine learning approach to fully utilise the data and resources on IoT and smart mobile devices. Being a distributed system with multiple components and different stakeholders, architectural challenges need to be solved before federated learning can be effectively adopted in the real world. In this paper, we present 14 federated learning architectural patterns associated with the lifecycle of a model in federated learning. The pattern collection is provided as architectural guidance for architects to better design and develop federated learning systems. In our future work, we will explore the architectural designs that help improve trust in federated learning.

References

[1] 2019. General data protection regulation GDPR. URL: https://gdpr-info.eu/.
[2] Ahmad, A., Jamshidi, P., Pahl, C., Khaliq, F., 2014. A pattern language for the evolution of component-based software architectures. Electronic Communications of the EASST 59.
[3] Ahn, J., Simeone, O., Kang, J., 2019. Wireless federated distillation for distributed edge learning with heterogeneous data, in: 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), pp. 1–6.
[4] Balalaie, A., Heydarnoori, A., Jamshidi, P., Tamburri, D.A., Lynn, T., 2018. Microservices migration patterns. Software: Practice and Experience 48, 2019–2042.
[5] Bao, X., Su, C., Xiong, Y., Huang, W., Hu, Y., 2019. Flchain: A blockchain for auditable federated learning with trust and incentive, in: 2019 5th International Conference on Big Data Computing and Communications (BIGCOM), pp. 151–159.
[6] Beck, K., Cunningham, W., 1987. Using pattern languages for object oriented programs, in: Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA).
[7] Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H.B., Patel, S., Ramage, D., Segal, A., Seth, K., 2017. Practical secure aggregation for privacy-preserving machine learning, Association for Computing Machinery, New York, NY, USA.
[8] Bonawitz, K.A., Eichner, H., Grieskamp, W., Huba, D., Ingerman, A., Ivanov, V., Kiddon, C.M., Konečný, J., Mazzocchi, S., McMahan, B., Overveldt, T.V., Petrou, D., Ramage, D., Roselander, J., 2019. Towards federated learning at scale: System design, in: SysML 2019. To appear.
[9] Brinkkemper, S., 1996. Method engineering: engineering of information systems development methods and tools. Information and Software Technology 38, 275–280. Method Engineering and Meta-Modelling.
[10] Chai, Z., Ali, A., Zawad, S., Truex, S., Anwar, A., Baracaldo, N., Zhou, Y., Ludwig, H., Yan, F., Cheng, Y., 2020. Tifl: A tier-based federated learning system, in: Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, Association for Computing Machinery, New York, NY, USA. pp. 125–136.
[11] Chen, Y., Ning, Y., Slawski, M., Rangwala, H., 2020. Asynchronous online federated learning for edge devices with non-iid data. arXiv:1911.02134.
[12] Corinzia, L., Buhmann, J.M., 2019. Variational federated multi-task learning. arXiv:1906.06268.
[13] Duan, M., Liu, D., Chen, X., Tan, Y., Ren, J., Qiao, L., Liang, L., 2019. Astraea: Self-balancing federated learning for improving classification accuracy of mobile deep learning applications, in: 2019 IEEE 37th International Conference on Computer Design (ICCD), pp. 246–254.


[14] Gu, B., Xu, A., Huo, Z., Deng, C., Huang, H., 2020. Privacy-preserving asynchronous federated learning algorithms for multi-party vertically collaborative learning. arXiv:2008.06233.
[15] Haddadpour, F., Kamani, M.M., Mokhtari, A., Mahdavi, M., 2020. Federated learning with compression: Unified analysis and sharp guarantees. arXiv:2007.01154.
[16] Hu, C., Jiang, J., Wang, Z., 2019. Decentralized federated learning: A segmented gossip approach. arXiv:1908.07782.
[17] Huang, L., Shea, A.L., Qian, H., Masurkar, A., Deng, H., Liu, D., 2019. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. Journal of Biomedical Informatics 99, 103291.
[18] Jamshidi, P., Pahl, C., Mendonça, N.C., 2017. Pattern-based multi-cloud architecture migration. Software: Practice and Experience 47, 1159–1184.
[19] Jeong, E., Oh, S., Kim, H., Park, J., Bennis, M., Kim, S.L., 2018. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-iid private data. arXiv:1811.11479.
[20] Jiang, J., Hu, L., 2020. Decentralised federated learning with adaptive partial gradient aggregation. CAAI Transactions on Intelligence Technology 5, 230–236.
[21] Jiang, Y., Wang, S., Valls, V., Ko, B.J., Lee, W.H., Leung, K.K., Tassiulas, L., 2020. Model pruning enables efficient federated learning on edge devices. arXiv:1909.12326.
[22] Jobin, A., Ienca, M., Vayena, E., 2019. The global landscape of AI ethics guidelines. Nature Machine Intelligence 1, 389–399.
[23] Kitchenham, B., Charters, S., 2007. Guidelines for performing systematic literature reviews in software engineering.
[24] Konečný, J., McMahan, H.B., Yu, F.X., Richtárik, P., Suresh, A.T., Bacon, D., 2017. Federated learning: Strategies for improving communication efficiency. arXiv:1610.05492.
[25] Lakshmanan, L.V., Munn, M., Robinson, S., 2020. Machine Learning Design Patterns.
[26] Lalitha, A., Shekhar, S., Javidi, T., Koushanfar, F., 2018. Fully decentralized federated learning, in: Third workshop on Bayesian Deep Learning (NeurIPS).
[27] Li, T., Sahu, A.K., Talwalkar, A., Smith, V., 2020. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine 37, 50–60.
[28] Liu, L., Zhang, J., Song, S.H., Letaief, K.B., 2020. Client-edge-cloud hierarchical federated learning, in: ICC 2020 - 2020 IEEE International Conference on Communications (ICC), pp. 1–6.
[29] Liu, Y., Ai, Z., Sun, S., Zhang, S., Liu, Z., Yu, H., 2020. FedCoin: A Peer-to-Peer Payment System for Federated Learning. Springer International Publishing, Cham. pp. 125–138.
[30] Lo, S.K., Liew, C.S., Tey, K.S., Mekhilef, S., 2019. An interoperable component-based architecture for data-driven iot system. Sensors 19.
[31] Lo, S.K., Lu, Q., Wang, C., Paik, H.Y., Zhu, L., 2021. A systematic literature review on federated machine learning: From a software engineering perspective. ACM Comput. Surv. 54.
[32] Lwakatare, L.E., Raj, A., Bosch, J., Olsson, H.H., Crnkovic, I., 2019. A taxonomy of software engineering challenges for machine learning systems: An empirical investigation, in: Kruchten, P., Fraser, S., Coallier, F. (Eds.), Agile Processes in Software Engineering and Extreme Programming, Springer International Publishing, Cham. pp. 227–243.
[33] McMahan, H.B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A., 2017. Communication-efficient learning of deep networks from decentralized data. arXiv:1602.05629.
[34] Meszaros, G., Doble, J., 1997. A Pattern Language for Pattern Writing. Addison-Wesley Longman Publishing Co., Inc., USA. pp. 529–574.
[35] Mirbel, I., Ralyté, J., 2006. Situational method engineering: combining assembly-based and roadmap-driven approaches. Requirements Engineering 11, 58–78.
[36] Roy, A.G., Siddiqui, S., Pölsterl, S., Navab, N., Wachinger, C., 2019. Braintorrent: A peer-to-peer environment for decentralized federated learning. arXiv:1905.06731.
[37] Sattler, F., Wiedemann, S., Müller, K.R., Samek, W., 2020. Robust and communication-efficient federated learning from non-i.i.d. data. IEEE Transactions on Neural Networks and Learning Systems 31, 3400–3413.
[38] Smith, V., Chiang, C.K., Sanjabi, M., Talwalkar, A., 2018. Federated multi-task learning. arXiv:1705.10467.
[39] Wan, Z., Xia, X., Lo, D., Murphy, G.C., 2019. How does machine learning change software development practices? IEEE Transactions on Software Engineering, 1–1.
[40] Wang, L., Wang, W., Li, B., 2019. Cmfl: Mitigating communication overhead for federated learning, in: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pp. 954–964.
[41] Washizaki, H., Uchida, H., Khomh, F., Guéhéneuc, Y., 2019. Studying software engineering patterns for designing machine learning systems, in: 2019 10th International Workshop on Empirical Software Engineering in Practice (IWESEP), pp. 49–495.
[42] Weng, J., Weng, J., Zhang, J., Li, M., Zhang, Y., Luo, W., 2019. Deepchain: Auditable and privacy-preserving deep learning with blockchain-based incentive. IEEE Transactions on Dependable and Secure Computing, 1–1.
[43] Xie, C., Koyejo, S., Gupta, I., 2020. Asynchronous federated optimization. arXiv:1903.03934.
[44] Xu, R., Baracaldo, N., Zhou, Y., Anwar, A., Ludwig, H., 2019a. Hybridalpha: An efficient approach for privacy-preserving federated learning, in: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, Association for Computing Machinery, New York, NY, USA. pp. 13–23.
[45] Xu, Z., Yu, F., Xiong, J., Chen, X., 2019b. Helios: Heterogeneity-aware federated learning with dynamically balanced collaboration. arXiv preprint arXiv:1912.01684.
[46] Yang, Q., Liu, Y., Chen, T., Tong, Y., 2019. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 10.
[47] Yokoyama, H., 2019. Machine learning system architectural pattern for improving operational stability, in: 2019 IEEE International Conference on Software Architecture Companion (ICSA-C), pp. 267–274.
[48] Zdun, U., 2007. Systematic pattern selection using pattern language grammars and design space analysis. Software: Practice and Experience 37, 983–1016.
[49] Zhang, W., Lu, Q., Yu, Q., Li, Z., Liu, Y., Lo, S.K., Chen, S., Xu, X., Zhu, L., 2020. Blockchain-based federated learning for device failure detection in industrial iot. IEEE Internet of Things Journal, 1–12.
