A Systematic Literature Review on Federated Machine Learning: From A Software Engineering Perspective

SIN KIT LO, Data61, CSIRO and University of New South Wales, Australia
QINGHUA LU, Data61, CSIRO and University of New South Wales, Australia
CHEN WANG, Data61, CSIRO, Australia
HYE-YOUNG PAIK, University of New South Wales, Australia
LIMING ZHU, Data61, CSIRO and University of New South Wales, Australia

Federated learning is an emerging machine learning paradigm where multiple clients train models locally and formulate a global model based on the local model updates. To identify the state-of-the-art in federated learning from a software engineering perspective, we performed a systematic literature review based on 231 extracted primary studies. The results show that most of the known motivations of federated learning appear to be the most studied federated learning challenges, such as communication efficiency and statistical heterogeneity. Also, there are only a few real-world applications of federated learning. Hence, more studies in this area are needed before the actual industrial-level adoption of federated learning.

CCS Concepts: • Computing methodologies → Artificial intelligence; Machine learning; • Software and its engineering; • Security and privacy;

Additional Key Words and Phrases: federated learning, systematic literature review, software architecture, software engineering, distributed computing, machine learning, privacy, communication, edge computing

ACM Reference Format:
Sin Kit Lo, Qinghua Lu, Chen Wang, Hye-Young Paik, and Liming Zhu. 2020. A Systematic Literature Review on Federated Machine Learning: From A Software Engineering Perspective. J. ACM 37, 4, Article 111 (August 2020), 39 pages. https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION
Machine learning has been broadly adopted and has shown success in various domains. Data plays a critical role in machine learning-driven systems due to its impact on the models' performance. Although remote devices (e.g., mobile devices, IoT devices) are widely deployed and generate massive amounts of data every day, data hungriness is still a major challenge in the development of machine learning-driven systems due to the growing attention to data privacy (e.g., the General Data Protection Regulation - GDPR [3]).

To effectively address this challenge, the concept of federated learning was introduced by Google in 2016 [114]. In federated learning, multiple clients (e.g., mobile devices) collaboratively perform model training locally to generate a global model. The data of each client is stored locally and never

Authors' addresses: Sin Kit Lo, [email protected], Data61, CSIRO and University of New South Wales, Australia; Qinghua Lu, [email protected], Data61, CSIRO and University of New South Wales, Australia; Chen Wang, [email protected], Data61, CSIRO, Australia; Hye-Young Paik, [email protected], University of New South Wales, Australia; Liming Zhu, [email protected], Data61, CSIRO and University of New South Wales, Australia.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2020 Association for Computing Machinery.
0004-5411/2020/8-ART111 $15.00
https://doi.org/10.1145/1122445.1122456


arXiv:2007.11354v6 [cs.SE] 27 Oct 2020


transferred to the central server or other clients [39, 74]. Instead, only model parameter updates are communicated for global model formulation.
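To make this training loop concrete, the following is a minimal, illustrative sketch of one federated-averaging round in Python; the linear model, the synthetic client data, and all hyperparameters are hypothetical and are not drawn from any of the reviewed studies.

```python
# Minimal sketch of one federated learning round (FedAvg-style) on a toy linear model.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train locally on one client's private data and return the updated weights."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Clients train locally; only weight updates (never raw data) are aggregated."""
    updates, sizes = [], []
    for X, y in clients:                     # each tuple is a client's local dataset
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    # Weighted average of the local models, weighted by local dataset size.
    return np.average(np.stack(updates), axis=0, weights=np.array(sizes, dtype=float))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):                           # simulate 5 clients holding private data
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, clients)
print("global model after 20 rounds:", w)
```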

The increased interest in federated learning has resulted in a growing number of publications in this area over the last four years. Although there are existing surveys on this topic [74, 94, 96], to the best of our knowledge, there is still no systematic literature review on federated learning. This motivates us to conduct a systematic literature review on federated learning to understand the state-of-the-art. Furthermore, federated learning involves the collaboration of a large number of participating clients, which forms a large-scale distributed system. This calls for software engineering thinking beyond the core machine learning concepts [164]. Hence, the second motivation is to explore how to develop federated learning systems by examining existing studies. Thus, we conduct a systematic literature review to provide an up-to-date, holistic, and comprehensive view of the state-of-the-art of federated learning research, especially from the software engineering perspective. We perform the systematic literature review following Kitchenham's standard guideline [82].

The objectives of this systematic literature review are to: (1) provide an overview of the research activities and a guide to a variety of research topics in the landscape of federated learning system development; (2) help practitioners understand the challenges of developing federated learning systems and the approaches to address them. The contributions of this paper are as follows:

• We identify a set of 231 primary studies related to federated learning published from 2016.01.01 to 2020.01.31, following the systematic literature review guideline.

• We present a comprehensive qualitative and quantitative synthesis reflecting the state-of-the-art in federated learning with data extracted from the 231 studies. Our data synthesis covers the software engineering aspects of federated learning system development.

• We provide findings from the results and identify future trends to support further research in federated learning.

The remainder of the paper is organised as follows: Section 2 introduces the methodology. Section 3 presents the study results. Section 4 discusses future trends, followed by threats to validity in Section 5. Section 6 discusses related work and Section 7 concludes the paper.

2 METHODOLOGY
According to Kitchenham's guideline [82], we developed the following protocol for our systematic literature review.

2.1 Research Questions
To provide a systematic literature review from the software engineering perspective, we view federated learning as a software system and try to understand federated learning systems from the software development standpoint. We adopt the software development practices of machine learning-driven systems in [164] to design the Software Development Life Cycle (SDLC) for federated learning, as shown in Fig. 1. That work identified the differences between traditional software system development and machine learning-driven system development, which include requirement analysis, design, testing and quality, and process management. Hence, we focus our software development analysis on these four aspects. For instance, we derived research questions such as "What are the challenges of federated learning being addressed?" and "What data are the federated learning applications dealing with?" to understand the requirements for the development of federated learning systems, and "What are the possible components of federated learning systems?" for the design of federated learning systems. In this section, we explain how each research question (RQ) is derived. Note that we only focus on the software engineering and architecture design


Fig. 1. Research questions mapping with Software Development Life Cycle (SDLC)

perspective. Therefore, hardware design perspectives are not covered in this study. The sub-RQs are ordered based on their relevance to their main RQs.

2.1.1 Background Understanding: To develop a federated learning system, we first need to know what federated learning is and when to adopt federated learning instead of centralised learning. Hence, we derived RQ 1 (What is federated learning?) and RQ 1.1 (Why is federated learning adopted?). After obtaining the basic idea of federated learning and the challenges it tackles, we intend to identify the state-of-the-art of federated learning and who the experts in this field are. We derived RQ 2.1 (How many research activities have been conducted on federated learning?) and RQ 2.2 (Who is leading the research in federated learning?) to extract this information. Lastly, we derived RQ 3.2 (What phases of the machine learning pipeline are covered by the research activities?) to examine the machine learning pipeline stages targeted by the existing studies and understand the maturity of the research area. With this technical and empirical understanding of the state-of-the-art of federated learning, practitioners can gain a better understanding of the process management of a federated learning system, considering the data flow throughout the entire system.

2.1.2 Requirement Analysis: After the background understanding stage, practitioners have to conduct analyses on the system development requirements of federated learning for their application. We first derived RQ 1.2 (What are the applications of federated learning?), RQ 1.3 (What data are the federated learning applications dealing with?) and RQ 1.4 (What is the client data distribution of the federated learning application?) to identify the characteristics of the applications that adopt the federated learning technique. This helps practitioners identify similar applications that have adopted federated learning. Moreover, the audience can determine whether their applications are suitable for federated learning, based on the data types and distributions. After narrowing down the scope, practitioners are required to identify the possible challenges of federated learning adoption. To extract this information, we derived RQ 2 (What are the challenges of federated learning being addressed?) to identify the architectural drivers (i.e., requirements) of a federated learning system.

2.1.3 Architecture Design: After the requirement analysis for the federated learning systems, practitioners need to understand how to design the architecture for the application. We need to consider the approaches to address the challenges to fulfill each requirement. Moreover, how to


balance the tradeoffs between requirements should be discussed for the system architecture design. Hence, we derived RQ 3 (How are the federated learning challenges being addressed?) and RQ 3.1 (What are the possible components of federated learning systems?). These two RQs are targeted (1) to identify the possible approaches that address the challenges during federated learning system development, and (2) to extract the software components for federated learning architecture design.

2.1.4 Implementation and Evaluation: After the architecture design stage, once the system is implemented, practitioners need to evaluate various attributes of the system. Furthermore, machine learning tasks require extensive model testing and quality assessment. Hence, we derived RQ 4 (How are the approaches evaluated?) and RQ 4.1 (What are the evaluation metrics used to evaluate the approaches?) to identify the quantitative and qualitative evaluation methods and metrics for the federated learning system.

2.2 Sources Selection and Strategy
We searched the literature from the following sources: (1) ACM Digital Library, (2) IEEE Xplore, (3) ScienceDirect, (4) Springer Link, (5) ArXiv, and (6) Google Scholar. The search time frame is set between 2016.01.01 and 2020.01.31. We screened and selected the papers from the initial search according to the preset inclusion and exclusion criteria. The inclusion and exclusion criteria are elaborated in Section 2.2.2.

We then conducted a forward and backward snowballing process to search for any related papers that were left out of the initial search. The study selection process consists of two phases: (1) The selection of papers is conducted by two independent researchers through title and abstract screening, based on the inclusion and exclusion criteria. Then, the two researchers cross-checked the results and resolved any disagreement on the decisions. (2) The papers selected in the first phase are then assessed through full-text screening. The two researchers again cross-checked the selection results and resolved any disagreement on the selection decisions. Should disagreement occur in either the first or second phase, a third researcher (meta-reviewer) is consulted to finalise the decision. Fig. 2 shows the paper search and selection process.

The initial search found 1074 papers, with 76 from ACM Digital Library, 320 from IEEE Xplore, 5 from ScienceDirect, 85 from Springer Link, 256 from ArXiv, and 332 from Google Scholar. After the paper screening, exclusion, and duplicate removal, we ended up with 225 papers. From there, we conducted the snowballing process and found 6 more papers. The final number of papers for this review is 231. The number of papers per source is presented in Table 1.

Table 1. Number of selected publications per source

Sources ACM IEEE Springer ScienceDirect ArXiv Google Scholar Total

Paper count 22 74 17 4 106 8 231

2.2.1 Search String Definition. We used "Federated Learning" as the key term and included synonyms and abbreviations as supplementary terms to increase the search coverage. Then, we designed the search strings for each primary source to check the title, abstract, and keywords. After completing the first draft of the search strings, we examined the results of each search string against each database to check the appropriateness and effectiveness of the search strings. The finalised search terms are listed in Table 2. The search strings and the respective paper quantities of the initial search for each primary source are shown in Tables 3, 4, 5, 6, 7, and 8.


Fig. 2. Paper search and selection process map

Table 2. Key and supplementary search terms

Key Term Supplementary Terms

Federated Learning Federated Machine Learning, Federated ML, Federated Artificial Intelligence, Federated AI, Federated Intelligence, Federated Training
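Purely as an illustration (this is not the authors' actual tooling), the snippet below sketches how a boolean search string can be composed from the key and supplementary terms in Table 2; the lowercase quoting mirrors the ACM-style query shown in Table 3.

```python
# Illustrative sketch: composing a boolean search string from the terms in Table 2.
key_term = "Federated Learning"
supplementary = [
    "Federated Machine Learning", "Federated ML", "Federated Artificial Intelligence",
    "Federated AI", "Federated Intelligence", "Federated Training",
]

def build_query(terms):
    """Join all quoted terms with OR, as used for the title/abstract/keyword fields."""
    return " OR ".join(f'"{t.lower()}"' for t in terms)

print(build_query([key_term] + supplementary))
# "federated learning" OR "federated machine learning" OR "federated ml" OR ...
```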

Table 3. Search strings and quantity of ACM Digital Library

Search string

[All: "federated learning"] OR [All: "federated machine learning"] OR [All: "federated ml"] OR [All: "federated intelligence"] OR [All: "federated training"] OR [All: "federated artificial intelligence"] OR [All: "federated ai"] AND [Publication Date: (01/01/2016 TO 01/31/2020)]

Result quantity: 76
Selected papers: 22

Table 4. Search strings and quantity of IEEE Xplore

Search string

("Document Title": "federated learning" OR "federated training" OR "federated intelligence" OR "federated machine learning" OR "federated ML" OR "federated artificial intelligence" OR "federated AI") OR ("Author Keywords": "federated learning" OR "federated training" OR "federated intelligence" OR "federated machine learning" OR "federated ML" OR "federated artificial intelligence" OR "federated AI")

Result quantity: 320
Selected papers: 71

2.2.2 Inclusion and Exclusion Criteria. The inclusion and exclusion criteria are formulated to effectively select relevant papers. After completing the first draft of the criteria, we conducted a pilot study using 20 selected papers. After that, two independent researchers cross-validated the papers selected by the other researcher and refined the criteria. The finalised inclusion criteria are as follows:

• Both long and short papers that have elaborated on the component interactions of the federated learning system: We specifically focus on research works that provide comprehensive explanations of the federated learning components' functionalities and their mutual interactions.

• Survey, review, and SLR papers: We included all the surveys and review papers to understand the current research directions, open problems and future insights. However, we do not conduct data extraction and data synthesis from these papers.


Table 5. Search strings and quantity of ScienceDirect

Search string: "federated learning" OR "federated intelligence" OR "federated training" OR "federated machine learning" OR "federated ML" OR "federated artificial intelligence" OR "federated AI"

Result quantity: 5
Selected papers: 4

Table 6. Search strings and quantity of Springer Link

Search string: "federated learning" OR "federated intelligence" OR "federated training" OR "federated machine learning" OR "federated ML" OR "federated artificial intelligence" OR "federated AI"

Result quantity: 85
Selected papers: 17

Table 7. Search strings and quantity of Google Scholar

Search string: "federated learning" OR "federated intelligence" OR "federated training" OR "federated machine learning" OR "federated ML" OR "federated artificial intelligence" OR "federated AI"

Result quantity: 332
Remark: Search title only (Google Scholar does not provide an abstract & keyword search option)

Selected papers: 8

Table 8. Search strings and quantity of ArXiv

Search string

order: -announced_date_first; size: 200; date_range: from 2016-01-01 to 2020-01-31; include_cross_list: True; terms: AND title="federated learning" OR "federated intelligence" OR "federated training" OR "federated machine learning" OR "federated ML" OR "federated artificial intelligence" OR "federated AI"; OR abstract="federated learning" OR "federated intelligence" OR "federated training" OR "federated machine learning" OR "federated ML" OR "federated artificial intelligence" OR "federated AI"

Result quantity: 256
Remark: Search title and abstract only (ArXiv does not provide a keyword search option)

Selected papers: 103

• ArXiv and Google Scholar papers cited by peer-reviewed papers published in the selected primary sources.

The finalised exclusion criteria are as follows:
• Papers that elaborate only low-level communication algorithms: The low-level communication algorithms or protocols between hardware devices are not the focus of this work.

• Papers that focus only on pure gradient optimisation: We excluded the papers that purely focus on gradient and algorithm optimisation research. Our work focuses on the multi-tier processes and interactions of the federated learning software components.

• Papers that are not in English.
• Conference version of a study that has an extended journal version.
• PhD dissertations, tutorials, editorials, magazines.

2.2.3 Quality Assessment. A quality assessment scheme is developed to evaluate the quality of the papers. There are 4 quality criteria (QC) used by all 3 authors to rate each paper. We used numerical scores ranging from 1.00 (lowest) to 5.00 (highest) to rate each paper. The average scores


are calculated for each QC, and the total score of all 4 QCs is obtained. We include papers that score greater than 1.00 to avoid missing insightful studies while maintaining the quality of the search pool. The QCs are as follows (a minimal scoring sketch is given after the list):

• QC1: The citation rate. We identified this by checking the number of citations collected by each paper on Google Scholar.

• QC2: The methodology contribution. We identified the methodology contribution of the paper by asking two questions: (1) Is this paper highly relevant to the research? (2) Can we find a clear methodology that addresses its main research questions and goals?

• QC3: The sufficient presentation of the findings. Are there solid findings/results with a clear-cut outcome? This is evaluated based on the availability of results and the quality of findings.

• QC4: The discussion of future work. This is evaluated based on whether future work is mentioned.
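The following is a minimal sketch of the scoring procedure described above, assuming hypothetical rater scores and paper names; it is not the authors' actual spreadsheet logic.

```python
# Hypothetical quality-scoring sketch: average each QC over the three raters,
# sum the four QC averages, and keep papers above the stated threshold of 1.00.
import statistics

ratings = {
    "paper_A": {"QC1": [3, 4, 3], "QC2": [4, 5, 4], "QC3": [4, 4, 3], "QC4": [2, 2, 3]},
    "paper_B": {"QC1": [1, 1, 1], "QC2": [1, 1, 2], "QC3": [1, 1, 1], "QC4": [1, 1, 1]},
}

def total_score(paper_ratings):
    """Average each QC over the raters, then sum the four QC averages."""
    return sum(statistics.mean(scores) for scores in paper_ratings.values())

for paper, r in ratings.items():
    score = total_score(r)
    print(paper, round(score, 2), "included" if score > 1.00 else "excluded")
```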

2.2.4 Data Extraction and Synthesis. For the data extraction, we downloaded all the papers selected from the primary sources. We then created a Google sheet to record all the essential information, including the title, source, year, paper type, venue, authors, affiliation, the number of citations of the paper, the score for all QCs, the answers for each RQ, and the research classification (refer to Appendix A.1). The following steps are followed to prevent data extraction bias:

• The two independent researchers conduct the data extraction for all the papers, cross-check the classification, and discuss any dispute or inconsistency in the extracted data.

• For any unresolved dispute on the extracted data, the first two authors try to reach an agreement. If an agreement cannot be reached, the meta-reviewer reviews the paper and finalises the decision together with the two independent researchers.

• All the data is stored in the Google sheet for analysis by the independent researchers.

3 RESULTS
In this section, the data extraction results for each research question are summarised and analysed. We present all our findings in the following sections.

Fig. 3. RQ 1: What is federated learning?

1Data extraction sheet, https://drive.google.com/file/d/10yYG8W1FW0qVQOru_kMyS86owuKnZPnz/view?usp=sharing


Table 9. Characteristics of federated learning

Training settings (82%): Training a model on multiple clients (200 papers); Only sending model updates to the central server (133); Producing a global model on the central server (63)
Data distribution (11%): Decentralised data storage (40); Data generated locally (17)
Orchestration (3%): Training organised by a central server (13)
Client types (3%): Cross-device (11); Cross-silo (1); Both (3)
Data partitioning (<1%): Horizontal federated learning (1); Vertical federated learning (1)

3.1 RQ 1: What is federated learning?
The first research question (RQ 1) is "What is federated learning?". To answer RQ 1, the definition of federated learning reported by each study is recorded. The motivation of this question is to help the audience understand (1) what federated learning is, and (2) the perceptions of researchers on federated learning. Fig. 3 is a word cloud that shows the frequency of the words that appear in the original definition of federated learning in each study, which roughly reflects the perception of researchers on federated learning. The most frequently appearing words include: distribute, local, device, share, client, update, privacy, aggregate, edge, etc. To answer RQ 1 more accurately, we use five categories to classify the definitions of federated learning, as understood by the researchers and as shown in Table 9: (1) training settings, (2) data distribution, (3) orchestration, (4) client types, and (5) data partitioning.

First, in terms of training settings, building a decentralised or distributed learning process over multiple clients is what most researchers conceive as federated learning; "training a model on multiple clients" is the most frequently mentioned characteristic for RQ 1. Some of the most frequently mentioned keywords that describe the training settings are "distributed", "collaborative", and "multiple parties/clients". The other two characteristics are "only sending the model updates to the central server" and "producing a global model on the central server". This describes how a federated learning system performs model training in a distributed manner, and also shows how researchers differentiate federated learning from conventional machine learning and distributed machine learning.

Secondly, federated learning can be explained in terms of the data distribution. The keywords mentioned in the studies are "data generated locally" and "data stored decentralised". Data used in federated learning are collected and stored by the local client devices in different geographical locations. Hence, the data exhibit non-IID (non-Identically and Independently Distributed) and unbalanced properties [74, 114]. Furthermore, the data is stored in a decentralised manner and is not shared with other clients to preserve data privacy. We discuss the client data distribution further in Section 3.4.

Thirdly, some researchers look at federated learning from the standpoint of training process orchestration. In conventional federated learning systems, a central server is responsible for the orchestration and organisation of the training process. Its tasks consist of the initialisation of global model parameters or gradients, the distribution of parameters and gradients to the participating client devices, the collection of trained local results, and the aggregation of the collected results to produce a new global model. However, many researchers have concerns about the usage of a single central server


for orchestration as it could introduce a single point of failure [111, 144]. Hence, decentralised approaches for the exchange of model updates are studied, and the adoption of blockchain for decentralised data governance is also introduced by some researchers [111, 144].

Fourthly, we observe two types of federated learning in terms of client types, which we classify

as cross-device and cross-silo. Cross-device federated learning deals with a massive number of smart devices, creating a large-scale distributed network to collaboratively train a model for the same application [74]. Examples of such applications are mobile keyboard word suggestion and human activity recognition. Interest is also increasingly extending to cross-silo applications where data sharing between organisations is prohibited. For instance, the data of a hospital is prohibited from being exposed to other hospitals due to data security regulations. To enable machine learning while tackling the data sharing limitations of different organisations, cross-silo federated learning was proposed to conduct local model training using the data in each silo, without sharing the data with other silos [74, 161].

Lastly, for data partitioning, we found three variations in the literature: horizontal, vertical, and

federated transfer learning. Horizontal federated learning, or sample-based federated learning, is used in scenarios where the datasets share the same feature space but different sample ID spaces [104, 185]. Inversely, vertical federated learning, also known as feature-based federated learning, is applied in cases where two or more datasets share the same sample ID space but different feature spaces. Federated transfer learning addresses another type of data partitioning, where two datasets overlap only partially in the sample space or the feature space; it relies on transfer learning techniques to develop models that are applicable to both datasets [185].

We summarise the findings above as follows. We connected each keyword and grouped them under the same definition, and finally arranged the definitions according to the frequency of the words that appeared.
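As a small illustration of the two partitioning schemes (not code from any reviewed study), the sketch below splits a toy feature matrix horizontally (by samples) and vertically (by features); the party roles and shapes are hypothetical.

```python
# Illustrative sketch of horizontal vs. vertical data partitioning on a toy matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))          # 6 samples (rows) x 4 features (columns)

# Horizontal (sample-based): parties share the feature space but hold different
# samples, e.g. two banks with the same schema but different customers.
party_a_h, party_b_h = X[:3, :], X[3:, :]

# Vertical (feature-based): parties hold the same samples but different feature
# subsets, e.g. a bank and a retailer describing the same customers.
party_a_v, party_b_v = X[:, :2], X[:, 2:]

print(party_a_h.shape, party_b_h.shape)   # (3, 4) (3, 4)
print(party_a_v.shape, party_b_v.shape)   # (6, 2) (6, 2)
```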

Findings from RQ 1: What is federated learning (FL)?
Federated learning is a type of distributed machine learning that preserves data privacy. Federated learning systems rely on a central server, or operate in a decentralised manner, to coordinate the model training process on multiple, distributed client devices, where the data are stored. Model training is performed locally on the client devices, without moving the data out of the client devices.
Researchers have mentioned several interesting settings in federated learning: (1) centralised/decentralised federated learning, (2) cross-silo/cross-device federated learning, (3) horizontal/vertical/transfer federated learning.

3.2 RQ 1.1: Why is federated learning adopted?
The motivation of RQ 1.1 is to understand the advantages that federated learning offers. We classify the extracted answers based on the non-functional requirements motivating federated learning adoption (illustrated in Fig. 4). Data privacy and communication efficiency are the two main motivations to adopt federated learning. Data privacy is preserved in federated learning as no raw local data moves out of the device [20, 136]. Also, federated learning achieves higher communication efficiency by exchanging only model parameters or gradient updates [74, 114, 185]. In addition, high data privacy and communication efficiency contribute to high scalability: more clients are motivated to join the training process since their private data are well preserved and lower communication costs are required to exchange model updates [74, 114, 185].

Statistical heterogeneity is defined as the data distribution having data volume and class distribution variance among devices (i.e., non-IID). Essentially, the data are massively distributed across a large number of client devices, each containing a small amount of data [120, 139, 200],


Fig. 4. RQ 1.1: Why is federated learning adopted?

with unbalanced data classes [120] that are not representative of the overall data distribution [65]. When local models are trained independently on these devices, the models tend to overfit their local data [98, 139, 200]. Hence, federated learning is adopted to collaboratively train the local models to form a generalised global model. System heterogeneity is defined as the property of devices having heterogeneous resources (e.g., computation, communication, storage, and energy). Federated learning can tackle this issue by enabling local model training and communicating only the model updates, which reduces the bandwidth footprint and energy consumption [7, 36, 98, 120, 137, 155, 167].

Another motivation is high computation efficiency. With large numbers of participating clients and the increasing computation capability of clients, federated learning can have high model performance [12, 38, 114, 151, 207] and computation efficiency [11, 63, 200]. Data storage efficiency is ensured by independent on-client training using locally generated data [22, 114, 147, 196, 210].

Findings from RQ 1.1: Why is federated learning adopted?
Motivation for adoption: Data privacy and communication efficiency are the two main motivations. Only a small number of studies adopt federated learning because of model performance. With large numbers of participating clients, federated learning is expected to achieve high model performance. However, the approach is still immature when dealing with non-IID and unbalanced data distributions.


Table 10. Data Types and Applications Distribution

Data types (RQ 1.3) and applications (RQ 1.2):

Graph data (5%): Generalized Pareto Distribution parameter estimation, Incumbent signal detection model, Linear model fitting, Network pattern recognition, Computation resource management, Waveform classification.

Image data (49%): Autonomous driving, Healthcare (bladder contouring, whole-brain segmentation), Clothes type recognition, Facial recognition, Handwritten character/digit recognition, Human action prediction, Image processing (classification/defect detection), Location prediction, Phenotyping system, Content recommendation.

Sequential data (4%): Game AI model, Network pattern recognition, Content recommendation, Robot system navigation, Search rank system, Stackelberg competition model.

Structured data (21%): Air quality prediction, Healthcare, Credit card fraud detection, Bankruptcy prediction, Content recommendation (e-commerce), Energy consumption prediction, Economic prediction (financial/house price/income/loan/market), Human action prediction, Multi-site semiconductor data fusion, Particle collision detection, Industrial production recommendation, Publication dataset binary classification, Quality of Experience (QoE) prediction, Search rank system, Sentiment analysis, System anomaly detection, Song publication year prediction.

Text data (14%): Customer satisfaction prediction, Keyboard suggestion (search/word/emoji), Movie rating prediction, Out-of-Vocabulary word learning, Suicidal ideation detection, Product review prediction, Resource management model, Sentiment analysis, Spam detection, Speech recognition, Text-to-Action decision model, Content recommendation, Wine quality prediction.

Time-series data (7%): Air quality prediction, Automobile MPG prediction, Healthcare (gestational weight gain / heart rate prediction), Energy prediction (consumption/demand/generation), Human action prediction, Location prediction, Network anomaly detection, Resource management model, Superconductor critical temperature prediction, Vehicle type prediction.


3.3 RQ 1.2: What are the applications of federated learning? & RQ 1.3: What data are the federated learning applications dealing with?

In RQ 1.2 and RQ 1.3, we study the applications that are developed using federated learning and the types of data used in those applications. We group the applications according to the data types. The data types and applications are listed in Table 10. The most widely used data types are image data, structured data, and text data. The most popular application is image classification; in fact, MNIST is the most frequently used dataset. Both graph data and sequential data are not popularly used in federated learning due to their data characteristics (e.g., data structure). We observed that federated learning is widely adopted in applications that deal with privacy-sensitive data, such as images, personal medical or financial data, and text recorded by personal mobile devices.

Findings from RQ 1.2: What are the applications of federated learning? & RQ 1.3: What data are the federated learning applications dealing with?

Applications and data: Federated learning is widely adopted in applications that deal with image data, structured data, and text data. More studies are needed for IoT time-series data. Both graph data and sequential data are not popularly used due to their data characteristics (e.g., non-linear data structure). Also, there are only a few production-level applications. Most of the applications are still proof-of-concept prototypes or simulated examples.

3.4 RQ 1.4: What is the client data distribution of the federated learning applications?
Table 11 shows the client data distribution types of the studies. 24% of the studies have adopted non-IID data for their applications or have addressed the non-IID issue in their work. 23% of the studies have adopted IID data. 13% of the studies have compared the two types of client data distributions (Both), whereas 40% of the studies have not specified which data distribution they adopted (N/A); these studies neither focused on nor considered the effect of data distribution on the federated model training performance. In the simulated non-IID data distribution settings, researchers mainly split the dataset by class and store each data class on different client devices (e.g., [31, 67, 68, 154, 204]), or sort the data according to classes before distributing them to each client device [115]. Furthermore, the data are distributed unevenly across client devices for non-IID settings [68, 81]. For use case evaluations, the non-IID data are generated or collected by local devices, as in [42, 45, 58, 85, 135]. As for IID data distribution settings, the data are randomised and distributed evenly to each client device (e.g., [13, 73, 115, 154, 204]).
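The snippet below is an illustrative sketch of these two simulated partitioning strategies, assuming a toy label array and client count; it is not taken from any of the reviewed studies.

```python
# Sketch of IID vs. non-IID client partitioning, driven by labels only.
import numpy as np

rng = np.random.default_rng(2)
labels = np.repeat(np.arange(5), 20)          # 100 samples, 5 classes
num_clients = 5

# IID: shuffle, then split evenly so every client sees every class.
idx = rng.permutation(len(labels))
iid_clients = np.array_split(idx, num_clients)

# Non-IID: sort by class first, so each client ends up holding only one or two classes.
idx_sorted = np.argsort(labels)
non_iid_clients = np.array_split(idx_sorted, num_clients)

print("IID client 0 classes:     ", np.unique(labels[iid_clients[0]]))
print("non-IID client 0 classes: ", np.unique(labels[non_iid_clients[0]]))
```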

Findings from RQ 1.4: What is the client data distribution of the federated learning applications?

Client data distribution: Client data distribution is important to the model performance of federated learning. Model aggregation should consider the case where the distribution of the dataset on each client is different. Many studies have been conducted on non-IID issues in federated learning, particularly on extensions of the FedAvg algorithm for model aggregation.


Table 11. Client data distribution types of federated learning applications

Data distribution types Non-IID IID Both N/A

Percentages 24% 23% 13% 40%

Findings from RQ 2: What are the challenges of federated learning being addressed?
Motivation vs. challenges: Most of the known motivations of federated learning also appear to be the most studied federated learning limitations, including communication efficiency, system and statistical heterogeneity, model performance, and scalability. This reflects that federated learning is still an under-explored approach.

Fig. 5. Challenges of federated learning being addressed

3.5 RQ 2: What are the challenges of federated learning being addressed? & RQ 2.1: How many research activities have been conducted on federated learning?

The motivation of RQ 2 is to identify the challenges of federated learning that are addressed by the studies. As shown in Fig. 5, we have grouped the answers into categories based on the ISO/IEC 25010 System and Software Quality model [2] and the ISO/IEC 25012 Data Quality model [1]. We can observe that the communication efficiency of federated learning received the most attention from researchers, followed by statistical heterogeneity and system heterogeneity.

To further understand the popular research areas and how the research interests evolved over time from 2016 to date, we cross-examined the results for RQ 2 and RQ 2.1, as illustrated in Fig. 6. Note that Fig. 6 includes the results from 2016 to 2019 only, as we searched studies only up to 31.01.2020 and the 2020 results therefore do not represent the trend of the entire year. We can see that the number of studies that covered communication efficiency, statistical heterogeneity, system heterogeneity, data security, and client device security all surged drastically in 2019 versus the years before.

Although communicating model updates instead of the entire raw data reduces communication cost, federated learning systems still require multiple rounds of communication to reach convergence. Therefore, ways to reduce communication rounds in the federated learning training process are studied (e.g., [41, 115, 149]). Furthermore, cross-device federated learning needs to accommodate a massive number of client devices. Hence, the dropout of clients due to bandwidth limitations may occur [40, 176]. These dropouts could negatively impact the outcome of federated learning systems in two ways: (i) they reduce the amount of data available for training [40], and (ii) they increase the overall training time [112, 148]. In fact, most of the existing federated learning schemes cannot resolve the dropout problem other than by abandoning any training round with an insufficient number of clients [105].
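As an illustration of this behaviour (a hypothetical coordination loop, not a scheme from any specific study), the sketch below samples clients each round, simulates dropouts, and abandons the round when fewer than a minimum number of clients respond.

```python
# Hypothetical server-side round coordination with a minimum-client threshold.
import random

MIN_CLIENTS = 8          # minimum number of updates needed to aggregate
CLIENTS_PER_ROUND = 10
random.seed(0)

def run_round(all_clients, dropout_prob=0.3):
    """Sample clients, simulate dropouts, and aggregate only if enough respond."""
    sampled = random.sample(all_clients, CLIENTS_PER_ROUND)
    responded = [c for c in sampled if random.random() > dropout_prob]
    if len(responded) < MIN_CLIENTS:
        return None                    # abandon the round, as most schemes do
    return responded                   # in a real system: aggregate their updates

clients = [f"device_{i}" for i in range(100)]
for r in range(5):
    result = run_round(clients)
    print(f"round {r}:", "aborted" if result is None else f"aggregated {len(result)} updates")
```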

2The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist/


Fig. 6. Research Area Trend

Federated learning is effective in dealing with statistical and system heterogeneity issues since model training can be conducted locally on client devices, accommodating different data patterns and devices with diverse computation resources, energy consumption, and data storage capacities [36, 98]. However, the approach is still immature, as open questions remain in handling non-IID and unbalanced data distributions while maintaining model performance [36, 67, 159, 162, 192]. Interest in data security (e.g., [19, 49]) and client device security (e.g., [35, 48, 172]) is

also significantly high. Data security in federated learning is the degree to which a system ensures that data are accessible only by authorised parties [1]. While federated learning systems restrict raw data from leaving local devices, it is still possible to extract private information through back-tracing of the gradients or model parameters. The studies under this category mostly express their concerns about the possibility of information leakage from the local model gradient updates. Client device security can be expressed as the degree of security against dishonest and malicious client devices [74]. The existence of malicious devices in the training process could poison the overall model performance by disrupting the training process or not contributing the correct results to the central server. Furthermore, the misbehaviour or failure of client devices, whether intentional or unintentional, could reduce system reliability.

Client motivatability is discussed as an aspect to be explored (e.g., [80, 81, 198]) since model

performance relies greatly on the number of participating clients. More participating clients means that more client data and computation resources are contributed to the model training process.

System reliability issues refer to the single point of failure on the central server (e.g., [64, 88, 111]). In this aspect, concerns are mostly about adversarial or byzantine attacks that target the central server. The system performance of federated learning systems is covered in some studies (e.g., [34, 107, 175]), which includes considerations of computation efficiency, energy usage efficiency, and storage efficiency. In resource-restricted environments, efficient resource management of federated learning systems is even more crucial to system performance.

To improve system auditability, auditing mechanisms (e.g., [8, 73, 171]) are used to track the

client devices' behaviour, local model performance, system runtime performance, etc. Scalability is also mentioned as one of the system limitations of federated learning (e.g., [136, 143, 144, 175]). Lastly, the model performance limitation is covered (e.g., [65, 70, 124]). The model performance of federated learning systems is highly dependent on the number of participants and the data volume of each participating device in the training process. Moreover, there are also performance tradeoffs to be considered due to non-IID and unbalanced data distributions.

To answer RQ 2.1, we further classify the papers according to the research classification criteria proposed by Wieringa [173], which include: (1) evaluation research, (2) philosophy paper, (3)


proposal of solution, and (4) validation research, to distinguish the focus of each research activity. Evaluation research is the investigation of a problem in software engineering practice. In general, it results in new knowledge of causal relationships among phenomena, obtained for example through case studies, field studies, field experiments, or surveys, or in new knowledge of logical relationships among propositions (e.g., mathematics or logic). Philosophy papers are papers that propose a new way of looking at things, for instance, a new conceptual framework. Proposal of solution papers propose solution techniques and argue for their relevance, but without a full-blown validation. Lastly, validation research investigates the properties of a proposed solution that has not yet been implemented in practice; the investigation uses a thorough, methodologically sound research setup (e.g., experiments, simulation).

The research classification result is presented in Table 12. We notice that the most conducted

type of state-of-the-art research on federated learning is validation research, which proposes solutions to one or more technical problems and investigates them through experiments or simulations. Proposals of solutions are the second most conducted type of research activity, followed by evaluation research. The least conducted research activities are philosophy papers that propose a new conceptual framework for federated learning.

Table 12. The research classification of the selected paper

Research classification (paper count): Evaluation research (13); Philosophical paper (12); Proposal of solution (25); Validation research (183)

Findings from RQ 2.1: How many research activities have been conducted on federated learning?

Research activities: The number of studies that cover communication efficiency, statistical and system heterogeneity, data security, and client device security surged drastically in 2019 versus the years before. The most conducted research activity is validation research, followed by proposals of solutions, evaluation research, and philosophy papers.

3.6 RQ 2.2: Who is leading the research in federated learning?
The motivation of RQ 2.2 is to understand the research impact in the federated learning community. We also intend to help researchers identify the state-of-the-art research in federated learning by listing the top affiliations. As shown in Table 13, we list the top 10 affiliations by the number of papers published and by the number of citations. Google, IBM, CMU, and WeBank appear in both top-10 lists. From this table, we can identify the affiliations that made the most effort on federated learning and those that made the most impact on the research domain.

Findings from RQ 2.2: Who is leading the research in federated learning?
Affiliations: Google, IBM, CMU, and WeBank appear in the top-10 affiliations list both by the number of papers and by the number of citations, which reflects that they have made the most effort on federated learning and also the most impact on the research domain.


Table 13. Research Impact Analysis

Top 10 affiliations by number of papers:
1. Google (21); 2. IBM (11); 3. WeBank (8); 3. Nanyang Technological University (8); 5. Tsinghua University (6); 5. Carnegie Mellon University (CMU) (6); 7. Beijing University of Posts and Telecommunications (5); 7. Kyung Hee University (5); 9. Chinese Academy of Sciences (4); 9. Imperial College London (4)

Top 10 affiliations by number of citations:
1. Google (2269); 2. Stanford University (217); 3. ETH Zurich (130); 4. IBM (122); 5. Cornell University (101); 6. Carnegie Mellon University (CMU) (98); 7. ARM (82); 8. Tianjin University (76); 9. University of Oulu (73); 10. WeBank (66)

3.7 RQ 3: How are the federated learning challenges being addressed?
After looking at the challenges of federated learning, we studied the approaches proposed by the researchers to address them. We list the existing approaches in Fig. 7: model aggregation (63), training management (24), incentive mechanisms (18), privacy-preserving mechanisms (16), decentralised aggregation (15), security management (12), resource management (13), communication coordination (8), data augmentation (7), data provenance (7), model compression (8), feature fusion/selection (4), auditing mechanisms (4), evaluation (4), and anomaly detection (3). We also map the approaches to the challenges of federated learning in Table 14. Note that we did not cover every challenge mentioned in the collected studies but only those that have a proposed solution.

As shown in Fig. 7, model aggregation mechanisms are the most frequently proposed solution in the studies. Referring to Table 14, we can see that model aggregation mechanisms are applied to address communication efficiency, statistical heterogeneity, system heterogeneity, client device security, data security, system performance, scalability, model performance, system auditability, and system reliability issues.

Researchers have proposed various kinds of aggregation methods, including selective aggregation [77, 192], aggregation scheduling [154, 181, 182], asynchronous aggregation [29, 175], temporally weighted aggregation [122], controlled averaging algorithms [78], iterative round reduction [115, 149], and shuffled model aggregation [50]. These approaches aim to: (1) reduce communication cost and latency for better communication efficiency and scalability; (2) manage the device computation and energy resources for system heterogeneity and system performance issues; and (3) select models for aggregation based on model performance. Some researchers have proposed secure aggregation to solve data security and client device security issues [17, 18, 20].

Decentralised aggregation is a type of model aggregation that removes the central server from the federated learning system and is mainly adopted to solve system reliability limitations. Decentralised aggregation can be realised through a peer-to-peer system [139, 145], one-hop neighbours' collective learning [86], and Online Push-Sum (OPS) methods [60].
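To illustrate the idea behind the secure aggregation mentioned above (a simplified sketch in the spirit of pairwise masking, not the full protocol of the cited works), each pair of clients agrees on a random mask that one adds and the other subtracts, so the masks cancel in the server-side sum.

```python
# Simplified pairwise-masking sketch: the server learns only the sum of updates.
import numpy as np

rng = np.random.default_rng(3)
num_clients, dim = 3, 4
updates = [rng.normal(size=dim) for _ in range(num_clients)]   # true local updates

# Pairwise masks: client i adds +mask_ij and client j adds -mask_ij for i < j.
masks = {(i, j): rng.normal(size=dim)
         for i in range(num_clients) for j in range(i + 1, num_clients)}

masked = []
for i in range(num_clients):
    m = updates[i].copy()
    for (a, b), mask in masks.items():
        if a == i:
            m += mask
        elif b == i:
            m -= mask
    masked.append(m)          # what the server actually receives

# The masks cancel: the server recovers the correct sum without seeing any single update.
print(np.allclose(sum(masked), sum(updates)))   # True
```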

Training management is the second most proposed approach in the studies. It is used to address statistical heterogeneity, system heterogeneity, communication efficiency, client motivatability, and client device security issues. To deal with the statistical heterogeneity issue, researchers have proposed various training methods, such as clustering of training data [23, 142], multi-stage training and fine-tuning of models [72], brokered learning [47], distributed multitask learning [36, 150], and edge computing [102]. The aim is to reduce data skewness and the effect of unbalanced data by combining data from nearby nodes while maintaining a distributed manner. Furthermore, to


Furthermore, to address system heterogeneity issues, some methods calculate and balance the tasks and resources of the client nodes for the training process [39, 54, 90].

Another frequently proposed approach is the incentive mechanism, which is introduced as a solution to increase client motivatability. There is no obligation for data or device owners to join the training process if there are no benefits for them in contributing their data and resources. Hence, incentive mechanisms are introduced by some researchers to attract data and device owners to join the model training. Incentives can be given based on the amount of computation, communication, and energy resources provided [81, 166], the local model performance [108], the quality of the data provided [196], the honest behavior of client devices [131], and the dropout rate of a client node [126]. The proposed incentive mechanisms can be hosted either by a central server [76, 198] or on a blockchain [80, 106].

Resource management is introduced to control and optimise the computation resources, bandwidth, and energy consumption of participating devices in a training process, addressing the issues of system heterogeneity [122, 169] and communication efficiency [4, 142]. The proposed approaches use control algorithms [169], reinforcement learning [119, 195], and edge computing methods [122] to optimise resource usage and improve system efficiency in system-heterogeneous and bandwidth-limited scenarios. To address communication efficiency, model compression methods are utilised to further reduce the communication cost incurred when model parameters and gradients are exchanged between clients and the central server [22, 38, 83, 91, 143, 168]. In addition, model compression that shrinks the data size is also beneficial in bandwidth-limited and latency-sensitive scenarios, which promotes scalability [143].
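To make the compression idea concrete, the sketch below shows one common family of techniques, top-k gradient sparsification, where only the largest-magnitude entries are transmitted; it is an illustrative example rather than the specific method of any cited study:

```python
import numpy as np

def sparsify_top_k(gradient, k):
    """Keep only the k largest-magnitude entries of a gradient tensor and send
    (indices, values, shape) instead of the dense array."""
    flat = gradient.ravel()
    keep = np.argpartition(np.abs(flat), -k)[-k:]
    return keep, flat[keep], gradient.shape

def densify(keep, values, shape):
    """Server-side reconstruction of a (lossy) dense gradient from the sparse message."""
    flat = np.zeros(int(np.prod(shape)))
    flat[keep] = values
    return flat.reshape(shape)
```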

Privacy preservation methods are introduced to (1) maintain data security: prevent information (model parameters or gradients) leakage to unauthorised parties, and (2) maintain client device security: prevent the system from being compromised by dishonest nodes. Methods used for data security maintenance include differential privacy [140, 179], such as Gaussian noise addition to the features before model updates [110]. For client device security maintenance, secure multiparty computation methods such as homomorphic encryption are used together with the local differential privacy method [140, 159, 179]. While homomorphic encryption only allows the central server to compute the global model homomorphically based on the encrypted local updates, the local differential privacy method protects client data privacy by adding noise to the model parameter data sent by each client [159]. Security protection methods are also proposed to solve both client device security [101, 103, 176] and data security [56, 200] issues, covering model parameter or gradient encryption prior to the information exchange between the server and client nodes.
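For instance, a minimal local differential privacy step of the kind described above could clip each update and add Gaussian noise before it leaves the device. The sketch below is illustrative only; the clip_norm and noise_multiplier parameters are hypothetical, and formal privacy accounting (epsilon, delta) is omitted:

```python
import numpy as np

def privatise_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip a local model update to a maximum L2 norm, then add Gaussian noise
    so that the uploaded parameters leak less about the local data."""
    rng = rng or np.random.default_rng()
    scale = min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return update * scale + noise
```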

Communication coordination methods are introduced to improve communication efficiency [66, 184] and system performance [34]. Specifically, communication techniques such as multi-channel random access communication [34] or over-the-air computation methods [6, 184] enable the wireless federated training process to converge faster.

To address model performance issues [65], statistical heterogeneity problems [162], system auditability limitations [165], and communication efficiency limitations [191], feature fusion or selection methods are used. Feature selection methods improve the convergence rate [65] by only aggregating models with the selected features. Feature fusion is used to reduce the impact of non-IID data on the model performance [162]. Moreover, the feature fusion method reduces the dimension of the data to be transferred, which speeds up the training process and increases the communication efficiency of the system [191]. In addition, [165] proposed a feature separation method that measures the importance level of each feature in the training process, which is used for system interpretation and auditing purposes.

Data provenance mechanisms are used to govern the data and information interactions between nodes. They are effective in addressing single points of failure and adversarial attacks that intend to tamper with the data, especially in decentralised FL systems.


Fig. 7. RQ 3: How are the federated learning challenges being addressed?

One way to implement a data provenance mechanism in a federated learning system is through blockchain. The blockchain records all sharing events instead of the raw data for audit and data tracing purposes, and only allows parties with permission to access the information [106, 193]. Furthermore, blockchain is used for incentive provision in [14, 80, 81, 172], and to store global model updates in a Merkle tree [111].
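A minimal sketch of how per-round model updates could be committed to such a ledger is shown below: the serialised updates are hashed into a Merkle root so that the chain stores a single tamper-evident digest per round rather than the raw updates. This is an assumption-laden illustration, not the design of the cited studies:

```python
import hashlib

def merkle_root(update_blobs):
    """Compute a Merkle root over serialised model updates (bytes objects)."""
    level = [hashlib.sha256(blob).digest() for blob in update_blobs]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hexdigest()
```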

Data augmentation methods are introduced to address data security [129, 156] and client device security issues [125]. The approaches use privacy-preserving generative adversarial network (GAN) models to locally reproduce the data samples of all devices [69] for data release [156] and debugging processes [10]. These methods provide extra protection to shield the actual data from exposure. Furthermore, data augmentation methods are also effective in solving statistical heterogeneity issues through the reproduction of all local data to create an IID dataset for better training performance [69].

Auditing mechanisms are proposed as a solution to the lack of system auditability and to client device security limitations. They are responsible for assessing the honest or semi-honest behavior of a client node in a training process and detecting any anomaly occurrence [10, 81, 203]. Some researchers have also introduced anomaly detection mechanisms [48, 95, 131], specifically to penalise adversarial nodes that misbehave in an FL system.

To properly assess the behavior of a federated learning system, some researchers have implemented evaluation mechanisms. These mechanisms are introduced as methods to solve system auditability and statistical heterogeneity issues. For instance, [171] presents a visualisation platform to illustrate the federated learning system, and [21] presents a benchmarking platform that realistically captures the characteristics of a federated scenario. Furthermore, [62] introduces a method to measure the performance of the federated averaging algorithm.

Findings from RQ 3: How are the federated learning challenges being addressed?
Top 5 proposed approaches: The top 5 proposed approaches are model aggregation, training management, incentive mechanisms, privacy-preserving methods, and resource management. These approaches mainly target communication efficiency, statistical and system heterogeneity, client motivatability, and data privacy issues.
Remaining proposed approaches: A few papers worked on anomaly detection, auditing mechanisms, feature fusion/selection, evaluation, and data provenance.


Table 14. What are the challenges of federated learning being addressed (RQ 2) vs. How are the federated learning challenges being addressed (RQ 3).
(Rows list the proposed approaches: model aggregation, training management, incentive mechanisms, privacy preservation, decentralised aggregation, resource management, security protection, communication coordination, data augmentation, model compression, data provenance, evaluation, feature fusion/selection, auditing mechanisms, and anomaly detection. Columns list the limitations addressed: model performance, scalability, statistical heterogeneity, system auditability, system heterogeneity, client motivatability, system reliability, communication efficiency, system performance, data security, and client device security. Each cell holds the citations of the studies that apply the row approach to the column limitation.)


Table 15. Summary of component designs

Sub-component                     Client-based   Server-based   Both
Anomaly detector                  -              3              -
Auditing mechanism                -              2              -
Data provenance                   7              14             -
Client selector                   -              1              -
Communication coordinator         -              6              -
Data augmentation                 4              3              -
Encryption mechanism              3              -              12
Feature fusion mechanism          6              -              -
Incentive mechanism               2              9              -
Model aggregator                  -              188            -
Model compressor                  -              -              6
Model trainer                     211            -              -
Privacy preservation mechanism    1              -              14
Resource manager                  2              9              -
Training manager                  1              9              -

3.8 RQ 3.1: What are the possible components of federated learning systems?

The motivation of RQ 3.1 is to identify the components in a federated learning system and their responsibilities. We classify the components of federated learning systems into 2 main categories: (1) the central server and (2) the client devices. These two components are mandatory in a conventional federated learning system [83, 114]. The central server initiates and orchestrates the training process, whereas the client devices perform the actual model training.

Apart from the central server and client devices, some studies also add an edge device to the system as an intermediate hardware layer between the central server and the client devices. The edge device is introduced to increase communication efficiency by reducing the training data sample size [73, 132] and to increase the update speed [37, 102]. Table 15 summarises the number of mentions of each component. We classify them as client-based, server-based, or both to identify the host of these components. We can see that the model trainer is the most mentioned client-based component, and the model aggregator is the most mentioned server-based component. Furthermore, we notice that software components that manage the system (e.g., anomaly detector, data provenance, communication coordinator, resource manager & client selector) are mostly server-based, while software components that enhance the model performance (e.g., feature fusion, data augmentation) are mostly client-based. Lastly, two-way operating software components, such as model encryption, privacy preservation, and model compression, exist on both the clients and the server.

3.8.1 Central server: Physically, the central servers are usually hosted on a local machine [89, 194], a cloud server [57, 155, 178], a mobile edge computing platform [122], an edge gateway [147, 207], or a base station [9, 66, 107, 184]. Most studies propose the central server as a hardware component that creates the first global model by randomly initialising the model parameters or gradients [27, 56, 89, 114, 115, 201]. Besides randomising the initial model, central servers can pre-train the global model, either using a self-generated sample dataset or a small amount of data collected from each client device [132, 133]. We define this as the server-initialised model training setting. Note that not all federated learning systems initialise their global model on the central server. In decentralised federated learning, clients initialise the global model [14, 111, 144].

After the global model initialisation stage, the central server broadcasts the global model, which includes the model parameters and gradients, to the participating client devices.


The global model can be broadcast to all the participating client devices every round [5, 66, 143, 208], or only to specific client devices, selected either randomly [32, 89, 115, 160, 184] or based on the model training performance [132, 180] and resource availability [38, 169]. Similarly, the trained local models are collected either from all the participating client devices [168, 169, 200] or only from selected client devices [19, 112]. The collection of models can be performed in either an asynchronous [29, 65, 100, 175] or a synchronous manner [42, 48]. Finally, the central server performs model aggregation when it receives all or a specified number of updates, followed by the redistribution of the updated global model to the client devices. This entire process continues until convergence is achieved.
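The orchestration loop described above can be summarised in a short sketch. The clients[i].train(global_model) call is a hypothetical client API returning updated parameters (a NumPy array) and the local sample count, and the synchronous, dataset-size-weighted aggregation shown is only one of the variants surveyed:

```python
import numpy as np

def run_server(initial_model, clients, rounds=100, fraction=0.1, seed=0):
    """Minimal synchronous server loop: broadcast the global model to a random
    subset of clients, collect their updates, and aggregate them each round."""
    rng = np.random.default_rng(seed)
    global_model = initial_model
    for _ in range(rounds):
        k = max(1, int(fraction * len(clients)))
        chosen = rng.choice(len(clients), size=k, replace=False)
        updates, sizes = [], []
        for i in chosen:
            params, n_samples = clients[i].train(global_model)  # hypothetical client API
            updates.append(params)
            sizes.append(n_samples)
        total = float(sum(sizes))
        global_model = sum((n / total) * p for p, n in zip(updates, sizes))
    return global_model
```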

Apart from orchestrating the model parameter and gradient exchange, the server also hosts other essential software components, such as encryption/decryption mechanisms for model encryption and decryption [11, 19, 112], and resource management mechanisms to optimise resource consumption [9, 122]. An evaluation framework is proposed to evaluate the model and system performance [21, 62, 171], while the client and model selector is introduced to select appropriate clients for model training and to select models for global aggregation [38, 132, 169, 180]. Feature fusion mechanisms are proposed to combine essential model features and reduce communication cost [65, 162, 165, 191]. Incentive mechanisms are utilised to motivate clients' participation [46, 79, 126, 127, 141, 166, 198]. An anomaly detector is introduced to detect system anomalies [48], while the model compressor compresses the model to reduce communication cost [22, 38, 83, 143, 168]. The communication coordinator is a software component that manages the multi-channel communication between the central server and the client devices [5, 6, 27, 66, 148, 163, 184]. Lastly, the auditing mechanism is in charge of auditing the training processes [10, 14, 203].

3.8.2 Client devices: The client devices are the hardware components responsible for conducting the model training using the locally available datasets. Firstly, each client device collects and pre-processes the local data (data cleaning, labelling, feature extraction, etc.). All client devices then receive the initial global model. When a client receives the global model from the central server, it decrypts and extracts the global model parameters. After that, the client devices perform local model training. The received global model is also used for data inferencing and prediction by the client devices.

The local model training minimises the loss function and optimises the local model performance. Typically, the training process is conducted over multiple rounds and epochs [114] before the results are uploaded back to the central server for model aggregation. To reduce the number of communication rounds, [52] proposed performing local training on multiple local mini-batches of data; the method only communicates with the central server after the model has achieved convergence.

After that, the client devices send the training results (model parameters or gradients) back to the central server. Before uploading, client devices evaluate the local model performance and only upload when an agreed level of performance is achieved [62]. The results are encrypted using the encryption mechanism prior to uploading to preserve data security and prevent information leakage [11, 19, 112]. Furthermore, the model or result is compressed before being uploaded to the central server to reduce communication cost [22, 38, 83, 143, 168]. In certain scenarios, not all devices are required to upload their results. Only the selected client devices upload their results, depending on the selection criteria set by the central server; the criteria evaluate the available resources of the client devices and the model performance [9, 122, 195]. Other than that, client devices also host the data augmentation mechanism [69, 156] and the feature fusion mechanisms that correlate to the central server in the same FL system. After the completion of one model training round, the training result is uploaded to the central server for global aggregation.
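A toy version of the client-side step, using a linear model trained with full-batch gradient descent, is sketched below; real clients train neural networks with mini-batches, and the encryption and compression steps described above would be applied to the returned parameters before upload:

```python
import numpy as np

def client_update(global_params, features, labels, epochs=5, lr=0.1):
    """Start from the broadcast parameters, run a few local epochs on the
    private data, and return the new parameters with the local sample count."""
    params = np.asarray(global_params, dtype=float).copy()
    for _ in range(epochs):
        preds = features @ params
        grad = features.T @ (preds - labels) / len(labels)
        params -= lr * grad
    return params, len(labels)
```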

For decentralised federated learning systems, the client devices communicate among themselves without the orchestration of a central server. The removed central server is mostly replaced by a blockchain as a software component for model and information provenance [11, 14, 75, 80, 81, 106, 111, 172].


The blockchain is also responsible for incentive provision and differentially private multiparty data model sharing. The initial model is created locally by each client device using local datasets. The models are updated using a consensus-based approach that enables devices to send model updates and receive gradients from neighbour nodes [86, 88, 144]. The communication between client devices is realised through a peer-to-peer network [64, 139, 145]. Each device holds copies of the model updates of all the client devices. After reaching consensus, all the client devices conduct model training using the new gradient.

In cross-device settings, the system consists of a large client device population, where each device is the data owner. In contrast, in the cross-silo setting, the client network is formed by several companies or organisations, regardless of the number of individual devices they own, and the number of data silos is significantly smaller than in the cross-device setting [74]. Therefore, the cross-device system is required to create models for large-scale distributed data on the same application [161]. For the cross-silo system, the created models are required to accommodate data that is heterogeneous in content and semantics in both the feature and sample spaces [161]. The data partitioning also differs: in the cross-device setting, the data is partitioned automatically by example (horizontal data partitioning), whereas the data partitioning of the cross-silo setting is fixed either by feature (vertical) or by example (horizontal) [74].
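The difference between the two partitioning styles can be illustrated with a toy array, where horizontal partitioning splits by example and vertical partitioning splits by feature; this is a generic illustration, not tied to any cited system:

```python
import numpy as np

# Toy dataset: 6 samples (rows) x 4 features (columns).
data = np.arange(24).reshape(6, 4)

# Horizontal (by-example) partitioning, typical of cross-device settings:
# each party holds different samples drawn from the same feature space.
horizontal_parts = np.array_split(data, 3, axis=0)

# Vertical (by-feature) partitioning, common in cross-silo settings:
# each party holds different features for the same set of samples.
vertical_parts = np.array_split(data, 2, axis=1)
```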

Findings from RQ 3.1: What are the possible components of federated learning?
Mandatory components on clients: data collection, data preprocessing, feature engineering, model training, and inference.
Mandatory components on central server: model aggregation, evaluation.
Optional components on clients: anomaly detection, model compression, auditing mechanisms, data augmentation, feature fusion/selection, security protection, privacy preservation, data provenance.
Optional components on central server: advanced model aggregation, training management, incentive mechanism, resource management, communication coordination.
*Mandatory components - Components that perform the main federated learning operations.
*Optional components - Components that assist/enhance the federated learning operations.

3.8.3 RQ 3.2: What phases of the machine learning pipeline are covered by the research activities? Table 16 presents a summary of the machine learning pipeline phases covered by the studies. Note that only phases that are specifically elaborated in the papers are included. The top 3 most mentioned machine learning phases are "model training" (161 mentions), followed by "data collection" (22 mentions) and "data cleaning" (18 mentions). These 3 stages of federated learning are mostly similar to the corresponding stages in conventional machine learning systems, with some key differences, such as distributed model training tasks, decentralised data storage, and non-IID data distribution. Notice that only Google mentioned model inference and deployment, specifically for the Google keyboard application. The on-device inference supported by TensorFlow Lite is briefly mentioned in [58, 135], and [187] mentions that a model checkpoint from the server is used to build and deploy the on-device inference model, which uses the same featurisation flow that originally logged training examples on-device. Model monitoring (e.g., dealing with performance degradation) and project management (e.g., model versioning) are not discussed in the existing studies. We understand that federated learning research is still at an early stage, as most researchers focus on data processing and model training optimisation.


Findings from RQ 3.2: What phases of the machine learning pipeline are covered by the research activities?

Phases: The most discussed phase is model training. Only a few studies cover data pre-processing, feature engineering, and model evaluation, and only Google has discussions about model deployment (e.g., deployment strategies) and model inference. Model monitoring (e.g., dealing with performance degradation) and project management (e.g., model versioning) are not discussed in the existing studies. More studies are needed for the development of production-level federated learning systems.

Table 16. Summary of machine learning pipeline phases

ML pipeline phase      Paper count
Data collection        22
Data cleaning          18
Data labelling         13
Data augmentation      9
Feature engineering    8
Model training         161
Model evaluation       10
Model deployment       1
Model inference        2

3.9 RQ 4: How are the approaches evaluated?

RQ 4 focuses on the evaluation approaches in the studies. We classify the evaluation approaches into two main groups: (1) simulation and (2) case study. For simulation approaches, image processing is the most simulated task, with 99 out of 197 cases, whereas the most applied use cases are applications on mobile devices (11 out of 17 cases), such as keyboard word suggestion and human activity recognition.

Table 17. Evaluation approaches for federated learning

Evaluation methods    Application types             Paper count
Simulation (85%)      Image processing              99
                      Others                        98
Case study (7%)       Mobile device applications    11
                      Healthcare                    3
                      Others                        3
Both (1%)             Image processing              1
                      Mobile device applications    1
No evaluation (7%)    -                             15

Findings from RQ 4: How are the approaches evaluated?
Evaluation: Researchers mostly evaluate their federated learning approaches by simulation using privacy-sensitive scenarios. There are only a few real-world case studies, e.g., Google's mobile keyboard prediction.

3.9.1 RQ 4.1: What are the evaluation metrics used to evaluate the approaches? Through RQ 4.1, we intend to identify the evaluation metrics for both the qualitative and quantitative evaluation methods that are mentioned and adopted. We also elaborate on how each evaluation metric is calculated. Finally, we map the metrics to the federated learning challenges (RQ 2). The results are summarised in Table 18.

The attack rate is measured as the proportion of attack targets that are incorrectly classified as the target label [48]. Essentially, researchers use this metric to evaluate how effective the defending mechanisms are [47, 118, 145, 203]. The types of attack mentioned include the model poisoning attack, sybil attack, byzantine attack, and data reconstruction attack.
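Under that definition, the metric reduces to a simple ratio; the sketch below is our reading of the description in [48], with hypothetical argument names:

```python
def attack_rate(predictions, attack_target_indices, adversary_label):
    """Share of designated attack targets that end up classified as the
    adversary's chosen label (higher means a more successful attack)."""
    if not attack_target_indices:
        return 0.0
    hits = sum(1 for i in attack_target_indices if predictions[i] == adversary_label)
    return hits / len(attack_target_indices)
```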


Communication cost is quantified by the number of communication rounds against the learning accuracy [32, 34, 104, 108, 115, 152, 168, 182], the satisfying rate of communications [201], the communication overhead versus the number of clients [57, 183], the theoretical analysis of the communication cost of data interchange between the server and clients [19, 151, 158], the data transmission rate [176, 179], bandwidth [134, 195], and the latency of communication [80].

Computation cost is the assessment of the computation, storage, and energy resources of a system. The computation resources are quantified by the computation overhead against the number of client devices [56, 116, 123, 195], the average computation time [73, 93, 189], and the computation utility and the overhead of components [25, 198]. The storage resources are evaluated by the memory and storage overhead [42, 112, 123], storage capacity [134], computation throughput [9, 172], and computation latency [80]. The energy resources are calculated by the energy consumption for communication [46, 102], the energy consumption against computation time [93, 102, 147, 180, 189], and the energy consumption against training dataset size [211].

The convergence rate and dropout ratio are used to evaluate system performance. The convergence rate is quantified by the accuracy versus the communication rounds, system running time, or training data epochs [15, 141, 191]. The dropout ratio is measured by the computation overhead against dropout ratios [123, 176]. The results are showcased as the comparison of communication overhead for different dropout rates [176], and the performance comparison against the dropout rate [29, 97, 105].

The incentive rate is assessed by calculating the profit of the task publisher under different numbers of clients or accuracy levels [75], the average reward based on the model performance [206], and the relationship between the offered reward rate and the local accuracy over the communication cost [79, 126, 127]. From the collected studies, there is no mention of any specific form of reward provided as an incentive for federated learning. However, cryptocurrencies such as Bitcoin, or tokens that can be converted to actual money, are common kinds of rewards provided as incentives.

For model performance, the evaluation metrics consist of the training loss [25, 104, 152], the area under curve (AUC) value [104, 106, 186], F1-score [13, 29, 43], root mean-squared error (RMSE) [26, 149], cross-entropy [45, 175], precision [43, 188, 202], recall [43, 188, 202], prediction error [84, 150], mean absolute error [65], dice coefficient [146], and perplexity value [142]. Privacy loss is used to evaluate the privacy-preserving level of the proposed method [206], and it is derived from the differential average-case privacy [156].

For the system running time evaluations, the results are presented as the total execution time of the training protocol (computation time + communication time) [105, 106, 118, 176, 200], the running time of different operation phases [200], the running time of each round against the number of client devices [19, 122, 131], and the model training time [172, 179].

Apart from quantitative metrics, qualitative analysis methods are also used to evaluate data security [16, 50, 57], statistical heterogeneity [78, 117, 175], and system heterogeneity [117]. Analyses are conducted to verify whether the proposed components, mechanisms, or functionalities have satisfied their purposes of addressing the limitations mentioned. Essentially, the data security analyses mentioned include differential privacy achievement [106, 107], removal of centralised trust [106], and the guarantee of shared data quality [106]. The client device security analyses mentioned are the performance of the encryption and verification process [103, 176, 186], the confidentiality guarantee for gradients, the auditability of gradient collection and update, and the fairness guarantee for model training [172].

3.10 Summary

We have presented all the findings and results extracted through each RQ. We summarise all the findings in a mind-map, as shown in Fig. 9.


Table 18. Performance attributes addressed vs Evaluation metrics

Evaluation metric           Client          Communication  Model        Scalability  System       Data      Statistical    System         Client device
                            motivatability  efficiency     performance               performance  security  heterogeneity  heterogeneity  security
Attack rate                 -               -              -            -            1            1         -              -              4
Communication cost          -               58             -            3            2            -         -              -              -
Computation cost            -               -              -            -            42           -         -              -              -
Convergence rate            -               -              -            -            15           -         -              -              -
Dropout ratio               -               1              -            -            2            1         -              2              -
Incentive rate              6               -              -            -            -            -         -              -              -
Model performance metrics   2               1              253          -            1            -         -              -              -
Privacy loss                -               -              -            -            -            2         -              -              -
System running time         -               2              -            2            20           -         -              -              -
Qualitative evaluation      -               -              -            -            -            7         3              1              4

Findings from RQ 4.1: What are the evaluation metrics used to evaluate the approaches?
Evaluation: Researchers have performed quantitative and qualitative analyses to evaluate federated learning systems.
Quantitative metrics examples: model performance, communication & computation cost, system running time, etc.
Qualitative metrics examples: data security analysis on differential privacy achievement, performance of the encryption and verification process, confidentiality guarantee for gradients, the auditability of gradient collection and update, etc.

Fig. 8. Open problems and future research trends highlighted by existing reviews/surveys

4 OPEN PROBLEMS AND FUTURE TRENDS IN FEDERATED LEARNING

In this section, we identify and discuss the open problems and future research trends based on all the survey and review papers we collected, to provide objective and bias-free analyses (refer to Table 19). The findings are shown in Fig. 8, and the detailed explanations are elaborated below:

• Enterprise and industrial-level implementation/adoption. As much as we intend to present industry and enterprise-related federated learning, the only mentions of such topics are cross-silo federated learning and some of the possible federated learning applications in real-world use cases (refer to Table 10). The reason is that federated learning studies are still at an early stage [74, 94, 96, 99, 177]. Hence, the implementation and adoption of enterprise and industrial-level federated learning is an open problem and a future research trend of this field.



Fig. 9. Mind-map summary of the findings


The possible challenges of enterprise and industrial-level federated learning implementation are as follows:
– Machine learning pipeline: Most of the existing studies in federated learning focus on the federated model training phase, without considering other machine learning pipeline stages (e.g., model deployment and monitoring) under the context of federated learning (e.g., new data lifecycles) [94, 96, 99, 177].

– Benchmark: Benchmarking schemes are needed to evaluate system development under real-world settings, with rich datasets and representative workloads [74, 94, 96].

– Dealing with unlabeled client data: In practice, the data generated on clients may be mislabeled or unlabeled [74, 94, 96, 99]. Some possible solutions are semi-supervised learning-based techniques, which label the client data by learning the data labels from other clients. However, these solutions may need to deal with data privacy, heterogeneity, and scalability issues.

– Software architecture: Federated learning still lacks systematic architecture design to guide methodical development. A systematic architecture can provide design or algorithm alternatives for different system settings [94].

– Regulatory compliance: The regulatory compliance of federated learning systems is still under-explored [94] (e.g., whether the data transfer limitations in GDPR apply to model update transfers, ways to execute the right to explainability on the global model, and whether the global model should be retrained if a client wants to quit). Machine learning communities and law enforcement communities are expected to cooperate to fill the gap between federated learning technology and the regulations in reality.

– Human-in-the-loop: Domain experts are expected to be involved in the federated learning process to provide professional guidance, as end-users tend to trust an expert's judgement more than the inference produced by machine learning algorithms [177].

• Advanced privacy preservation methods: More advanced privacy preservation methods need to be studied, as the existing solutions can still reveal sensitive information [74, 94, 96, 99, 109, 121].
– Tradeoffs between data privacy and model/system performance: The current approaches (e.g., differential privacy and collaborative training) sacrifice the model performance and require significant computation cost [99, 109, 121]. Hence, the design of federated learning systems needs to balance the tradeoffs between data privacy and model/system performance.

– Granular privacy protection: In reality, privacy requirements may differ across clients or across sample data on a single client. Therefore, it is necessary to protect data privacy in a more granular manner. Privacy heterogeneity should be considered in the design of federated learning systems for different privacy requirements (e.g., client device-specific or sample data-specific). One direction of future work is extending differential privacy with granular privacy restrictions (e.g., heterogeneous differential privacy) [94, 96, 109].

– Sharing of less sensitive model data: Devices should only share less sensitive model data (e.g., inference results or signSGD), and this type of approach may be considered in future work [74, 109].

• Improving system and model performance. There are still some performance issues regarding federated learning systems, mainly in resource allocation (e.g., communication, computation, and energy efficiency). Moreover, model performance improvement through non-algorithm or non-gradient optimisation approaches (e.g., promoting more participants, the extension of the federated model training method) is another future research trend.


– Handling of client dropouts: In practice, participating clients may drop out of the training process due to energy constraints or network connectivity issues [74, 94, 99]. A large number of client dropouts can significantly degrade the model performance. Many of the existing studies do not consider the situation where the number of participating clients changes (i.e., departures or entries of clients). Hence, the design of a federated learning system should support the dynamic scheduling of model updates to tolerate client dropouts. Furthermore, new algorithms are needed to deal with scenarios where only a small number of clients are left in a training round. The learning coordinator should provide stable network connections to the clients to avoid dropouts.

– Advanced incentive mechanisms: Without a well-designed incentive mechanism, potential clients are reluctant to join the training process, which will discourage the adoption of federated learning [74, 94, 177]. In most of the current designs, the model owner pays the participating clients based on some metrics (e.g., the number of participating rounds or data size), which might not be effective in evaluating the incentive provision.

– Model market: A possible solution proposed to promote federated learning applications is the model market [94]. One can perform model prediction using a model purchased from the model market. In addition, the developed model can be listed on the model market with additional information (e.g., task, domain) for federated transfer learning.

– Combined algorithms for communication reduction: Combining different communication reduction techniques (e.g., combining model compression with local updating) is an interesting direction to further improve the system's communication efficiency (e.g., optimising the size of model updates and the number of communication instances) [74, 96, 99]. However, the feasibility of such combinations is still under-explored. In addition, the tradeoffs between model performance and communication efficiency need to be further examined (e.g., how to manage the tradeoffs under changing training settings).

– Asynchronous federated learning: Synchronous federated learning may have efficiency issues caused by stragglers. Asynchronous federated learning has been considered a more practical solution, even allowing clients to participate in the training halfway [74, 96, 99]. However, new asynchronous algorithms that provide convergence guarantees are still under-explored.

– Statistical heterogeneity quantification: The quantification of statistical heterogeneity into metrics (e.g., local dissimilarity) is needed to help improve the model performance under non-IID conditions [96, 99]. However, such metrics can hardly be calculated before training. One important future direction is the design of efficient algorithms to calculate the degree of statistical heterogeneity in a way that is useful for model optimisation.

– Multi-task learning: Most of the existing studies focus on training the same model on multiple clients. One interesting direction is to explore how to apply federated learning to train different models in federated networks [74, 109].

– Decentralised learning: As mentioned above, decentralised learning does not require a central server in the system [74, 109], which prevents single points of failure. Hence, it would be interesting to explore whether there are any new attacks or whether existing federated learning security issues still exist in this setting.

– One/few-shot learning: Federated learning that executes fewer training iterations, such as one/few-shot learning, has recently been discussed [74, 96]. However, more theoretical and empirical studies are needed in this direction.

– Federated transfer learning: In most of the current literature on federated transfer learning, only 2 domains or data parties are assumed. Therefore, expanding federated transfer learning to multiple domains or data parties is an open problem [74].


5 THREATS TO VALIDITY

We identified the threats to validity that might influence the outcomes of our research. Firstly, publication bias exists, as most published papers report positive results over negative results. While studies with positive results are much more appealing to publish than studies with no or negative results, a tendency towards certain outcomes might lead to biased conclusions.

The second threat is the exclusion of studies that focus on pure algorithm and gradient optimisation. The research on algorithm and gradient optimisation is not within our research scope. Hence, the exclusion of the respective studies may affect the completeness of this research, as some discussions on model performance and heterogeneity issues might be excluded.

The third threat is incomplete or inappropriate search strings in the automatic search. We tried to include all the possible supplementary terms related to federated learning while excluding keywords that return pure conventional machine learning publications. However, the search terms in the search strings may still be inadequate or insufficient to find all the relevant work related to federated learning research topics.

The fourth threat is the exclusion of arXiv and Google Scholar papers that are not cited in any peer-reviewed publication. As papers from these sources are not peer-reviewed, we cannot guarantee the quality of these research works. However, we intend to include as many studies on federated learning as possible to avoid leaving out important state-of-the-art federated learning research. Therefore, we maintain the quality of the search pool by only including papers from these sources that are cited by peer-reviewed publications.

The fifth threat is the time span of the research works included in this SLR. We aim to provide the most up-to-date review by including all the research papers from 2016.01.01 to 2020.01.31. Since the data in 2020 does not represent the research trend of the entire year, we only include works from 2016 to 2019 for the research trend analysis (refer to Fig. 6). However, the studies collected in January 2020 are equally significant to those of the previous years for identifying challenges and solutions for federated learning. Hence, we keep the findings from the papers published in 2020 for the remaining data analyses.

The sixth threat is study selection bias. To avoid selection bias between the 2 researchers, cross-validation of the results from the pilot study was performed by the two independent researchers prior to the update of the search terms and inclusion/exclusion criteria. Mutual agreement between the two independent researchers on the selection of papers is required. When a dispute on a decision occurs, a third researcher is consulted.

The last threat is the difference in the experience and knowledge of the researchers regarding the research field. The 3 independent researchers may have significant differences in research experience. Beyond research experience, each independent researcher's fundamental understanding and knowledge of federated learning research and applications may also vary. These differences may cause bias in the data collection and analyses conducted and influence the analyses of the collected research works.

6 RELATED WORKS

Prior to conducting this systematic literature review, we collected all the relevant surveys and reviews. There are 7 surveys and 1 review that cover the federated learning topic. After reviewing all the publications, we found that only [94] has a defined methodology for their research. To our knowledge, there is still no systematic literature review conducted on federated learning.


[94] proposed a federated learning building blocks taxonomy that classifies the federated learning system into 6 aspects: data partitioning, machine learning model, privacy mechanism, communication architecture, the scale of the federation, and the motivation of the federation. [74] presented a general survey on the advancement in research trends and the open problems suggested by researchers. Moreover, the paper covered the detailed definitions of federated learning system components and different types of federated learning system variations. Similarly, [96] presented a review on federated learning that covered some core challenges of federated learning in communication efficiency and privacy, and future research directions. Surveys on federated learning systems for specific research domains have also been conducted. For instance, [121] reviewed federated learning in the context of wireless communications. The survey mainly covered the data security and privacy challenges, algorithm challenges, and wireless setting challenges of federated learning systems. Similarly, [99] performed a survey on federated learning in the mobile edge network that covered the studies and research on federated learning and mobile edge computing. [109] surveyed the security threats and vulnerability challenges in dealing with adversaries in federated learning systems, whereas [177] surveyed the healthcare and medical informatics domain. A review of federated learning that focuses on data privacy and security aspects was conducted by [185].

Ultimately, the survey and review papers have reported the different studies and research works conducted on federated learning. The comparisons of each survey and review paper with our systematic literature review are summarised in Table 19. We compared our work with the existing works in the 4 aspects mentioned by Hall et al. [53]. Our work fulfils the 4 aspects that differentiate it from the previous works. (1) Time frames: our review of the state-of-the-art is the most contemporary, as it is the most up-to-date review. (2) Systematic approach: we followed Kitchenham's standard guideline [82] to conduct this systematic literature review. Most of the existing works have no clear methodological approach, where information is collected and interpreted with subjective summaries of findings that may be subject to bias. (3) Comprehensiveness: the number of papers analysed in our work is higher than in the existing reviews or surveys, as we screened the relevant journals and conferences paper-by-paper. (4) Analysis: we provided a more detailed review of federated learning approaches, including (1) the context of the studies (e.g., publication venues, year, type of evaluation metrics, methods, and datasets used) for state-of-the-art research identification and (2) the data synthesis of the findings from a software engineering perspective, mainly on requirement analysis, architecture design, and evaluation.

Table 19. Comparison with existing reviews/surveys on federated learning

Paper                        Type    Time frames  Methodology                    Scoping
Our study                    SLR     2016-2020    Following SLR guideline [82]   A software engineering view
Yang et al. (2019) [185]     Survey  2016-2018    Undefined                      In general
Kairouz et al. (2019) [74]   Survey  2016-2019    Undefined                      In general
Li et al. (2020) [94]        Survey  2016-2019    Customised                     A system view
Li et al. (2019) [96]        Survey  2016-2020    Undefined                      In general
Niknam et al. (2020) [121]   Review  2016-2019    Undefined                      Wireless communications
Lim et al. (2020) [99]       Survey  2016-2020    Undefined                      Mobile edge networks
Lyu et al. (2020) [109]      Survey  2016-2020    Undefined                      Vulnerabilities
Xu and Wang (2019) [177]     Survey  2016-2019    Undefined                      Healthcare informatics

7 CONCLUSION

Federated learning, as an emerging decentralised machine learning paradigm, has attracted a high amount of attention from academia and industry.


This paper presented a systematic literature review on federated learning from a software engineering perspective. We collected 1074 studies through automatic searches of the selected primary sources and identified 231 studies for this in-depth review. The review was conducted based on a predefined protocol. Through this research, we have discovered and presented insightful findings, open problems, and prospects of federated learning research. This comprehensive systematic literature review is expected to guide researchers and practitioners in understanding the state-of-the-art and conducting future studies in federated learning.

REFERENCES
[1] 2008. ISO/IEC 25012. https://iso25000.com/index.php/en/iso-25000-standards/iso-25012
[2] 2011. ISO/IEC 25010. https://iso25000.com/index.php/en/iso-25000-standards/iso-25010
[3] 2019. General Data Protection Regulation GDPR. https://gdpr-info.eu/
[4] Mehdi Salehi Heydar Abad, Emre Ozfatura, Deniz Gunduz, and Ozgur Ercetin. 2019. Hierarchical federated learning across heterogeneous cellular networks. arXiv preprint arXiv:1909.02362 (2019).
[5] J. Ahn, O. Simeone, and J. Kang. 2019. Wireless Federated Distillation for Distributed Edge Learning with Heterogeneous Data. In 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). 1–6.
[6] Mohammad Mohammadi Amiri and Deniz Gunduz. 2019. Federated learning over wireless fading channels. arXiv preprint arXiv:1907.09769 (2019).
[7] Mohammad Mohammadi Amiri, Deniz Gunduz, Sanjeev R. Kulkarni, and H. Vincent Poor. 2020. Update Aware Device Scheduling for Federated Learning at the Wireless Edge. arXiv:2001.10402 [cs.IT]
[8] Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, and Antonio Ferrara. 2019. Towards Effective Device-Aware Federated Learning. Springer International Publishing, Cham, 477–491.
[9] T. T. Anh, N. C. Luong, D. Niyato, D. I. Kim, and L. Wang. 2019. Efficient Training Management for Mobile Crowd-Machine Learning: A Deep Reinforcement Learning Approach. IEEE Wireless Communications Letters 8, 5 (Oct 2019), 1345–1348.
[10] Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, and Blaise Aguera y Arcas. 2019. Generative Models for Effective ML on Private, Decentralized Datasets. arXiv:1911.06679 [cs.LG]
[11] Sana Awan, Fengjun Li, Bo Luo, and Mei Liu. 2019. Poster: A Reliable and Accountable Privacy-Preserving Federated Learning Framework using the Blockchain. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. Association for Computing Machinery, London, United Kingdom, 2561–2563.
[12] U. M. Aïvodji, S. Gambs, and A. Martin. 2019. IOTFLA: A Secured and Privacy-Preserving Smart Home Architecture Implementing Federated Learning. In 2019 IEEE Security and Privacy Workshops (SPW). 175–180.
[13] Evita Bakopoulou, Balint Tillman, and Athina Markopoulou. 2019. A federated learning approach for mobile packet classification. arXiv preprint arXiv:1907.13113 (2019).
[14] X. Bao, C. Su, Y. Xiong, W. Huang, and Y. Hu. 2019. FLChain: A Blockchain for Auditable Federated Learning with Trust and Incentive. In 2019 5th International Conference on Big Data Computing and Communications (BIGCOM). 151–159.
[15] Daniel Benditkis, Aviv Keren, Liron Mor-Yosef, Tomer Avidor, Neta Shoham, and Nadav Tal-Israel. 2019. Distributed deep neural network training on edge devices. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing. Association for Computing Machinery, Arlington, Virginia, 304–306.
[16] Abhishek Bhowmick, John Duchi, Julien Freudiger, Gaurav Kapoor, and Ryan Rogers. 2018. Protection against reconstruction and its applications in private federated learning. arXiv preprint arXiv:1812.00984 (2018).
[17] Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konecny, Stefano Mazzocchi, H Brendan McMahan, et al. 2019. Towards federated learning at scale: System design. arXiv preprint arXiv:1902.01046 (2019).
[18] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. 2016. Practical secure aggregation for federated learning on user-held data. arXiv preprint arXiv:1611.04482 (2016).
[19] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. 2017. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Association for Computing Machinery, Dallas, Texas, USA, 1175–1191.

[20] Keith Bonawitz, Fariborz Salehi, Jakub Konečny, Brendan McMahan, and Marco Gruteser. 2019. Federated learning with autotuned communication-efficient secure aggregation. arXiv preprint arXiv:1912.00131 (2019).

[21] Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečný, H. Brendan McMahan, Virginia Smith, and Ameet Talwalkar. 2018. LEAF: A Benchmark for Federated Settings. arXiv:1812.01097 [cs.LG]

[22] Sebastian Caldas, Jakub Konečny, H Brendan McMahan, and Ameet Talwalkar. 2018. Expanding the Reach of Federated Learning by Reducing Client Resource Requirements. arXiv preprint arXiv:1812.07210 (2018).

[23] Sebastian Caldas, Virginia Smith, and Ameet Talwalkar. 2018. Federated Kernelized Multi-Task Learning. In SysML Conference 2018.

[24] Di Chai, Leye Wang, Kai Chen, and Qiang Yang. 2019. Secure Federated Matrix Factorization. arXiv:1906.05108 [cs.CR]

[25] Fei Chen, Mi Luo, Zhenhua Dong, Zhenguo Li, and Xiuqiang He. 2018. Federated Meta-Learning with Fast Convergence and Efficient Communication. arXiv:1802.07876 [cs.LG]

[26] M. Chen, O. Semiari, W. Saad, X. Liu, and C. Yin. 2020. Federated Echo State Learning for Minimizing Breaks in Presence in Wireless Virtual Reality Networks. IEEE Transactions on Wireless Communications 19, 1 (Jan 2020), 177–191.

[27] Mingzhe Chen, Zhaohui Yang, Walid Saad, Changchuan Yin, H Vincent Poor, and Shuguang Cui. 2019. A joint learning and communications framework for federated learning over wireless networks. arXiv preprint arXiv:1909.07972 (2019).

[28] Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, and Mingyi Hong. 2019. Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms. arXiv:1906.01736 [cs.LG]

[29] Yujing Chen, Yue Ning, and Huzefa Rangwala. 2019. Asynchronous Online Federated Learning for Edge Devices. arXiv preprint arXiv:1911.02134 (2019).

[30] Yudong Chen, Lili Su, and Jiaming Xu. 2018. Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent. SIGMETRICS Perform. Eval. Rev. 46, 1 (2018), 96. https://doi.org/10.1145/3292040.3219655

[31] Yang Chen, Xiaoyan Sun, and Yaochu Jin. 2019. Communication-Efficient Federated Deep Learning with Asynchronous Model Update and Temporally Weighted Aggregation. arXiv:1903.07424 [cs.LG]

[32] Y. Chen, X. Sun, and Y. Jin. 2019. Communication-Efficient Federated Deep Learning With Layerwise Asynchronous Model Update and Temporally Weighted Aggregation. IEEE Transactions on Neural Networks and Learning Systems (2019), 1–10.

[33] Kewei Cheng, Tao Fan, Yilun Jin, Yang Liu, Tianjian Chen, and Qiang Yang. 2019. SecureBoost: A Lossless Federated Learning Framework. arXiv preprint arXiv:1901.08755 (2019).

[34] J. Choi and S. R. Pokhrel. 2019. Federated Learning with Multichannel ALOHA. IEEE Wireless Communications Letters (2019), 1–1.

[35] Olivia Choudhury, Aris Gkoulalas-Divanis, Theodoros Salonidis, Issa Sylla, Yoonyoung Park, Grace Hsu, and Amar Das. 2019. Differential Privacy-enabled Federated Learning for Sensitive Health Data. arXiv:1910.02578 [cs.LG]

[36] Luca Corinzia and Joachim M Buhmann. 2019. Variational Federated Multi-Task Learning. arXiv preprint arXiv:1906.06268 (2019).

[37] Harshit Daga, Patrick K. Nicholson, Ada Gavrilovska, and Diego Lugones. 2019. Cartel: A System for Collaborative Transfer Learning at the Edge. In Proceedings of the ACM Symposium on Cloud Computing. Association for Computing Machinery, Santa Cruz, CA, USA, 25–37.

[38] K. Deng, Z. Chen, S. Zhang, C. Gong, and J. Zhu. 2019. Content Compression Coding for Federated Learning. In 2019 11th International Conference on Wireless Communications and Signal Processing (WCSP). 1–6.

[39] Canh Dinh, Nguyen H Tran, Minh NH Nguyen, Choong Seon Hong, Wei Bao, Albert Zomaya, and Vincent Gramoli. 2019. Federated Learning over Wireless Networks: Convergence Analysis and Resource Allocation. arXiv preprint arXiv:1910.13067 (2019).

[40] R. Doku, D. B. Rawat, and C. Liu. 2019. Towards Federated Learning Approach to Determine Data Relevance in Big Data. In 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI). 184–192.

[41] Wei Du, Xiao Zeng, Ming Yan, and Mi Zhang. 2018. Efficient Federated Learning via Variational Dropout. (2018).

[42] Moming Duan. 2019. Astraea: Self-balancing federated learning for improving classification accuracy of mobile deep learning applications. arXiv preprint arXiv:1907.01132 (2019).

[43] S. Duan, D. Zhang, Y. Wang, L. Li, and Y. Zhang. 2019. JointRec: A Deep Learning-based Joint Cloud Video Recommendation Framework for Mobile IoTs. IEEE Internet of Things Journal (2019), 1–1.

[44] Amin Fadaeddini, Babak Majidi, and Mohammad Eshghi. 2019. Privacy Preserved Decentralized Deep Learning: A Blockchain Based Solution for Secure AI-Driven Enterprise. Springer International Publishing, Cham, 32–40.

[45] Zipei Fan, Xuan Song, Renhe Jiang, Quanjun Chen, and Ryosuke Shibasaki. 2019. Decentralized Attention-based Personalized Human Mobility Prediction. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3, 4 (2019), Article 133.

[46] S. Feng, D. Niyato, P. Wang, D. I. Kim, and Y. Liang. 2019. Joint Service Pricing and Cooperative Relay Communication for Federated Learning. In 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData). 815–820.

[47] Clement Fung, Jamie Koerner, Stewart Grant, and Ivan Beschastnikh. 2018. Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting. arXiv:1811.09712 [cs.CR]

[48] Clement Fung, Chris JM Yoon, and Ivan Beschastnikh. 2018. Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866 (2018).

[49] Robin C Geyer, Tassilo Klein, and Moin Nabi. 2017. Differentially private federated learning: A client level perspective. arXiv preprint arXiv:1712.07557 (2017).

[50] Badih Ghazi, Rasmus Pagh, and Ameya Velingker. 2019. Scalable and Differentially Private Distributed Aggregation in the Shuffled Model. arXiv:1906.08320 [cs.LG]

[51] Avishek Ghosh, Justin Hong, Dong Yin, and Kannan Ramchandran. 2019. Robust Federated Learning in a Heterogeneous Environment. arXiv preprint arXiv:1906.06629 (2019).

[52] Neel Guha, Ameet Talwalkar, and Virginia Smith. 2019. One-Shot Federated Learning. arXiv preprint arXiv:1902.11175 (2019).

[53] T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell. 2012. A Systematic Literature Review on Fault Prediction Performance in Software Engineering. IEEE Transactions on Software Engineering 38, 6 (2012), 1276–1304.

[54] Pengchao Han, Shiqiang Wang, and Kin K. Leung. 2020. Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach. arXiv:2001.04756 [cs.LG]

[55] Yufei Han and Xiangliang Zhang. 2019. Robust Federated Training via Collaborative Machine Teaching using Trusted Instances. arXiv preprint arXiv:1905.02941 (2019).

[56] M. Hao, H. Li, X. Luo, G. Xu, H. Yang, and S. Liu. 2019. Efficient and Privacy-enhanced Federated Learning for Industrial Artificial Intelligence. IEEE Transactions on Industrial Informatics (2019), 1–1.

[57] M. Hao, H. Li, G. Xu, S. Liu, and H. Yang. 2019. Towards Efficient and Privacy-Preserving Federated Deep Learning. In ICC 2019 - 2019 IEEE International Conference on Communications (ICC). 1–6.

[58] Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, and Daniel Ramage. 2018. Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604 (2018).

[59] Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Richard Nock, Giorgio Patrini, Guillaume Smith, and Brian Thorne. 2017. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677 (2017).

[60] Chaoyang He, Conghui Tan, Hanlin Tang, Shuang Qiu, and Ji Liu. 2019. Central server free federated learning over single-sided trust social networks. arXiv preprint arXiv:1910.04956 (2019).

[61] X. He, Q. Ling, and T. Chen. 2019. Byzantine-Robust Stochastic Gradient Descent for Distributed Low-Rank Matrix Completion. In 2019 IEEE Data Science Workshop (DSW). 322–326.

[62] Tzu-Ming Harry Hsu, Hang Qi, and Matthew Brown. 2019. Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification. arXiv:1909.06335 [cs.LG]

[63] B. Hu, Y. Gao, L. Liu, and H. Ma. 2018. Federated Region-Learning: An Edge Computing Based Framework for Urban Environment Sensing. In 2018 IEEE Global Communications Conference (GLOBECOM). 1–7.

[64] Chenghao Hu, Jingyan Jiang, and Zhi Wang. 2019. Decentralized Federated Learning: A Segmented Gossip Approach. arXiv preprint arXiv:1908.07782 (2019).

[65] Yao Hu, Xiaoyan Sun, Yang Chen, and Zishuai Lu. 2019. Model and Feature Aggregation Based Federated Learning for Multi-sensor Time Series Trend Following. Springer International Publishing, Cham, 233–246.

[66] S. Hua, K. Yang, and Y. Shi. 2019. On-Device Federated Learning via Second-Order Optimization with Over-the-Air Computation. In 2019 IEEE 90th Vehicular Technology Conference (VTC2019-Fall). 1–5.

[67] Li Huang, Andrew L. Shea, Huining Qian, Aditya Masurkar, Hao Deng, and Dianbo Liu. 2019. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. Journal of Biomedical Informatics 99 (2019), 103291. http://www.sciencedirect.com/science/article/pii/S1532046419302102

[68] Amir Jalalirad, Marco Scavuzzo, Catalin Capota, and Michael Sprague. 2019. A Simple and Efficient Federated Recommender System. In Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies. Association for Computing Machinery, Auckland, New Zealand, 53–58.

[69] Eunjeong Jeong, Seungeun Oh, Hyesung Kim, Jihong Park, Mehdi Bennis, and Seong-Lyun Kim. 2018. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-iid private data. arXiv preprint arXiv:1811.11479 (2018).

[70] Shaoxiong Ji, Guodong Long, Shirui Pan, Tianqing Zhu, Jing Jiang, and Sen Wang. 2019. Detecting Suicidal Ideation with Data Protection in Online Communities. Springer International Publishing, Cham, 225–229.

[71] Di Jiang, Yuanfeng Song, Yongxin Tong, Xueyang Wu, Weiwei Zhao, Qian Xu, and Qiang Yang. 2019. Federated Topic Modeling. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, Beijing, China, 1071–1080.

[72] Yihan Jiang, Jakub Konečny, Keith Rush, and Sreeram Kannan. 2019. Improving federated learning personalization via model agnostic meta learning. arXiv preprint arXiv:1909.12488 (2019).

[73] Yuang Jiang, Shiqiang Wang, Bong Jun Ko, Wei-Han Lee, and Leandros Tassiulas. 2019. Model Pruning Enables Efficient Federated Learning on Edge Devices. arXiv preprint arXiv:1909.12326 (2019).

[74] Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D’Oliveira, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Mariana Raykova, Hang Qi, Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, and Sen Zhao. 2019. Advances and Open Problems in Federated Learning. arXiv:1912.04977 [cs.LG]

[75] J. Kang, Z. Xiong, D. Niyato, S. Xie, and J. Zhang. 2019. Incentive Mechanism for Reliable Federated Learning: A Joint Optimization Approach to Combining Reputation and Contract Theory. IEEE Internet of Things Journal 6, 6 (Dec 2019), 10700–10714.

[76] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y. Liang, and D. I. Kim. 2019. Incentive Design for Efficient Federated Learning in Mobile Networks: A Contract Theory Approach. In 2019 IEEE VTS Asia Pacific Wireless Communications Symposium (APWCS). 1–5.

[77] J. Kang, Z. Xiong, D. Niyato, Y. Zou, Y. Zhang, and M. Guizani. 2020. Reliable Federated Learning for Mobile Networks. IEEE Wireless Communications 27, 2 (2020), 72–80.

[78] Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J Reddi, Sebastian U Stich, and Ananda Theertha Suresh. 2019. SCAFFOLD: Stochastic controlled averaging for on-device federated learning. arXiv preprint arXiv:1910.06378 (2019).

[79] Latif U Khan, Nguyen H Tran, Shashi Raj Pandey, Walid Saad, Zhu Han, Minh NH Nguyen, and Choong Seon Hong. 2019. Federated Learning for Edge Networks: Resource Optimization and Incentive Mechanism. arXiv preprint arXiv:1911.05642 (2019).

[80] H. Kim, J. Park, M. Bennis, and S. Kim. 2019. Blockchained On-Device Federated Learning. IEEE Communications Letters (2019), 1–1.

[81] Y. J. Kim and C. S. Hong. 2019. Blockchain-based Node-aware Dynamic Weighting Methods for Improving Federated Learning Performance. In 2019 20th Asia-Pacific Network Operations and Management Symposium (APNOMS). 1–4.

[82] B. Kitchenham and S Charters. 2007. Guidelines for performing Systematic Literature Reviews in Software Engineering.

[83] Jakub Konečny, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016).

[84] Jakub Konečný, H. Brendan McMahan, Daniel Ramage, and Peter Richtárik. 2016. Federated Optimization: Distributed Machine Learning for On-Device Intelligence. arXiv:1610.02527 [cs.LG]

[85] Antti Koskela and Antti Honkela. 2019. Learning Rate Adaptation for Federated and Differentially Private Learning. stat 1050 (2019), 31.

[86] Anusha Lalitha, Osman Cihan Kilinc, Tara Javidi, and Farinaz Koushanfar. 2019. Peer-to-peer Federated Learning on Graphs. arXiv preprint arXiv:1901.11173 (2019).

[87] Anusha Lalitha, Shubhanshu Shekhar, Tara Javidi, and Farinaz Koushanfar. 2018. Fully decentralized federated learning. In Third workshop on Bayesian Deep Learning (NeurIPS).

[88] Anusha Lalitha, Xinghan Wang, Osman Kilinc, Yongxi Lu, Tara Javidi, and Farinaz Koushanfar. 2019. Decentralized Bayesian Learning over Graphs. arXiv:1905.10466 [stat.ML]

[89] D. Leroy, A. Coucke, T. Lavril, T. Gisselbrecht, and J. Dureau. 2019. Federated Learning for Keyword Spotting. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 6341–6345.

[90] Daliang Li and Junpu Wang. 2019. FedMD: Heterogenous Federated Learning via Model Distillation. arXiv preprint arXiv:1910.03581 (2019).

[91] Hongyu Li and Tianqi Han. 2019. An End-to-End Encrypted Neural Network for Gradient Updates Transmission in Federated Learning. arXiv:1908.08340 [cs.LG]

[92] Jeffrey Li, Mikhail Khodak, Sebastian Caldas, and Ameet Talwalkar. 2019. Differentially Private Meta-Learning. arXiv:1909.05830 [cs.LG]

[93] L. Li, H. Xiong, Z. Guo, J. Wang, and C. Xu. 2019. SmartPC: Hierarchical Pace Control in Real-Time Federated Learning System. In 2019 IEEE Real-Time Systems Symposium (RTSS). 406–418.

[94] Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, and Bingsheng He. 2019. A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection. arXiv:1907.09693 [cs.LG]

[95] Suyi Li, Yong Cheng, Yang Liu, Wei Wang, and Tianjian Chen. 2019. Abnormal client behavior detection in federated learning. arXiv preprint arXiv:1910.09933 (2019).

[96] Tian Li, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. 2020. Federated Learning: Challenges, Methods, and Future Directions. IEEE Signal Processing Magazine 37, 3 (May 2020), 50–60. https://doi.org/10.1109/msp.2020.2975749

[97] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. 2018. Federated Optimization in Heterogeneous Networks. arXiv:1812.06127 [cs.LG]

[98] Tian Li, Maziar Sanjabi, and Virginia Smith. 2019. Fair Resource Allocation in Federated Learning. arXiv preprint arXiv:1905.10497 (2019).

[99] Wei Yang Bryan Lim, Nguyen Cong Luong, Dinh Thai Hoang, Yutao Jiao, Ying-Chang Liang, Qiang Yang, Dusit Niyato, and Chunyan Miao. 2019. Federated Learning in Mobile Edge Networks: A Comprehensive Survey. arXiv:1909.11875 [cs.NI]

[100] B. Liu, L. Wang, and M. Liu. 2019. Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems. IEEE Robotics and Automation Letters 4, 4 (Oct 2019), 4555–4562.

[101] Changchang Liu, Supriyo Chakraborty, and Dinesh Verma. 2019. Secure Model Fusion for Distributed Learning Using Partial Homomorphic Encryption. In Policy-Based Autonomic Data Governance, Seraphin Calo, Elisa Bertino, and Dinesh Verma (Eds.). Springer International Publishing, Cham, 154–179. https://doi.org/10.1007/978-3-030-17277-0_9

[102] Lumin Liu, Jun Zhang, S. H. Song, and Khaled Ben Letaief. 2019. Client-edge-cloud hierarchical federated learning. CoRR abs/1905.06641 (2019). arXiv:1905.06641 http://arxiv.org/abs/1905.06641

[103] Yang Liu, Tianjian Chen, and Qiang Yang. 2018. Secure Federated Transfer Learning. arXiv:1812.03337 [cs.LG]

[104] Yang Liu, Yan Kang, Xinwei Zhang, Liping Li, Yong Cheng, Tianjian Chen, Mingyi Hong, and Qiang Yang. 2019. A Communication Efficient Vertical Federated Learning Framework. arXiv preprint arXiv:1912.11187 (2019).

[105] Yang Liu, Zhuo Ma, Ximeng Liu, Siqi Ma, Surya Nepal, and Robert Deng. 2019. Boosting Privately: Privacy-Preserving Federated Extreme Boosting for Mobile Crowdsensing. arXiv:1907.10218 [cs.CR]

[106] Y. Lu, X. Huang, Y. Dai, S. Maharjan, and Y. Zhang. 2019. Blockchain and Federated Learning for Privacy-preserved Data Sharing in Industrial IoT. IEEE Transactions on Industrial Informatics (2019), 1–1.

[107] Y. Lu, X. Huang, Y. Dai, S. Maharjan, and Y. Zhang. 2019. Differentially Private Asynchronous Federated Learning for Mobile Edge Computing in Urban Informatics. IEEE Transactions on Industrial Informatics (2019), 1–1.

[108] S. Lugan, P. Desbordes, E. Brion, L. X. Ramos Tormo, A. Legay, and B. Macq. 2019. Secure Architectures Implementing Trusted Coalitions for Blockchained Distributed Learning (TCLearn). IEEE Access 7 (2019), 181789–181799.

[109] Lingjuan Lyu, Han Yu, and Qiang Yang. 2020. Threats to Federated Learning: A Survey. arXiv:2003.02133 [cs.CR]

[110] Jing Ma, Qiuchen Zhang, Jian Lou, Joyce C. Ho, Li Xiong, and Xiaoqian Jiang. 2019. Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, Beijing, China, 1291–1300.

[111] U. Majeed and C. S. Hong. 2019. FLchain: Federated Learning via MEC-enabled Blockchain Network. In 2019 20th Asia-Pacific Network Operations and Management Symposium (APNOMS). 1–4.

[112] Kalikinkar Mandal and Guang Gong. 2019. PrivFL: Practical Privacy-preserving Federated Regressions on High-dimensional Data over Mobile Networks. In Proceedings of the 2019 ACM SIGSAC Conference on Cloud Computing Security Workshop. Association for Computing Machinery, London, United Kingdom, 57–68.

[113] I. Martinez, S. Francis, and A. S. Hafid. 2019. Record and Reward Federated Learning Contributions with Blockchain. In 2019 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). 50–57.

[114] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. 2016. Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv:1602.05629 [cs.LG]

[115] J. Mills, J. Hu, and G. Min. 2019. Communication-Efficient Federated Learning for Wireless Edge Intelligence in IoT. IEEE Internet of Things Journal (2019), 1–1.

[116] Fan Mo and Hamed Haddadi. [n.d.]. Efficient and Private Federated Learning using TEE. ([n. d.]).

[117] Mehryar Mohri, Gary Sivek, and Ananda Theertha Suresh. 2019. Agnostic federated learning. arXiv preprint arXiv:1902.00146 (2019).

[118] N. I. Mowla, N. H. Tran, I. Doh, and K. Chae. 2020. Federated Learning-Based Cognitive Detection of Jamming Attack in Flying Ad-Hoc Network. IEEE Access 8 (2020), 4338–4350.

[119] C. Nadiger, A. Kumar, and S. Abdelhak. 2019. Federated Reinforcement Learning for Fast Personalization. In 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE). 123–127.

[120] T. D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan, and A. Sadeghi. 2019. DÏoT: A Federated Self-learning Anomaly Detection System for IoT. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). 756–767.

[121] Solmaz Niknam, Harpreet S. Dhillon, and Jeffery H. Reed. 2019. Federated Learning for Wireless Communications: Motivation, Opportunities and Challenges. arXiv:1908.06847 [eess.SP]

[122] T. Nishio and R. Yonetani. 2019. Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge. In ICC 2019 - 2019 IEEE International Conference on Communications (ICC). 1–7.

[123] Chaoyue Niu, Fan Wu, Shaojie Tang, Lifeng Hua, Rongfei Jia, Chengfei Lv, Zhihua Wu, and Guihai Chen. 2019. Secure federated submodel learning. arXiv preprint arXiv:1911.02254 (2019).

[124] Richard Nock, Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Giorgio Patrini, Guillaume Smith, and Brian Thorne. 2018. Entity Resolution and Federated Learning get a Federated Resolution. arXiv preprint arXiv:1803.04035 (2018).

[125] Tribhuvanesh Orekondy, Seong Joon Oh, Yang Zhang, Bernt Schiele, and Mario Fritz. 2018. Gradient-Leaks: Understanding and Controlling Deanonymization in Federated Learning. arXiv:1805.05838 [cs.CR]

[126] S. R. Pandey, N. H. Tran, M. Bennis, Y. K. Tun, Z. Han, and C. S. Hong. 2019. Incentivize to Build: A Crowdsourcing Framework for Federated Learning. In 2019 IEEE Global Communications Conference (GLOBECOM). 1–6.

[127] S. R. Pandey, N. H. Tran, M. Bennis, Y. K. Tun, A. Manzoor, and C. S. Hong. 2020. A Crowdsourcing Framework for On-Device Federated Learning. IEEE Transactions on Wireless Communications 19, 5 (2020), 3241–3256.

[128] Xingchao Peng, Zijun Huang, Yizhe Zhu, and Kate Saenko. 2019. Federated Adversarial Domain Adaptation. arXiv:1911.02054 [cs.CV]

[129] Daniel Peterson, Pallika Kanani, and Virendra J Marathe. 2019. Private Federated Learning with Domain Adaptation. arXiv preprint arXiv:1912.06733 (2019).

[130] Krishna Pillutla, Sham M Kakade, and Zaid Harchaoui. 2019. Robust aggregation for federated learning. arXiv preprint arXiv:1912.13445 (2019).

[131] Davy Preuveneers, Vera Rimmer, Ilias Tsingenopoulos, Jan Spooren, Wouter Joosen, and Elisabeth Ilie-Zudor. 2018. Chained anomaly detection models for federated learning: an intrusion detection case study. Applied Sciences 8, 12 (2018), 2663.

[132] J. Qian, S. P. Gochhayat, and L. K. Hansen. 2019. Distributed Active Learning Strategies on Edge Computing. In 2019 6th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud) / 2019 5th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom). 221–226.

[133] Jia Qian, Sayantan Sengupta, and Lars Kai Hansen. 2019. Active Learning Solution on Distributed Edge Computing. arXiv:1906.10718 [cs.DC]

[134] Yongfeng Qian, Long Hu, Jing Chen, Xin Guan, Mohammad Mehedi Hassan, and Abdulhameed Alelaiwi. 2019. Privacy-aware service placement for mobile edge computing via federated learning. Information Sciences 505 (2019), 562–570. http://www.sciencedirect.com/science/article/pii/S0020025519306814

[135] Swaroop Ramaswamy, Rajiv Mathews, Kanishka Rao, and Françoise Beaufays. 2019. Federated learning for emoji prediction in a mobile keyboard. arXiv preprint arXiv:1906.04329 (2019).

[136] Amirhossein Reisizadeh, Aryan Mokhtari, Hamed Hassani, Ali Jadbabaie, and Ramtin Pedarsani. 2019. Fedpaq: A communication-efficient federated learning method with periodic averaging and quantization. arXiv preprint arXiv:1909.13014 (2019).

[137] J. Ren, H. Wang, T. Hou, S. Zheng, and C. Tang. 2019. Federated Learning-Based Computation Offloading Optimization in Edge Computing-Supported Internet of Things. IEEE Access 7 (2019), 69194–69201.

[138] Jinke Ren, Guanding Yu, and Guangyao Ding. 2019. Accelerating DNN Training in Wireless Federated Edge Learning System. arXiv:1905.09712 [cs.LG]

[139] Abhijit Guha Roy, Shayan Siddiqui, Sebastian Pölsterl, Nassir Navab, and Christian Wachinger. 2019. BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning. arXiv preprint arXiv:1905.06731 (2019).

[140] Theo Ryffel, Andrew Trask, Morten Dahl, Bobby Wagner, Jason Mancuso, Daniel Rueckert, and Jonathan Passerat-Palmbach. 2018. A generic framework for privacy preserving deep learning. arXiv:1811.04017 [cs.LG]

[141] Y. Sarikaya and O. Ercetin. 2019. Motivating Workers in Federated Learning: A Stackelberg Game Perspective. IEEE Networking Letters (2019), 1–1.

[142] Felix Sattler, Klaus-Robert Müller, and Wojciech Samek. 2019. Clustered Federated Learning: Model-Agnostic Distributed Multi-Task Optimization under Privacy Constraints. arXiv preprint arXiv:1910.01991 (2019).

[143] F. Sattler, S. Wiedemann, K. Müller, and W. Samek. 2019. Robust and Communication-Efficient Federated Learning From Non-i.i.d. Data. IEEE Transactions on Neural Networks and Learning Systems (2019), 1–14.

[144] S. Savazzi, M. Nicoli, and V. Rampa. 2020. Federated Learning with Cooperating Devices: A Consensus Approach for Massive IoT Networks. IEEE Internet of Things Journal (2020), 1–1.

[145] Muhammad Shayan, Clement Fung, Chris J. M. Yoon, and Ivan Beschastnikh. 2018. Biscotti: A Ledger for Private and Secure Peer-to-Peer Machine Learning. arXiv:1811.09904 [cs.LG]

[146] Micah J. Sheller, G. Anthony Reina, Brandon Edwards, Jason Martin, and Spyridon Bakas. 2019. Multi-institutional Deep Learning Modeling Without Sharing Patient Data: A Feasibility Study on Brain Tumor Segmentation. Springer International Publishing, Cham, 92–104.

[147] Shihao Shen, Yiwen Han, Xiaofei Wang, and Yan Wang. 2019. Computation Offloading with Multiple Agents in Edge-Computing–Supported IoT. ACM Trans. Sen. Netw. 16, 1 (2019), Article 8.

[148] Wenqi Shi, Sheng Zhou, and Zhisheng Niu. 2019. Device Scheduling with Fast Convergence for Wireless Federated Learning. arXiv preprint arXiv:1911.00856 (2019).

[149] S. Silva, B. A. Gutman, E. Romero, P. M. Thompson, A. Altmann, and M. Lorenzi. 2019. Federated Learning in Distributed Medical Databases: Meta-Analysis of Large-Scale Subcortical Brain Data. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). 270–274.

[150] Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, and Ameet Talwalkar. 2017. Federated multi-task learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems. Curran Associates Inc., Long Beach, California, USA, 4427–4437.

[151] Chunhe Song, Tong Li, Xu Huang, Zhongfeng Wang, and Peng Zeng. 2019. Towards Edge Computing Based Distributed Data Analytics Framework in Smart Grids. Springer International Publishing, Cham, 283–292.

[152] K. Sozinov, V. Vlassov, and S. Girdzijauskas. 2018. Human Activity Recognition Using Federated Learning. In 2018 IEEE Intl Conf on Parallel Distributed Processing with Applications, Ubiquitous Computing Communications, Big Data Cloud Computing, Social Computing Networking, Sustainable Computing Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). 1103–1111.

[153] Xudong Sun, Andrea Bommert, Florian Pfisterer, Jörg Rahnenführer, Michel Lang, and Bernd Bischl. 2020. High Dimensional Restrictive Federated Model Selection with Multi-objective Bayesian Optimization over Shifted Distributions. In Intelligent Systems and Applications, Yaxin Bi, Rahul Bhatia, and Supriya Kapoor (Eds.). Springer International Publishing, Cham, 629–647.

[154] Yuxuan Sun, Sheng Zhou, and Deniz Gündüz. 2019. Energy-Aware Analog Aggregation for Federated Learning with Redundant Data. arXiv preprint arXiv:1911.00188 (2019).

[155] Manoj A Thomas, Diya Suzanne Abraham, and Dapeng Liu. 2018. Federated Machine Learning for Translational Research. AMCIS 2018 (2018).

[156] Aleksei Triastcyn and Boi Faltings. 2019. Federated Generative Privacy. arXiv:1910.08385 [stat.ML]

[157] Aleksei Triastcyn and Boi Faltings. 2019. Federated Learning with Bayesian Differential Privacy. arXiv preprint arXiv:1911.10071 (2019).

[158] M. Troglia, J. Melcher, Y. Zheng, D. Anthony, A. Yang, and T. Yang. 2019. FaIR: Federated Incumbent Detection in CBRS Band. In 2019 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN). 1–6.

[159] Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, and Yi Zhou. 2019. A Hybrid Approach to Privacy-Preserving Federated Learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security. Association for Computing Machinery, London, United Kingdom, 1–11.

[160] Gregor Ulm, Emil Gustavsson, and Mats Jirstrand. 2019. Functional Federated Learning in Erlang (ffl-erl). Springer International Publishing, Cham, 162–178.

[161] D. Verma, G. White, and G. de Mel. 2019. Federated AI for the Enterprise: A Web Services Based Implementation. In 2019 IEEE International Conference on Web Services (ICWS). 20–27.

[162] Dinesh C Verma, Graham White, Simon Julier, Stephen Pasteris, Supriyo Chakraborty, and Greg Cirincione. 2019. Approaches to address the data skew problem in federated learning. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, Vol. 11006. International Society for Optics and Photonics, 110061I.

[163] Tung T Vu, Duy T Ngo, Nguyen H Tran, Hien Quoc Ngo, Minh N Dao, and Richard H Middleton. 2019. Cell-Free Massive MIMO for Wireless Federated Learning. arXiv preprint arXiv:1909.12567 (2019).

[164] Z. Wan, X. Xia, D. Lo, and G. C. Murphy. 2019. How does Machine Learning Change Software Development Practices? IEEE Transactions on Software Engineering (2019), 1–1.

[165] Guan Wang. 2019. Interpret Federated Learning with Shapley Values. arXiv preprint arXiv:1905.04519 (2019).

[166] Guan Wang, Charlie Xiaoqian Dang, and Ziye Zhou. 2019. Measure Contribution of Participants in Federated Learning. arXiv preprint arXiv:1909.08525 (2019).

[167] Kangkang Wang, Rajiv Mathews, Chloé Kiddon, Hubert Eichner, Françoise Beaufays, and Daniel Ramage. 2019. Federated Evaluation of On-device Personalization. arXiv:1910.10252 [cs.LG]

[168] L. Wang, W. Wang, and B. Li. 2019. CMFL: Mitigating Communication Overhead for Federated Learning. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). 954–964.

[169] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan. 2019. Adaptive Federated Learning in Resource Constrained Edge Computing Systems. IEEE Journal on Selected Areas in Communications 37, 6 (June 2019), 1205–1221.

[170] Kang Wei, Jun Li, Ming Ding, Chuan Ma, Howard H. Yang, Farokhi Farhad, Shi Jin, Tony Q. S. Quek, and H. Vincent Poor. 2019. Federated Learning with Differential Privacy: Algorithms and Performance Analysis. arXiv:1911.00222 [cs.LG]

[171] Xiguang Wei, Quan Li, Yang Liu, Han Yu, Tianjian Chen, and Qiang Yang. 2019. Multi-agent visualization for explaining federated learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 6572–6574.

[172] J. Weng, J. Weng, J. Zhang, M. Li, Y. Zhang, and W. Luo. 2019. DeepChain: Auditable and Privacy-Preserving Deep Learning with Blockchain-based Incentive. IEEE Transactions on Dependable and Secure Computing (2019), 1–1.

[173] Roel Wieringa, Neil Maiden, Nancy Mead, and Colette Rolland. 2005. Requirements Engineering Paper Classification and Evaluation Criteria: A Proposal and a Discussion. Requir. Eng. 11, 1 (Dec. 2005), 102–107. https://doi.org/10.1007/s00766-005-0021-6

[174] Wentai Wu, Ligang He, Weiwei Lin, Stephen Jarvis, et al. 2019. SAFA: a Semi-Asynchronous Protocol for Fast Federated Learning with Low Overhead. arXiv preprint arXiv:1910.01355 (2019).

[175] Cong Xie, Sanmi Koyejo, and Indranil Gupta. 2019. Asynchronous Federated Optimization. arXiv:1903.03934 [cs.DC]

[176] G. Xu, H. Li, S. Liu, K. Yang, and X. Lin. 2020. VerifyNet: Secure and Verifiable Federated Learning. IEEE Transactions on Information Forensics and Security 15 (2020), 911–926.

[177] Jie Xu and Fei Wang. 2019. Federated Learning for Healthcare Informatics. arXiv:1911.06270 [cs.LG]

[178] Mengwei Xu, Feng Qian, Qiaozhu Mei, Kang Huang, and Xuanzhe Liu. 2018. DeepType: On-Device Deep Learning for Input Personalization Service with Minimal Privacy Concern. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 4 (2018), Article 197.

[179] Runhua Xu, Nathalie Baracaldo, Yi Zhou, Ali Anwar, and Heiko Ludwig. 2019. HybridAlpha: An Efficient Approach for Privacy-Preserving Federated Learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security. Association for Computing Machinery, London, United Kingdom, 13–23.

[180] Zichen Xu, Li Li, and Wenting Zou. 2019. Exploring federated learning on battery-powered devices. In Proceedings of the ACM Turing Celebration Conference - China. Association for Computing Machinery, Chengdu, China, Article 6.

[181] Howard H Yang, Ahmed Arafa, Tony QS Quek, and H Vincent Poor. 2019. Age-Based Scheduling Policy for Federated Learning in Mobile Edge Networks. arXiv preprint arXiv:1910.14648 (2019).

[182] H. H. Yang, Z. Liu, T. Q. S. Quek, and H. V. Poor. 2020. Scheduling Policies for Federated Learning in Wireless Networks. IEEE Transactions on Communications 68, 1 (Jan 2020), 317–333.

[183] Kai Yang, Tao Fan, Tianjian Chen, Yuanming Shi, and Qiang Yang. 2019. A Quasi-Newton Method Based Vertical Federated Learning Framework for Logistic Regression. arXiv preprint arXiv:1912.00513 (2019).

[184] K. Yang, T. Jiang, Y. Shi, and Z. Ding. 2020. Federated Learning via Over-the-Air Computation. IEEE Transactions on Wireless Communications (2020), 1–1.

[185] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol. 10, 2, Article 12 (Jan. 2019), 19 pages. https://doi.org/10.1145/3298981

[186] Shengwen Yang, Bing Ren, Xuhui Zhou, and Liping Liu. 2019. Parallel Distributed Logistic Regression for Vertical Federated Learning without Third-Party Coordinator. arXiv preprint arXiv:1911.09824 (2019).

[187] Timothy Yang, Galen Andrew, Hubert Eichner, Haicheng Sun, Wei Li, Nicholas Kong, Daniel Ramage, and Françoise Beaufays. 2018. Applied federated learning: Improving Google keyboard query suggestions. arXiv preprint arXiv:1812.02903 (2018).

[188] Wensi Yang, Yuhang Zhang, Kejiang Ye, Li Li, and Cheng-Zhong Xu. 2019. FFD: A Federated Learning Based Method for Credit Card Fraud Detection. Springer International Publishing, Cham, 18–32.

[189] Zhaohui Yang, Mingzhe Chen, Walid Saad, Choong Seon Hong, and Mohammad Shikh-Bahaei. 2019. Energy Efficient Federated Learning Over Wireless Communication Networks. arXiv preprint arXiv:1911.02417 (2019).

[190] X. Yao, C. Huang, and L. Sun. 2018. Two-Stream Federated Learning: Reduce the Communication Costs. In 2018 IEEE Visual Communications and Image Processing (VCIP). 1–4.

[191] X. Yao, T. Huang, C. Wu, R. Zhang, and L. Sun. 2019. Towards Faster and Better Federated Learning: A Feature Fusion Approach. In 2019 IEEE International Conference on Image Processing (ICIP). 175–179.

[192] D. Ye, R. Yu, M. Pan, and Z. Han. 2020. Federated Learning in Vehicular Edge Computing: A Selective Model Aggregation Approach. IEEE Access (2020), 1–1.

[193] B. Yin, H. Yin, Y. Wu, and Z. Jiang. 2020. FDC: A Secure Federated Deep Learning Mechanism for Data Collaborations in the Internet of Things. IEEE Internet of Things Journal (2020), 1–1.

[194] Z. Yu, J. Hu, G. Min, H. Lu, Z. Zhao, H. Wang, and N. Georgalas. 2018. Federated Learning Based Proactive Content Caching in Edge Computing. In 2018 IEEE Global Communications Conference (GLOBECOM). 1–6.

[195] Yufeng Zhan, Peng Li, and Song Guo. 2020. Experience-Driven Computational Resource Allocation of Federated Learning by Deep Reinforcement Learning. IEEE IPDPS (2020).

[196] Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh, Kristjan Greenewald, Trong Nghia Hoang, and Yasaman Khazaeni. 2019. Bayesian nonparametric federated learning of neural networks. arXiv preprint arXiv:1905.12022 (2019).

[197] Qunsong Zeng, Yuqing Du, Kin K. Leung, and Kaibin Huang. 2019. Energy-Efficient Radio Resource Allocation for Federated Edge Learning. CoRR abs/1907.06040 (2019). arXiv:1907.06040 http://arxiv.org/abs/1907.06040

[198] Y. Zhan, P. Li, Z. Qu, D. Zeng, and S. Guo. 2020. A Learning-based Incentive Mechanism for Federated Learning. IEEE Internet of Things Journal (2020), 1–1.

[199] Jiale Zhang, Junyu Wang, Yanchao Zhao, and Bing Chen. 2019. An Efficient Federated Learning Scheme with Differential Privacy in Mobile Edge Computing. Springer International Publishing, Cham, 538–550.

[200] X. Zhang, X. Chen, J. Liu, and Y. Xiang. 2019. DeepPAR and DeepDPA: Privacy-Preserving and Asynchronous Deep Learning for Industrial IoT. IEEE Transactions on Industrial Informatics (2019), 1–1.

[201] X. Zhang, M. Peng, S. Yan, and Y. Sun. 2019. Deep Reinforcement Learning Based Mode Selection and Resource Allocation for Cellular V2X Communications. IEEE Internet of Things Journal (2019), 1–1.

[202] Ying Zhao, Junjun Chen, Di Wu, Jian Teng, and Shui Yu. 2019. Multi-Task Network Anomaly Detection using Federated Learning. In Proceedings of the Tenth International Symposium on Information and Communication Technology. Association for Computing Machinery, Hanoi, Ha Long Bay, Viet Nam, 273–279.

[203] Ying Zhao, Junjun Chen, Jiale Zhang, Di Wu, Jian Teng, and Shui Yu. 2020. PDGAN: A Novel Poisoning Defense Method in Federated Learning Using Generative Adversarial Network. In Algorithms and Architectures for Parallel Processing, Sheng Wen, Albert Zomaya, and Laurence T. Yang (Eds.). Springer International Publishing, Cham, 595–609.

[204] Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas Chandra. 2018. Federated learning with non-iid data. arXiv preprint arXiv:1806.00582 (2018).

[205] Yang Zhao, Jun Zhao, Linshan Jiang, Rui Tan, and Dusit Niyato. 2019. Mobile Edge Computing, Blockchain and Reputation-based Crowdsourcing IoT Federated Learning: A Secure, Decentralized and Privacy-preserving System. arXiv preprint arXiv:1906.10893 (2019).

[206] P. Zhou, K. Wang, L. Guo, S. Gong, and B. Zheng. 2019. A Privacy-Preserving Distributed Contextual Federated Online Learning Framework with Big Data Support in Social Recommender Systems. IEEE Transactions on Knowledge and Data Engineering (2019), 1–1.

[207] W. Zhou, Y. Li, S. Chen, and B. Ding. 2018. Real-Time Data Processing Architecture for Multi-Robots Based on Differential Federated Learning. In 2018 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computing, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). 462–471.

[208] G. Zhu, Y. Wang, and K. Huang. 2020. Broadband Analog Aggregation for Low-Latency Federated Edge Learning. IEEE Transactions on Wireless Communications 19, 1 (Jan 2020), 491–506.

[209] H. Zhu and Y. Jin. 2019. Multi-Objective Evolutionary Federated Learning. IEEE Transactions on Neural Networks and Learning Systems (2019), 1–13.

[210] Xudong Zhu, Hui Li, and Yang Yu. 2019. Blockchain-Based Privacy Preserving Deep Learning. Springer International Publishing, Cham, 370–383.

[211] Y. Zou, S. Feng, D. Niyato, Y. Jiao, S. Gong, and W. Cheng. 2019. Mobile Device Training Strategies in Federated Learning: An Evolutionary Game Approach. In 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData). 874–879.

A APPENDIX

A.1 Data extraction sheet of selected primary studies

https://drive.google.com/file/d/10yYG8W1FW0qVQOru_kMyS86owuKnZPnz/view?usp=sharing
