This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3353131

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2023.0322000

MS-FL: A Federated Learning Framework Based on Multiple Security Strategies

WENSHAO YANG1, PENGFEI KANG1, and CHAO WEI1 (Member, IEEE)
1 Yanshan University, Qinhuangdao, China (e-mail: wenshao0407@outlook.com; weichao@ysu.edu.cn)

Corresponding author: Chao Wei (e-mail: weichao@ysu.edu.cn).

ABSTRACT With the establishment and standardization of the data trading market, an increasing number of
users are utilizing multi-party data for federated machine learning to obtain the models they need. Scholars
have therefore proposed numerous federated learning frameworks to address practical issues. However, three
issues still need to be addressed in current federated learning frameworks: 1) privacy protection, 2) poisoning
attacks, and 3) protection of the interests of participants. To address these issues, this paper proposes MS-FL,
a novel federated learning framework based on multiple security strategies. The framework's algorithms
guarantee that data providers need not worry about data privacy leakage, while defending against poisoning
attacks from malicious nodes. Finally, a blockchain protocol is used to ensure that the interests of all parties
are protected. Theoretical derivation proves the effectiveness of the framework, and experimental results show
that the algorithm designed in this paper outperforms existing algorithms.

INDEX TERMS Federated learning, Privacy protection, Poisoning attack, Multiple security strategies, Data
transaction.

I. INTRODUCTION
In recent years, due to the presence of sensitive information in data, enterprises and organizations are
regulated by laws and cannot disclose their data. As a result, most data are isolated and closed, leading to the
formation of numerous data islands and a significant waste of resources. Therefore, making reasonable use of
data resources while simultaneously protecting data privacy has become an urgent issue. Some scholars have
proposed the federated learning (FL) [1] method to complete machine learning by utilizing the data of multiple
parties, as in Fig. 1. However, Fredrikson [2] found that part of the data information of model trainers can be
inferred from the model after training, which means that even if no data privacy is leaked during the federated
learning process, there is still a risk of data privacy leakage from the trained model.
In practice, the fully homomorphic cryptosystem CKKS [3] is widely used to protect data privacy because it
supports (1) addition and multiplication homomorphism of ciphertexts and (2) floating-point arithmetic and
evaluation of arbitrary polynomials, which is ideal for privacy-protecting machine learning. Therefore, in this
study, we exploit CKKS technology to protect the data privacy of FL participants. Besides privacy leakage,
during gradient aggregation an adversary may inject poisoned (biased or misleading) data into the federated
learning system, with the objective of modifying the model's parameters in a way that is beneficial to the
attacker. If the aggregation server has no defense mechanism and directly averages the aggregated gradients,
it has been demonstrated that a single poisoner can control the entire training process [4] [5].
In this paper, we mainly focus on three representative types of poisoning attacks: (1) Label-flipping attack [6]
(data poisoning). In a data set, each data sample carries a category label, but malicious nodes can use wrong
labels to train the machine learning model. (2) Backdoor attack [7] (data poisoning). Malicious nodes seek a
set of parameters that establishes a strong link between a trigger and a target label while minimizing the impact
on the classification of benign inputs. In Fig. 2, the trigger is a white block. (3) Arbitrary model attack [8]
(model poisoning). When training the model locally, malicious nodes can deliver arbitrary wrong gradients to
the aggregation server to reduce the final model accuracy.
In light of the increasing value of data, data trading [9] has experienced substantial growth. Within this
context, we examine a specific application scenario for data transactions where the data owner and the model
requestor are geographically separated, such that the latter possesses no data of its own and the machine
learning model must be trained using


data from multiple data owners. In this paper, we construct a novel FL framework, MS-FL, based on multiple
security strategies to complete classification tasks. MS-FL achieves the following advantages:
• Privacy. As prior works [10] [11] have shown, an adversary may recover a data owner's sensitive
  information, such as training samples or memberships, by inferring shared gradients. To protect data owners'
  data privacy, we use multiple security protocols and CKKS technology to keep each data owner's local
  gradients confidential. In addition, after executing the protocols of MS-FL, only the model requestor has
  access to the final model.
• Robustness. The model training process can defend against the above three poisoning attacks from
  malicious nodes.
• Successful trade. After every round of model training, the model parameters are delivered to the model
  requestor and the model requestor must pay the data owners. The smart contract used in MS-FL on the
  blockchain makes this transaction atomic and transparent. What is more, MS-FL guarantees that a data
  owner will not lose all opportunities to benefit from the model requestor just because a portion of its
  submitted gradient components deviates from the majority.
The rest of the paper is organized as follows. Section II reviews related work. Section III overviews the
preliminaries of this paper. Section IV introduces the specific steps and corresponding algorithms used to
complete federated learning in the application scenario of this paper. Section V and Section VI demonstrate
the security analysis of the system and the convergence property of the proposed aggregation algorithm,
respectively. In Section VII, we compare the proposed aggregation algorithm with existing algorithms and
give corresponding comparison graphs. Finally, Section VIII summarizes this paper.

II. RELATED WORK
A. DEFENSE AGAINST POISONING ATTACKS
Researchers have developed many defenses against poisoning in federated learning. Due to their relevance to
this paper, we present statistics-based defenses here, whose key idea is to focus the aggregation rules on the
benign gradient vectors as much as possible. These methods fall roughly into two categories. One is to extract
gradient data that is as good as possible, such as krum [4], geometric median [12], trimmed mean [13] and
bulyan [14]. Blanchard [4] designed krum, which selects the gradient most similar to the other gradients as
the global update; the similarity is measured by the sum of the Euclidean distances of the m − f − 2 gradients
closest to the gradient vector. Geometric median [12] takes the median of the gradients as the global update.
Trimmed mean removes some extreme values and then averages the remaining parameters as the global
update. Bulyan [14] can be regarded as a variant that combines krum and trimmed mean: it first chooses fewer
than m − 2f gradients with the same rule as krum, then averages the parameters closest to the median of the
gradient vectors as the global update. The other category is to assign different weights to gradients based on
some characteristics. Gonzalez et al. [15] first calculated the cosine similarity between gradients as the weight
of the model update set in each iteration; if the cosine similarity exceeds a certain range, the party is judged to
be malicious. In addition, PEFL [16] provided a method to defend against the label-flipping attack and the
backdoor attack with low computational complexity, assigning different weights to the gradients of the
participants based on the Pearson correlation coefficient between the gradients and the medians of the gradient
components.
In order to preserve the privacy of data owners, our framework incorporates a trusted third party (service
provider) who adds noise to the model gradients before transmitting them to the model requestor. To obtain
usable gradients from this distorted data, we propose a novel aggregation algorithm specifically tailored for
the model requestor.
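For concreteness, the following is a minimal NumPy sketch of two of the statistics-based baseline rules
mentioned at the beginning of this subsection: a coordinate-wise median and a coordinate-wise trimmed mean.
The function names, the trimming fraction and the toy data are illustrative and are not taken from the cited
papers.

import numpy as np

def coordinate_median(grads):
    # Coordinate-wise median of client gradients; grads has shape (t, n).
    return np.median(grads, axis=0)

def trimmed_mean(grads, trim_frac):
    # Coordinate-wise trimmed mean: for every component, drop the trim_frac
    # smallest and trim_frac largest values, then average what remains.
    t = grads.shape[0]
    k = int(t * trim_frac)                 # values cut on each side
    sorted_grads = np.sort(grads, axis=0)  # sort each component independently
    return sorted_grads[k:t - k].mean(axis=0)

# Toy check: 8 honest gradients near 1.0 and 2 poisoned gradients at 100.
rng = np.random.default_rng(0)
grads = np.vstack([1.0 + 0.01 * rng.standard_normal((8, 3)),
                   100.0 * np.ones((2, 3))])
print(coordinate_median(grads))      # each component close to 1.0
print(trimmed_mean(grads, 0.2))      # close to 1.0; a plain mean would be about 20.8

Both rules discard or downweight extreme submissions, which is exactly the behaviour the weighted schemes
discussed above try to achieve more adaptively.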

B. PRIVACY-PRESERVING MACHINE LEARNING
Recently, privacy-preserving ML has mainly been based on three underlying technologies: Differential
Privacy (DP) [17], Secure Multi-Party Computation (SMC) [18], and Homomorphic Encryption (HE) [19].
Many scholars combine homomorphic encryption and machine learning to achieve privacy protection, which
is the approach used in our framework.
In literature [20], Han et al. presented an efficient algorithm for logistic regression on homomorphically
encrypted data. Chen et al. [21] combined HE and secret sharing to build a secure large-scale sparse logistic
regression model and achieved both efficiency and security. In literatures [22] and [23], scholars also used HE
to train logistic regression models. Nonetheless, the aforementioned works solely address the protection of
participants' data privacy, without accounting for the model privacy of the initiator of the model training.

C. PROTECTION OF THE INTERESTS OF PARTICIPANTS
In order to motivate participants in FL, many scholars have designed FL mechanisms with rewards. However,
how to distribute the rewards fairly so as to protect the interests of the participants has become a new and
difficult problem.
Li et al. [24] proposed an incentive method of contribution and benefit sharing, and Zhao et al. [25] proposed
an incentive mechanism based on reputation. Both methods use trusted managers to calculate the contributions
of each FL participant and distribute the benefits. In contrast, the methods devised in literatures [26] and [27]
do not require the involvement of trusted managers. Zhang et al. [26] designed a smart contract-based
incentive mechanism according to the size and centroid distance of the customer data used in local model
training. The model in literature [27] requires that the reward of each worker is determined by the voting
results of the next round of workers. Nevertheless, the scenario in which model requestors and data owners
are separated is not considered in the above research.

III. PRELIMINARIES
A. FEDERATED LEARNING
Unlike traditional machine learning, which centralizes all training data, FL is a promising distributed setting
that completes machine learning while allowing all data owners to keep their data local, as in Fig. 1. In FL,
the server orchestrates the whole lifecycle of training until the model accuracy reaches the desired level or the
number of iterations reaches the preset value. The goal of learning is to find optimal model parameters so that
the output of the model is as close as possible to the true label.
In this paper, we focus on horizontal federated learning [28], which means the data from different data owners
have different IDs and the same features. For example, suppose we have a central server and n clients
{C_1, C_2, ..., C_n} as in Fig. 1, and each client has a local dataset D_j, j = 1, 2, ..., n. We use
D = {D_1, D_2, ..., D_n} to denote the joint dataset. The following is the objective function with optimal
parameters G:

    F(x, G, y) = \min_G \mathbb{E}_{(x,y)\sim\tilde{D}} L(x, G, y),    (1)

where x is the training data, y is the label, L(x, G, y) is the empirical loss function and \tilde{D} is the
distribution of the clients' data. After local model training, client C_j sends its local model G_i^j in the i-th
iteration to the aggregation server. When there are no malicious clients, the server can directly take the average
of the clients' models as the global model:

    G_i = \frac{1}{n} \sum_{j=1}^{n} G_i^j.    (2)

The aggregation server then sends the global model G_i to all clients, and finally each client updates its local
model. This process does not stop until the parameter update is small enough.

FIGURE 1. Federated learning.

FIGURE 2. Backdoor attack.

B. LOGISTIC REGRESSION
The proposed framework can accommodate various machine learning models, such as logistic regression,
linear regression and neural networks. Here, we demonstrate the operational process of the framework using
the logistic regression model as an example.
Logistic regression is a model that is often used to solve classification problems. In a data set
{(x_i, y_i)}_{i=1}^{N}, x_i is a d-dimensional feature vector and y_i ∈ {0, 1}. σ is the Sigmoid function. The
target is to find a model G ∈ R^{d+1} that satisfies y_i = σ(G x_i) (i = 1, ..., N).
Let \tilde{y}_i denote the output of the Sigmoid function; the cost function L is

    L(G) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \ln \tilde{y}_i + (1 - y_i) \ln(1 - \tilde{y}_i) \right).    (3)

We use stochastic gradient descent (SGD) to update the model parameters (α is the learning rate):

    G_j = G_j - \alpha \frac{\partial L}{\partial G_j} = G_j - \frac{\alpha}{N} \sum_{i=1}^{N} (\tilde{y}_i - y_i) x_{ij}.    (4)

C. CIPHERTEXT GRADIENT UPDATE
We use [·] to denote the result of homomorphic encryption of plaintext data. So far, fully homomorphic
encryption cannot be used to directly calculate the Sigmoid function. Literature [29] presented a fitting
polynomial of the Sigmoid function. After the model parameter G is encrypted by CKKS, the fitting
polynomial in ciphertext can be used as

    [\sigma(G x_i)] = a_0 + a_1 \frac{[G] x_i}{8} + a_3 \left( \frac{[G] x_i}{8} \right)^3,    (5)

where (a_0, a_1, a_3) ≈ (0.5, 1.20096, −0.81562). According to equations (4) and (5), we can obtain

    \left[ \frac{\partial L}{\partial G_j} \right] = \frac{1}{N} \sum_{i=1}^{N} \left( [\sigma(G x_i)] - y_i \right) x_{ij}.    (6)

The model parameter update formula in ciphertext is

    [G] = [G] - \alpha \left[ \frac{\partial L}{\partial G} \right].    (7)
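As a quick plaintext illustration of equations (4) and (5) (no encryption involved), the sketch below evaluates
the degree-3 polynomial surrogate of the Sigmoid function and performs one SGD step on a synthetic batch;
in the actual framework the same arithmetic is carried out on CKKS ciphertexts. The data shapes and the
learning rate are illustrative choices.

import numpy as np

# Coefficients of the degree-3 fit of the Sigmoid function used in equation (5).
A0, A1, A3 = 0.5, 1.20096, -0.81562

def sigmoid_poly(z):
    return A0 + A1 * (z / 8) + A3 * (z / 8) ** 3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-8, 8, 1000)
print(np.abs(sigmoid(z) - sigmoid_poly(z)).max())  # worst-case error of the fit on [-8, 8]

# One plaintext SGD step in the form of equation (4) on a synthetic batch.
rng = np.random.default_rng(0)
m, d = 32, 4
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, d))])  # bias column plus d features
y = rng.integers(0, 2, size=m)
G = np.zeros(d + 1)
alpha = 0.1

y_hat = sigmoid_poly(X @ G)              # predictions via the polynomial surrogate
G = G - alpha / m * X.T @ (y_hat - y)    # gradient step matching equation (4)
print(G)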

IV. PROPOSED FRAMEWORK
In this section, we first present the system model and basic assumptions of our proposed framework MS-FL,
and then describe in detail how MS-FL accomplishes the model transaction and mitigates the poisoning
attacks. For ease of reference, the symbols that appear below and their corresponding descriptions are listed
in TABLE 1.

A. SYSTEM MODEL
There are three basic entities in our system (Fig. 3):
• Model requestor: model requestors do not have data but need a model; they are also the initiators of the
  machine learning task.
• Data owner: data owners have data. Some of them are malicious and will poison the model training; the
  others are honest but curious.
• Service provider (SP): the service provider is responsible for receiving all gradients submitted by the data
  owners and aggregating them. It adds noise to the gradients to protect the data owners' privacy. It is also
  honest but curious.

FIGURE 3. Proposed Framework.

B. BASIC ASSUMPTION
1) Data samples of each data owner are IID (independent and identically distributed).
2) The aggregation server is an honest-but-curious node.
3) There are three types of attacks that malicious nodes can launch: the label-flipping attack, the backdoor
   attack and arbitrary model poisoning.
4) The communication between any two members of this system is secure and reliable.

TABLE 1. Symbols

Symbol     Meaning
n          The number of model parameters
t          The number of data owners
D          Data set
y          Label
Y          Label matrix (m × 1)
X          Data matrix (n × m)
α          Learning rate
g_j        The j-th component of the model parameter
G          Model parameter vector G = (g_1, ..., g_n)
f          The number of malicious nodes
m          Batch size
γ          Global model learning rate
β^k        Model gradient updated k times, β^k = (b_1^k, b_2^k, ..., b_n^k)
β_i^k      The i-th data owner's gradient
β_ij^k     The j-th component of β_i^k

C. SPECIFIC PROCESS
The process of MS-FL and the corresponding algorithms are demonstrated below.
step 1 The model requestor uses Algorithm 1 to encrypt the model parameters with the public key and sends
  them to the service provider (Enc(z, pk) → v denotes that plaintext z is encrypted to ciphertext v by CKKS
  public key pk).
step 2 The service provider receives the encrypted model parameters and sends them to the data owners.
step 3 The data owners deliver the updated model gradients to the service provider after executing
  Algorithm 2.
step 4 The service provider adds noise to the model gradients by Algorithm 3 to protect the privacy of the
  data owners, then sends them to the model requestor.
step 5 Upon receiving the encrypted gradients containing noise, the model requestor decrypts them with its
  private key (Dec(v, sk) → z denotes that ciphertext v is decrypted to plaintext z by CKKS private key sk)
  and then employs Algorithm 4 to select, for each gradient component containing noise, the parameters of
  t − f data owners for aggregation.
  Following the successful completion of selection, the model requestor creates a table like Fig. 3 that contains
  the addresses of the data owners who participated in aggregating each gradient component. This table serves
  as a record of the contributions made by each data owner, and enables the model requestor to keep track of
  the overall progress of the aggregation process.
  Finally, the model requestor creates a smart contract that comprises the aforementioned table and some
  cryptocurrency. This smart contract serves as a mechanism for ensuring that all participants in the
  aggregation process are fairly compensated for their contributions. Once the smart contract has been created,
  the model requestor transmits the address of the contract to the service provider, thereby

completing the overall process of gradient aggregation.
step 6 Upon receipt of the smart contract, the service provider plays a pivotal role in facilitating the successful
  execution of the gradient aggregation process. To this end, it performs a series of checks designed to ensure
  the accuracy, reliability and fairness of the overall process.
  The first step involves a thorough examination of the amount of cryptocurrency contained within the smart
  contract. If the amount of cryptocurrency is insufficient to meet the amount stipulated by the data owners,
  the service provider refrains from transmitting the noise-free encrypted gradient to the smart contract. This
  step is crucial in ensuring that the interests of all parties involved in the gradient aggregation process are
  adequately protected, and that the process proceeds in a transparent and equitable manner.
  Once the cryptocurrency contained within the smart contract has been verified, the service provider moves
  on to the next stage, which involves the selection of the gradient components, devoid of noise, to be
  aggregated. This selection is guided by the table provided by the model requestor, which contains the
  relevant information regarding the contributions made by the participating data owners.
  Having identified the gradient components to aggregate, the service provider performs the aggregation,
  using Algorithm 5 to obtain the encrypted aggregation result. Upon completion of the aggregation, the
  service provider transmits the result to the account of the model requestor via the smart contract.
step 7 In the last step, the cryptocurrency contained within the smart contract is equally distributed among the
  corresponding account addresses of the participating data owners on the table, thereby ensuring that all
  parties involved in the gradient aggregation process are fairly compensated for their contributions. This step
  is crucial in maintaining the trust and goodwill of the data owners, and in ensuring that they remain
  committed to the process of gradient aggregation.
Finally, the model requestor processes the received aggregation outcomes in preparation for the next iteration,
using Algorithm 6 to perform the necessary computations.

Algorithm 1 Model Encryption
Input: pk, n, G;
Output: [G];
1: for j = 1 to n do
2:   [g_j] ← Enc(g_j; pk) (in the first round of training, g_j is generated randomly by the model requestor);
3: end for
4: Send [G] = ([g_1], ..., [g_n]) to the service provider;

Algorithm 2 Local Training
Input: [G], D, m, α;
Output: [β_i^k];
1: Randomly select m data from data set D to constitute the data matrix X ∈ R^{n×m} and the label matrix
   Y ∈ R^{1×m}.
2: ([ỹ_1], ..., [ỹ_m]) ← [G] × X.
3: for i = 1 to m do
4:   [ỹ_i] ← [σ(ỹ_i)]. (equation (5))
5: end for
6: for j = 1 to n do
7:   [b_j^k] ← (1/m) Σ_{i=1}^{m} ([ỹ_i] − y_i) x_{ji}.
8: end for
9: [β_i^k] ← ([b_1^k], [b_2^k], ..., [b_n^k]).
10: The i-th data owner sends [β_i^k] to the service provider.

Algorithm 3 Privacy Protection
Input: {[β_i^k]}_{i=1}^{t};
Output: {[β̂_i^k]}_{i=1}^{t} ({[β_i^k]}_{i=1}^{t} with noise);
1: Generate n pairs of non-zero random numbers (c_j, d_j), j = 1, 2, ..., n.
2: for j = 1 to n do
3:   for i = 1 to t do
4:     [β̂_ij^k] ← c_j [β_ij^k] + d_j. (add noise)
5:   end for
6: end for
7: Send {[β̂_i^k]}_{i=1}^{t} to the model requestor.

Algorithm 4 Aggregation Algorithm
Input: {[β̂_i^k]}_{i=1}^{t}, sk;
Output: selected gradient components;
1: {β̂_i^k}_{i=1}^{t} = Dec({[β̂_i^k]}_{i=1}^{t}; sk).
2: for j = 1 to n do
3:   assign {β̂_ij^k}_{i=1}^{t} to the list {b_1, b_2, ..., b_t} in order.
4:   for i = 1 to t do
5:     Pick the t − f − 1 points that have the smallest distance from b_i in the set {b_1, b_2, ..., b_t}, which
       constitute {b_1*, b_2*, ..., b_(t−f−1)*}; suppose the distance set between these points and b_i is
       {h_i1*, h_i2*, ..., h_i(t−f−1)*}.
6:     for p = 1 to t − f − 1 do
7:       if b_p* < b_i then
8:         h_ip* ← −h_ip*.
9:       end if
10:    end for
11:    b′_i ← Σ_{p=1}^{t−f−1} h_ip*.
12:  end for
13:  Select the smallest b′_i in the set {b′_i}_{i=1}^{t} and then choose t − f data owners as candidates whose
     gradient component is, or is closest to, b_i in the set {b_1, b_2, ..., b_t}.
14: end for
15: Take the account addresses of the candidates for each component of the gradient to establish the smart
    contract on the blockchain.
16: Put cryptocurrency into the smart contract.

Algorithm 5 Model Transaction
Input: {[β_i^k]}_{i=1}^{t};
Output: [β^{k+1}];
1: for j = 1 to n do
2:   Add the t − f values in the set {[β_ij^k]}_{i=1}^{t} corresponding to the accounts in the newly created
     smart contract to get the result [β_j^{k+1}].
3: end for
4: Send [β^{k+1}] to the smart contract.

Algorithm 6 Model Updating
Input: [β^{k+1}], sk, γ, t − f;
Output: G;
1: β^{k+1} ← Dec([β^{k+1}]; sk).
2: G ← G − γ/(t − f) × β^{k+1}.

V. SECURITY ANALYSIS
The function privacy and the semantic security of CKKS against chosen-plaintext attacks (IND-CPA security
for short) [3] guarantee the strong security of the protocol. Because the plaintext of the model never appears
in the joint view of the data owners and the service provider, and because of the IND-CPA security of CKKS,
neither the data owners nor the service provider has the opportunity to obtain any information about the model.
Therefore, in the rest of this section, we provide a hybrid argument that relies on simulators to further
demonstrate that, during the execution of the MS-FL protocols, the model requestor and the service provider
can learn nothing about the data owners' privacy.

Theorem V.1. Given a parameter p, any subset of data owners U and C = {model requestor, service provider},
let REAL_U^{C,p} be a random variable representing the joint view of the parties in C in the real execution of
the above protocols. There exists a probabilistic polynomial-time (PPT) simulator SIM such that the output of
SIM is computationally indistinguishable from REAL_U^{C,p}.

Proof. According to its definition, REAL_U^{C,p} consists of all internal state and messages received by the
parties in C during the execution of the protocols. We adopt the standard hybrid argument used in
literatures [30] and [31] to prove this proposition. Given the security parameter k, we define a PPT simulator
SIM through a series of (polynomially many) subsequent modifications to the random variables in
REAL_U^{C,p}, so that the output of SIM is computationally indistinguishable from REAL_U^{C,p}. The
detailed proof is described below.

Hyb 1 Initialize a random variable whose distribution is indistinguishable from REAL_U^{C,p}.
Hyb 2 In this hybrid, we change the behavior of an honest data owner so that this data owner encrypts a
  randomly selected vector η = (η_1, η_2, ..., η_n) instead of the gradient [β_x^k] sent to the service provider.
  Since only the contents of the ciphertexts are changed, the IND-CPA security property of CKKS, as well as
  the setting of two non-collusive parties (model requestor and service provider), guarantees that this hybrid
  is indistinguishable from the previous one.
Hyb 3 In this hybrid, we simulate the service provider perturbing η_i with the noise c_i and d_i, so the model
  requestor can only get η̂_i = c_i η_i + d_i. It is well known that parameters shifted by uniformly random
  numbers are also uniformly random. Since the distribution of the noise c_i and d_i is uniformly random,
  exactly as in the previous hybrid, and the perturbed parameters are also uniformly random, this hybrid and
  the previous one are sampled from identical distributions, i.e., uniformly random ones. Besides, the
  IND-CPA security property of CKKS, as well as the two non-collusive model requestor and service provider
  setting, guarantees that this hybrid is indistinguishable from the previous one.
Hyb 4 In this hybrid, we change the input of Algorithm 3 to η̂ instead of [β̂_x^k], and simulate the model
  requestor taking η̂ into the aggregation. The IND-CPA security property of CKKS, as well as the two
  non-collusive model requestor and service provider setting, guarantees that this hybrid is indistinguishable
  from the previous one.
Hyb 5 In this hybrid, we simulate the service provider delivering an aggregation result [β^k] computed from
  η instead of β̂_x^k to the model requestor. This hybrid is indistinguishable from the previous one because the
  model requestor can only obtain the equations

    c_j β_ij^k + d_j = β̂_ij^k  (j = 1, 2, ..., n; i = 1, 2, ..., t),
    c_j \sum_{p=1}^{t-f} β_{i_p j}^k + (t − f) d_j = \sum_{p=1}^{t-f} β̂_{i_p j}^k,    (8)

  from which it is unable to recover {β_i^k}_{i=1}^{t}, in addition to the IND-CPA security property of CKKS.

The argument shows that there is a simulator SIM sampled from the distribution described above such that its
output is computationally indistinguishable from the output of REAL. Hence, the framework holds the
security property that the model requestor and the service provider learn nothing about the data owners'
private data, and no participant other than the model requestor can obtain the real model.
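The following small numeric check illustrates why the system of equations (8) does not pin down the true
gradients: for any alternative guess of the affine mask, there is a set of gradient values that reproduces exactly
the same masked observations, so the observations alone cannot single out the real ones. The values are
illustrative.

import numpy as np

rng = np.random.default_rng(2)
t = 5
beta = rng.normal(size=t)          # true gradient components of t data owners
c, d = 3.0, -1.5                   # the service provider's secret affine mask
observed = c * beta + d            # what the model requestor actually sees

# Any other guess (c2, d2) of the mask is consistent with the same observations,
# with a correspondingly different set of "true" gradients:
c2, d2 = 0.7, 4.2
beta_alt = (observed - d2) / c2
print(np.allclose(c2 * beta_alt + d2, observed))   # True: both explanations fit exactly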

VI. CONVERGENCE ANALYSIS
In this section, we provide the convergence analysis of MS-FL. For a specific FL task, the optimal global
model G* can be obtained by solving the following optimization problem,

    \min_G \mathbb{E}_{(x,y)\sim\tilde{D}} L(x, G, y),

where L(x, G, y) is the empirical loss function and \tilde{D} is the distribution of the training data. Following
the analysis of Theorem 5 in article [12], we can derive that the difference between the global model learnt by
our robust aggregation algorithm under attacks and the optimal global model G* is bounded.
To prove Theorem VI.2, we list the corresponding lemma and assumptions. Because the lemma has been
proven in [12], we omit its proof here.

Assumption VI.1. The population risk function F : G → R is L-strongly convex, and differentiable over G
with M-Lipschitz gradient. That is, for all G, G′ ∈ G,

    F(G′) ≥ F(G) + ⟨∇F(G), G′ − G⟩ + \frac{L}{2} ∥G′ − G∥^2,

and

    ∥∇F(G) − ∇F(G′)∥ ≤ M ∥G − G′∥.

Under Assumption VI.1, it is well known [32] that using the standard gradient descent update

    G_t = G_{t−1} − η × ∇F(G_{t−1}),

where η is some fixed stepsize, G_t approaches G* exponentially fast. In particular, it holds that

    ∥G_t − G*∥ ≤ \left(1 − \frac{Lη}{2M^2}\right)^{t/2} ∥G_0 − G*∥.

Assumption VI.2. There exist positive constants σ_1 and α_1 such that for any unit vector v ∈ B,
⟨∇f(X, G*), v⟩ is sub-exponential with scaling parameters σ_1 and α_1, i.e.,

    \sup_{v∈B} \mathbb{E}\left[\exp\left(λ ⟨∇f(X, G*), v⟩\right)\right] ≤ e^{σ_1^2 λ^2 / 2},  ∀ |λ| ≤ \frac{1}{α_1},

where B denotes the unit sphere {G : ∥G∥_2 = 1}.

Assumption VI.3. There exist positive constants σ_2 and α_2 such that for any G ∈ G with G ≠ G* and any
unit vector v ∈ B, ⟨h(X, G) − E[h(X, G)], v⟩ / ∥G − G*∥ is sub-exponential with scaling parameters
(σ_2, α_2), i.e., for all |λ| ≤ 1/α_2,

    \sup_{G∈G, v∈B} \mathbb{E}\left[\exp\left(\frac{λ ⟨h(X, G) − E[h(X, G)], v⟩}{∥G − G*∥}\right)\right] ≤ e^{σ_2^2 λ^2 / 2}.

Assumption VI.4. For any δ ∈ (0, 1), there exists an M′ = M′(n, δ) that is non-increasing in n such that

    P\left( \sup_{G, G′∈G: G≠G′} \frac{∥\frac{1}{n}\sum_{i=1}^{n} (∇f(X_i, G) − ∇f(X_i, G′))∥}{∥G − G′∥} ≤ M′ \right) ≥ 1 − \frac{δ}{3}.

Lemma VI.1. Suppose that Assumption VI.1 – Assumption VI.4 hold such that L, M, σ_1, σ_2, α_1, α_2 are all
Θ(1) and log M′ = O(log d). Assume that G ⊂ {G : ∥G − G*∥ ≤ r\sqrt{d}} for some positive parameter r
such that log(r) = O(d log(N/k)), and 2(1 + ϵ)q ≤ k ≤ m. Fix any α ∈ (q/k, 1/2) and any δ > 0 such that
δ ≤ α − q/k and log(1/δ) = O(d). There exist universal positive constants c_1, c_2 such that if

    \frac{N}{k} ≥ c_1 C_α^2 d \log(N/k),

then with probability at least

    1 − \exp(−k D(α − q/k \,\|\, δ)),

where D(δ′ \| δ) = δ′ \log\frac{δ′}{δ} + (1 − δ′) \log\frac{1 − δ′}{1 − δ} denotes the binary divergence, the
iterates {G_t} with η = L/(2M^2) satisfy

    ∥G_t − G*∥ ≤ \left( \frac{1}{2} + \frac{1}{2}\sqrt{1 − \frac{L^2}{4M^2}} \right)^{t} ∥G_0 − G*∥ + c_2 \sqrt{\frac{dk}{N}},  ∀ t ≥ 1.

In the above lemma, based on the description of article [12], m represents the number of users participating in
FL and these users are divided into k parties. In this paper, for the robustness of FL, we set k = m. In addition,
δ can be viewed as the expected fraction of batches that are "statistically bad"; the larger the batch sample
size N/k (compared to d), the smaller δ. In this paper, we temporarily assume that there is no "statistically
bad" data batch except the Byzantine nodes, so the probability

    1 − \exp(−k D(α − q/k \,\|\, δ))

can be regarded as 1.
Based on the above lemma and assumptions, we can obtain the following theorem.

Theorem VI.2. Suppose ν is the probability that a component of the global model G_t of MS-FL does not fall
within the suitable interval of the components of the benign nodes. With probability at least 1 − ν, the
difference between G_t and G* satisfies:

    ∥G_t − G*∥ ≤ \left[ \left( \frac{1}{2} + \frac{1}{2}\sqrt{1 − \frac{Lη}{4M^2}} \right)^{t} + 2\left(1 − \frac{Lη}{4M^2}\right)^{t/2} \right] ∥G_0 − G*∥ + c_2 \sqrt{\frac{dm}{N}}.

Proof. Because the data set is IID, the gradient vectors of the benign nodes follow a Gaussian distribution. We
denote by b_{i_min}^t and b_{i_max}^t the minimum and maximum values of the i-th component of the
gradients of all benign nodes in the t-th iteration. Obviously, the i-th component of a benign gradient must lie
in the interval [b_{i_min}^t, b_{i_max}^t], but that of a malicious node need not.
When the number of malicious nodes in the network is less than the number of benign nodes and the malicious
nodes exercise the worst attack over the aggregation process, the resulting model is positioned at the edge of
the interval [b_{i_min}^t, b_{i_max}^t] for each component of the gradient if we select the method of
geometric median. We use G_{t_geo} to denote the aggregation result that is most affected by malicious nodes,
which is obtained by the geometric median method. However, according to Lemma VI.1, the difference
between the global model G_{t_geo} and the optimal global model G* can still be bounded.
We suppose that the result of our aggregation algorithm, τ_i^t, is the i-th component of the gradient in the t-th
iteration. Because no user can obtain the specific distribution and values of the gradients in our framework,
under the three poisoning attacks considered in this paper we can obtain the following probabilistic
relationship,

    P\left(b_{i_min}^t ≤ τ_i^t ≤ b_{i_max}^t\right) ≫ P\left(τ_i^t ∉ [b_{i_min}^t, b_{i_max}^t]\right),

and

    ν = \sum_{i=1}^{n} P\left(τ_i^t ∉ [b_{i_min}^t, b_{i_max}^t]\right).

Let Ḡ_{t_b} denote the average parameter of the benign nodes in the t-th iteration. Based on the
aforementioned lemma and assumptions, when the number of malicious nodes is less than that of honest nodes,
we have the following inequalities for the t-th global iteration with probability at least 1 − ν:

    ∥G_t − G*∥ ≤ ∥G_t − Ḡ_{t_b} + Ḡ_{t_b} − G*∥ ≤ ∥G_t − Ḡ_{t_b}∥ + ∥Ḡ_{t_b} − G*∥
               ≤ ∥G_{t_geo} − Ḡ_{t_b}∥ + ∥Ḡ_{t_b} − G*∥
               ≤ ∥G_{t_geo} − G* + G* − Ḡ_{t_b}∥ + ∥Ḡ_{t_b} − G*∥
               ≤ ∥G_{t_geo} − G*∥ + 2 ∥Ḡ_{t_b} − G*∥
               = \left[ \left( \frac{1}{2} + \frac{1}{2}\sqrt{1 − \frac{Lη}{4M^2}} \right)^{t} + 2\left(1 − \frac{Lη}{4M^2}\right)^{t/2} \right] ∥G_0 − G*∥ + c_2 \sqrt{\frac{dm}{N}}.

VII. PERFORMANCE EVALUATION
In this section, we conduct experiments with real-world datasets to evaluate the performance of the
aggregation algorithm proposed in this paper. All experiments run on a Lenovo server with an i5-6900hq CPU
and 8 GB RAM.

A. DATASET
We implemented experiments on two typical real datasets, MNIST and Fashion-MNIST. The MNIST dataset
is used for handwritten digit recognition and includes 60,000 training images and 10,000 test images; these
grayscale images have been normalized to 28 × 28 pixels. Like MNIST, the Fashion-MNIST dataset also
consists of ten classes of images, covering fashion items such as trousers, sandals and bags.

B. EXPERIMENT SETUP
The dataset is partitioned into ten subsets, with each subset assigned to a different data owner. Each data owner
performs the logistic regression algorithm to complete machine learning locally. To simulate malicious nodes,
a portion of these subsets is selected and designated as such. In order to expedite the testing of the aggregation
algorithm, the experiment is conducted in plain text, and is divided into four distinct stages:
1) Iteration/Training: Each data owner executes a round of gradient descent using their local data.
2) Selection/Weight Assignment: The aggregation algorithm is utilized either to select subsets of each gradient
   component from the data owners for aggregation or to assign different weights to their gradients.
3) Aggregation: The gradient data processed by the previous step is averaged to arrive at the aggregated
   gradient.
4) Testing: After every ten rounds of training, the accuracy of the updated parameters is tested using the test
   set.
This process is repeated in a loop to compare the accuracy of MS-FL with other algorithms.
In the experiment compared with PEFL, we mainly focus on three poisoning attacks: the label-flipping attack,
the backdoor attack and the model poisoning attack. To simulate the label-flipping attack, we change the labels
in malicious data owners' datasets to ŷ = (y + 1) mod 10. To reproduce the backdoor attack, we put a trigger
into the images of malicious data owners (as in Fig. 4(a) and 4(b)) as well as the images of the test set. To
verify the vulnerability of PEFL to the model poisoning attack, we select only one node as the malicious
participant; after each round of iteration, each of its gradient components is shifted by the same arbitrary value
in the interval [−6000, 6000]. A sketch of these attack simulations is given at the end of this subsection.

FIGURE 4. Backdoor attack simulated in this paper: (a) MNIST backdoor, (b) Fashion-MNIST backdoor.

In the experiment compared with trimmed mean and geometric median, to test the robustness of our
aggregation algorithm against the arbitrary model attack, the malicious nodes (40%) generate random values
in [0, 10000] as their uploaded gradients. When the trimmed mean is used to aggregate gradients, we have to
cut off 40% of the values on both sides of the median of each gradient component to ensure the accuracy of
the model. Because the test sets of MNIST and Fashion-MNIST are not large enough to observe the distinction
between MS-FL and the other two algorithms, we use the training set as the test set in this experiment.
Hyper-parameters: The total number of data owners used in the experiment is 10, and the maximum number
of malicious nodes is 4. When we use MS-FL to aggregate gradients, we set f = 4 regardless of the real number
of malicious data owners. For each round of parameter updating, we set a batch size of 1000 and a global
learning rate of γ = 10^{-5} in the MNIST scenario and γ = 10^{-6} in the Fashion-MNIST scenario.
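For reference, a minimal sketch of how the three attacks described in this subsection can be injected into a
malicious owner's data or update: the label shift ŷ = (y + 1) mod 10, a white trigger patch stamped onto the
images, and a single arbitrary offset in [−6000, 6000] added to every gradient component. The patch size and
position, the target label and the toy arrays are illustrative choices, not taken from the paper's exact setup.

import numpy as np

rng = np.random.default_rng(3)

def label_flip(y):
    # Label-flipping attack: relabel every sample as (y + 1) mod 10.
    return (y + 1) % 10

def add_backdoor(images, target_label, patch=3):
    # Backdoor attack: stamp a white square trigger onto each image and force the
    # target label. images: (m, 28, 28) grayscale in [0, 1].
    poisoned = images.copy()
    poisoned[:, :patch, :patch] = 1.0
    return poisoned, np.full(len(images), target_label)

def poison_update(grad):
    # Arbitrary model poisoning: add one random value in [-6000, 6000] to every component.
    return grad + rng.uniform(-6000, 6000)

y = rng.integers(0, 10, size=8)
print(y, label_flip(y))
images = rng.random((8, 28, 28))
bd_images, bd_labels = add_backdoor(images, target_label=0)
print(poison_update(np.zeros(5)))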


C. DISCUSSION OF EXPERIMENT RESULT
1) Effect of Different Proportions of Poisoners
Fig. 5 outlines the comparison of test accuracy at the 300th iteration with different proportions of poisoners.
It is well known that fewer benign data owners in training brings about lower model accuracy: as the
proportion of poisoners increases, the amount of valuable data in the training process decreases, which leads
to a drop in model accuracy. However, this is not apparent in Fig. 5, and the reason is that no less than 60%
honest participants is sufficient for the model to achieve good accuracy in both the MNIST and the
Fashion-MNIST scenario, given the robustness of our algorithm and of PEFL. In addition, regardless of the
application scenario or attack, MS-FL still maintains a comparative advantage over PEFL. We attribute this to
the characteristic of MS-FL that it aggregates gradient components as close as possible to the median of the
benign data, rather than the median of all data.

FIGURE 5. Comparison of accuracy with different numbers of poisoners: (a) MNIST label-flipping attack,
(b) MNIST backdoor attack, (c) Fashion-MNIST label-flipping attack, (d) Fashion-MNIST backdoor attack.

2) Effect of Different Iterations (label-flipping attack and backdoor attack)
In Fig. 6, we illustrate the comparison of accuracy achieved by PEFL and MS-FL with 40% malicious nodes
from the 200th iteration to the 300th iteration.
With an increase in the number of iterations, the model becomes progressively more adept at extracting
effective data features. Consequently, the test accuracy of the model also improves, until the model ultimately
converges. Here, we ignore the problems of under-fitting and over-fitting caused by too small or too large a
model capacity. According to Fig. 6, we notice that the model accuracy of PEFL is lower than that of MS-FL,
and that the accuracy variation of the model has become nearly stable. Besides, in Figs. 6(c) and 6(d), the
accuracy of MS-FL is nearly equivalent to that of the model trained without a poisoner, which means that
malicious data has little effect on MS-FL.

3) Effect of Different Iterations (model poisoning attack)
In the experiment of Figs. 7(a) and 7(b), there is only one malicious node, as described in subsection VII-B,
yet PEFL's model accuracy fails to stabilize, which means that PEFL is vulnerable to the model poisoning
attack. Meanwhile, the malicious node with extreme values has almost no effect on MS-FL.
Figs. 7(c) and 7(d) demonstrate the difference in model accuracy among trimmed mean, geometric median and
MS-FL when malicious nodes make up 40% of all data owners. In both the MNIST and the Fashion-MNIST
scenario, MS-FL provides better accuracy than the other two algorithms. This is because trimmed mean and
geometric median eliminate too much valuable data far from the median of the data owners' gradient
components in order to resist the effect of malicious nodes, whereas MS-FL does not rely on the median of
the data owners.

VIII. CONCLUSION
In this paper, a novel federated learning framework MS-FL based on multiple security strategies has been
proposed for a data application scenario in which the model requestor and the data owner are separated. This
framework has been proved secure for the privacy of the model requestor and the data owners. Moreover, it is
capable of protecting the interests of the participants of FL, and the aggregation algorithm of MS-FL is able
to defend against typical Byzantine poisoning attacks. Finally, the experimental results demonstrate the
comparable performance of the aggregation algorithm of MS-FL in terms of accuracy and robustness.

FIGURE 6. Comparison of accuracy with different iterations (data poisoning attack): (a) MNIST
label-flipping attack, (b) MNIST backdoor attack, (c) Fashion-MNIST label-flipping attack,
(d) Fashion-MNIST backdoor attack.

FIGURE 7. Comparison of accuracy with different iterations (model poisoning attack): (a) MNIST,
(b) Fashion-MNIST, (c) MNIST, (d) Fashion-MNIST.


REFERENCES
[1] Shuangjie Bai, Geng Yang, Guoxiu Liu, Hua Dai, and Chunming Rong. Nttpfl: Privacy-preserving oriented no trusted third party federated learning system based on blockchain. IEEE Transactions on Network and Service Management, 19(4):3750–3763, 2022.
[2] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1322–1333, 2015.
[3] Jung Hee Cheon, Andrey Kim, Miran Kim, and Yongsoo Song. Homomorphic encryption for arithmetic of approximate numbers. In Advances in Cryptology – ASIACRYPT 2017: 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, December 3–7, 2017, Proceedings, Part I, pages 409–437. Springer, 2017.
[4] Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. Advances in Neural Information Processing Systems, 30, 2017.
[5] Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP), pages 739–753. IEEE, 2019.
[6] Yifeng Jiang, Weiwen Zhang, and Yanxi Chen. Data quality detection mechanism against label flipping attacks in federated learning. IEEE Transactions on Information Forensics and Security, 18:1625–1637, 2023.
[7] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. In International Conference on Artificial Intelligence and Statistics, pages 2938–2948. PMLR, 2020.
[8] Chang Xu, Yu Jia, Liehuang Zhu, Chuan Zhang, Guoxie Jin, and Kashif Sharif. Tdfl: Truth discovery based byzantine robust federated learning. IEEE Transactions on Parallel and Distributed Systems, 33(12):4835–4848, 2022.
[9] Zewei Liu, Chunqiang Hu, Hui Xia, Tao Xiang, Baolin Wang, and Jiajun Chen. Spdts: A differential privacy-based blockchain scheme for secure power data trading. IEEE Transactions on Network and Service Management, pages 5196–5207, 2022.
[10] Briland Hitaj, Giuseppe Ateniese, and Fernando Perez-Cruz. Deep models under the GAN: Information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 603–618, 2017.
[11] Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, and Michael Moeller. Inverting gradients – how easy is it to break privacy in federated learning? Advances in Neural Information Processing Systems, 33:16937–16947, 2020.
[12] Yudong Chen, Lili Su, and Jiaming Xu. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 1(2):1–25, 2017.
[13] Dong Yin, Yudong Chen, Ramchandran Kannan, and Peter Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In International Conference on Machine Learning, pages 5650–5659. PMLR, 2018.
[14] Rachid Guerraoui, Sébastien Rouault, et al. The hidden vulnerability of distributed learning in Byzantium. In International Conference on Machine Learning, pages 3521–3530. PMLR, 2018.
[15] Luis Muñoz-González, Kenneth T. Co, and Emil C. Lupu. Byzantine-robust federated machine learning through adaptive model averaging. arXiv preprint arXiv:1909.05125, 2019.
[16] Xiaoyuan Liu, Hongwei Li, Guowen Xu, Zongqi Chen, Xiaoming Huang, and Rongxing Lu. Privacy-enhanced federated learning against poisoning adversaries. IEEE Transactions on Information Forensics and Security, 16:4574–4588, 2021.
[17] Xu Ma, Xiaoqian Sun, Yuduo Wu, Zheli Liu, Xiaofeng Chen, and Changyu Dong. Differentially private byzantine-robust federated learning. IEEE Transactions on Parallel and Distributed Systems, 33(12):3690–3701, 2022.
[18] Guobiao He, Wei Su, Shuai Gao, Ningchun Liu, and Sajal K. Das. Netchain: A blockchain-enabled privacy-preserving multi-domain network slice orchestration architecture. IEEE Transactions on Network and Service Management, 19(1):188–202, 2022.
[19] Yuhan Song, Fushan Wei, Kaijie Zhu, and Yuefei Zhu. Anomaly detection as a service: An outsourced anomaly detection scheme for blockchain in a privacy-preserving manner. IEEE Transactions on Network and Service Management, 19(4):3794–3809, 2022.
[20] Kyoohyung Han, Seungwan Hong, Jung Hee Cheon, and Daejun Park. Logistic regression on homomorphic encrypted data at scale. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 9466–9471, 2019.
[21] Chaochao Chen, Jun Zhou, Li Wang, Xibin Wu, Wenjing Fang, Jin Tan, Lei Wang, Alex X. Liu, Hao Wang, and Cheng Hong. When homomorphic encryption marries secret sharing: Secure large-scale sparse logistic regression and applications in risk control. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 2652–2662, 2021.
[22] Hao Chen, Ran Gilad-Bachrach, Kyoohyung Han, Zhicong Huang, Amir Jalali, Kim Laine, and Kristin Lauter. Logistic regression over encrypted data from fully homomorphic encryption. BMC Medical Genomics, 11:3–12, 2018.
[23] Jung Hee Cheon, Duhyeong Kim, Yongdai Kim, and Yongsoo Song. Ensemble method for privacy-preserving logistic regression based on homomorphic encryption. IEEE Access, 6:46938–46948, 2018.
[24] Yuzheng Li, Chuan Chen, Nan Liu, Huawei Huang, Zibin Zheng, and Qiang Yan. A blockchain-based decentralized federated learning framework with committee consensus. IEEE Network, 35(1):234–241, 2020.
[25] Yang Zhao, Jun Zhao, Linshan Jiang, Rui Tan, Dusit Niyato, Zengxiang Li, Lingjuan Lyu, and Yingbo Liu. Privacy-preserving blockchain-based federated learning for IoT devices. IEEE Internet of Things Journal, 8(3):1817–1829, 2020.
[26] Weishan Zhang, Qinghua Lu, Qiuyu Yu, Zhaotong Li, Yue Liu, Sin Kit Lo, Shiping Chen, Xiwei Xu, and Liming Zhu. Blockchain-based federated learning for device failure detection in industrial IoT. IEEE Internet of Things Journal, 8(7):5926–5937, 2020.
[27] Kentaroh Toyoda and Allan N. Zhang. Mechanism design for an incentive-aware blockchain-enabled federated learning platform. In 2019 IEEE International Conference on Big Data (Big Data), pages 395–403. IEEE, 2019.
[28] Ahmad Hammoud, Hadi Otrok, Azzam Mourad, and Zbigniew Dziong. On demand fog federations for horizontal federated learning in IoV. IEEE Transactions on Network and Service Management, 19(3):3062–3075, 2022.
[29] Miran Kim, Yongsoo Song, Shuang Wang, Yuhou Xia, Xiaoqian Jiang, et al. Secure logistic regression based on homomorphic encryption: Design and evaluation. JMIR Medical Informatics, 6(2):e8805, 2018.
[30] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175–1191, 2017.
[31] Guowen Xu, Hongwei Li, Yun Zhang, Shengmin Xu, Jianting Ning, and Robert H. Deng. Privacy-preserving federated deep learning with irregular users. IEEE Transactions on Dependable and Secure Computing, 19(2):1364–1381, 2020.
[32] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

WENSHAO YANG received his Bachelor of Science degree from Nanchang University in Nanchang, China,
in 2020. He is currently pursuing his master's degree in the Department of Mathematics at Yanshan University.
His research interests include the fields of information security, federated learning and blockchain.

PENGFEI KANG received his Bachelor of Science degree from Yanshan University in Qinhuangdao, China,
in 2021. He is currently pursuing his master's degree in the Department of Mathematics at Yanshan University.
His research interests include the fields of blockchain and game theory, with a current focus on blockchain and
its applications.

CHAO WEI (Member, IEEE) received his doctoral degree from the School of Computer Science, Tohoku
University, Japan, in 2015. He is an Associate Professor in the School of Science, Yanshan University. His
research interests include blockchain, network security and privacy protection.
