IEEE INTERNET OF THINGS JOURNAL, VOL. 7, NO. 6, JUNE 2020
Blockchain-Enabled Software-Defined
Industrial Internet of Things With Deep
Reinforcement Learning
Jia Luo, Qianbin Chen, Senior Member, IEEE, F. Richard Yu, Fellow, IEEE, and Lun Tang
Abstract—Recently, software-defined Industrial Internet of Things (SDIIoT), the integration of software-defined networking (SDN) and the Industrial Internet of Things (IIoT), has emerged. It is perceived as an effective way to manage the IIoT dynamically. Aiming to improve the scalability and flexibility of SDIIoT, multi-SDN has been applied to form a physically distributed control plane that handles the large amount of data generated by industrial devices. However, as the core of multi-SDN, reaching consensus among multiple SDN controllers is a thorny issue. To meet the required design principle, this article proposes a blockchain-enabled distributed SDIIoT to synchronize local views between distinct SDN controllers and finally reach consensus on the global view. On the other hand, both the cryptographic operations of the blockchain and the noncryptographic tasks draw on the same computational resource pool of the mobile edge cloud (MEC). In order to optimize the system energy efficiency, we adaptively allocate computational resources and the batch size of the block by jointly considering the trust features of SDN controllers and the resource requirements of noncryptographic operations. To implement the truly distributed manner of blockchain, we describe our problem as a partially observable Markov decision process (POMDP) and propose a novel deep reinforcement learning (DRL) approach to solve it. In the simulation results, we compare three different protocols of blockchain and show the effectiveness of our scheme in each of them.

Index Terms—Blockchain, deep reinforcement learning (DRL), Industrial Internet of Things (IIoT), software-defined networking (SDN).

Manuscript received December 10, 2019; revised February 23, 2020; accepted March 2, 2020. Date of publication March 5, 2020; date of current version June 12, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61571073, and in part by the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant KJZD-M201800601. (Corresponding author: Qianbin Chen.)

Jia Luo, Qianbin Chen, and Lun Tang are with the Key Laboratory of Mobile Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China (e-mail: cqb@cqupt.edu.cn).

F. Richard Yu is with the Department of Systems and Computer Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada.

Digital Object Identifier 10.1109/JIOT.2020.2978516

I. INTRODUCTION

IN RECENT years, the explosive growth of smart devices and the demand for interconnection have fostered the rise of the Internet of Things (IoT). As the implementation of the IoT for industry, the Industrial IoT (IIoT), driven by the development of wireless communications, sensor networks, and embedded systems, has attracted great interest from both academia and industry [1]. Whereas the Consumer IoT (CIoT) is designed to meet the requirements of the general public in daily life (e.g., smart homes), the IIoT is mainly used in manufacturing, transportation, logistics, and the energy industry, and therefore demands a higher level of security, robustness, flexibility, and scalability. In order to facilitate scalability and flexibility in resource management, a new paradigm called software-defined IIoT (SDIIoT) has been proposed to integrate software-defined networking (SDN) into the IIoT [2]. SDN separates the control plane and the data plane, which are coupled together in traditional networks. As a result, the information of a certain area is collected by the corresponding SDN controller, which belongs to the control plane; the SDN controller can then manage this area in a centralized way, improving the flexibility and scalability of the IIoT. The original SDIIoT works under the premise that one central SDN controller has access to the global information of the whole network. However, with increasing network size, multiple SDN controllers are required to manage the huge number of widely distributed industrial devices. Hence, this trend transforms the classical centralized SDIIoT into a distributed paradigm in which reaching consensus among multiple SDN controllers is the core issue.

Consensus algorithms defined in the literature for distributed systems, such as RAFT [3], can be used to handle the consensus problem. However, these algorithms cannot guarantee consensus among multiple SDN controllers if malicious nodes exist, which is unacceptable for SDIIoT with its stringent security requirements. As a promising technology, blockchain is considered a viable solution to guarantee system consensus even under malicious attacks. The blockchain, whose concept originates from the cryptocurrency Bitcoin [4], has gained wide attention in recent years [5]. In general, a blockchain is a distributed digital ledger consisting of a series of data blocks generated by cryptography. Each block consists of various transactions recording not just financial information but virtually anything of value [6]. On this basis, all blockchain nodes hold copies of the existing authenticated ledger distributed among them, thereby providing a consensus about the transactions at any given time without a central authority. Any modification to a single transaction record requires the alteration of all subsequent records and the collusion of the whole network, making data in the blockchain tamper resistant. For the distributed SDIIoT, SDN controllers can be integrated with the blockchain function to synchronize the local views of distinct SDN controllers and finally obtain the global view of the whole network through the process of reaching consensus.
In addition to reaching consensus, blockchain makes the information (stored in the blocks) of SDIIoT tamper resistant and traceable, thus enhancing the security of the network. For instance, in an SDIIoT-based logistics network, transportation information is shared and stored by utilizing the blockchain technology. Due to the chain structure of the blockchain, historical transportation information can be traced to check possible problems regarding transportation, and the blockchain can guarantee that this historical information cannot be tampered with.

As a further matter, computational resources are necessary for the corresponding operations in the consensus protocol of blockchain. To fulfill this requirement, SDIIoT can be equipped with the mobile edge cloud (MEC), which has been proposed and studied in recent years [7]–[9]. Compared to the traditional cloud, MEC provides computational resources in close proximity to devices, thus offering faster response times for services [10]. According to [11], the energy consumption of MEC is not negligible. Therefore, in this article, we propose a distributed blockchain-enabled SDIIoT system and optimize its energy efficiency to use the energy of MEC efficiently. Our contributions are listed as follows.

1) We integrate the permissioned blockchain into distributed SDIIoT to reach consensus on the global view and propose an optimization framework to optimize the system energy efficiency. Specifically, we adaptively allocate computational resources and the batch size of the block by jointly considering the trust features of SDN controllers, as well as the resource requirements of noncryptographic tasks in MEC.

2) We design a novel method to add the state information of the control plane to the local view, which will be synchronized among multiple SDN controllers via the consensus protocol. Specifically, each SDN controller proposes its local view to the consensus procedure at every time slot to assist the distributed learning algorithm.

3) To the best of our knowledge, we are the first to use a partially observable Markov decision process (POMDP) to model such a framework for optimizing the system energy efficiency in blockchain-enabled SDIIoT, thus providing a truly distributed implementation that fulfills the decentralization feature of blockchain.

4) We adopt a deep reinforcement learning (DRL) algorithm to solve the corresponding POMDP problem. To be specific, we combine the deep recurrent Q-network (DRQN) with normalized advantage functions (NAFs) to handle the partial observability and the continuous action space in the problem. In the simulation results, we compare two different protocols of permissioned blockchain and show the effectiveness of our scheme in either of them.

The remainder of this article is organized as follows. We review some related works in Section II. Then, we present the network architecture, along with the analysis of the consensus protocol and the energy efficiency model, in Section III, followed by the problem formulation in Section IV. Section V describes the DRL-based algorithm. The simulation results are provided and discussed in Section VI. Finally, we conclude this article in Section VII.

II. RELATED WORKS

In this section, we briefly discuss some related works in four areas: 1) SDIIoT; 2) distributed SDN; 3) blockchain; and 4) DRL.

A. Software-Defined Industrial Internet of Things

In order to facilitate scalability, ubiquitous accessibility, and flexibility in resource management, a slew of literature has integrated SDN into the IIoT. For instance, Wan et al. [12] implemented a software-defined industrial network to achieve the dynamic management of manufacturing resources. Meanwhile, Moness et al. [13] proposed a hybrid software-defined approach in IIoT. Chaudhary et al. [14] designed an SDN-enabled secure communication mechanism for the smart grid in the IIoT environment. Besides, Bedhief et al. [15] offered the integration of SDN and fog computing into IIoT to provide a flexible solution granting low delays for IIoT applications.

However, the increase in the number of industrial devices has resulted in numerous data and flows in SDIIoT, requiring multiple SDN controllers for management. One of the essential issues in SDIIoT with multiple SDN controllers (also known as distributed SDIIoT) is reaching consensus among these controllers. Therefore, we next present some related works on reaching consensus in distributed SDN architectures.

B. Distributed SDN Architectures

As a matter of fact, a number of works have been engaged in the design of distributed SDN architectures [16]. Tootoonchian and Ganjali [17] proposed HyperFlow, the first distributed SDN architecture for OpenFlow, in which SDN controllers keep a consistent global view synchronized via a publish/subscribe messaging paradigm. Koponen et al. [18] considered the Onix architecture, where the network information base (NIB) is used to aggregate and share a global view. Moreover, Berde et al. [19] designed ONOS, an experimental distributed SDN framework maintaining a global view via the consensus protocol in ZooKeeper [20]. On the other hand, the works in [21]–[23] proposed a two-level hierarchical control plane for improving scalability. Basically, they utilized a logically centralized root controller at the upper layer of the control hierarchy to handle the consensus issue at the lower layer. Nevertheless, the above works have neglected the potential influence inflicted by malicious attacks. To provide dependable services to distributed architectures, Qiu et al. [24] and Liu et al. [25] integrated blockchain into SDIIoT and IIoT, respectively, and then deployed DRL to optimize the transactional throughput of the blockchain.

C. Blockchain

Typically, the mainstream blockchain is classified into permissionless blockchain and permissioned blockchain. To be specific, permissionless blockchain allows arbitrary entities
to join the network and participate in the process of reaching consensus. The best-known examples are the Bitcoin [4] and Ethereum [26] networks, where the underlying consensus protocol is Proof of Work (PoW). However, permissionless blockchain suffers from low throughput due to its consensus protocol. On the contrary, permissioned blockchain restricts the entities that can contribute to the consensus and adopts a set of consensus protocols, such as practical Byzantine fault tolerance (PBFT) [27], Aardvark [28], redundant Byzantine fault tolerance (RBFT) [29], and Zyzzyva [30], whose origin is the concept of Byzantine fault tolerance (BFT). Compared to PoW, these consensus protocols result in higher throughput.

TABLE I
NOTATIONS

E. Discussions

Although DRL has been utilized to handle the optimization problem in blockchain-enabled SDIIoT, a major issue in existing works is that they assume an agent can collect the states of multiple entities in a centralized way so as to learn the policy. However, this is actually opposed to the decentralization principle of blockchain. As a workaround, we use a POMDP to design a novel mechanism that optimizes the system in a decentralized manner with modified DRL. We utilize permissioned blockchain to reach consensus in SDIIoT due to its advantage in performance and the nonpublic property of SDIIoT.

III. SYSTEM MODEL

In this section, we introduce the system model adopted in this article. We first present the network architecture, followed by the overview of blockchain, the description of the consensus protocol, and the energy efficiency model. The main notations used in the remainder of this article are summarized in Table I.

A. Network Architecture

Fig. 1. Network architecture of blockchain-enabled SDIIoT.

As shown in Fig. 1, due to the introduction of SDN, the control plane and the data plane are separated into two independent planes. The data plane mainly incorporates multiple switch nodes, which collect the devices' data from the infrastructure plane and forward them to the control plane. According to these collected data, the control plane can thus manage and optimize the network. Specifically, the control plane is composed of single or multiple SDN controllers. We consider a distributed SDIIoT with N physical machines equipped with SDN controller modules, denoted by N = {1, 2, . . . , N}. Each of them manages a certain domain of the network, and it is equipped with a local MEC which has a computational resource pool with ξ_tot CPU cycles per second.

The SDN controller can obtain the information of its domain, which we define as the local view. Specifically, the local view mainly consists of the following two kinds of information.

1) Noncontrol Plane Information: Noncontrol plane information is composed of the infrastructure plane information (e.g., transportation information in the logistics network and temperature information in the environmental monitoring network) and the data plane information (e.g., topology information of switch nodes). The former needs to
Fig. 2. Relationship between blockchain and SDIIoT.

The MAC authenticator is the MAC computed with the key shared by the sender (node i) and the intended recipient corresponding to the element.

We consider an uncivil execution during which a fraction g_n(t) of the transactions sent by SDN controller n are correct and F blockchain nodes are faulty; g_n(t) is the trust feature of SDN controller n. Moreover, at time slot t, SDN controller n sends b_n(t) transactions to be verified. Therefore, the batch size of the block (i.e., the total number of valid transactions) at time slot t is calculated as

    b_{tot}(t) = \sum_{n=1}^{N} b_n(t) g_n(t).    (1)

In general, BFT protocols can be classified into fast BFT protocols and robust BFT protocols. Protocols such as PBFT and Zyzzyva are fast BFT protocols, designed to provide the best possible performance in the absence of faults. When malicious attacks occur, a fast BFT protocol can eventually recover, but the transactional throughput may drop to zero during a possibly long time interval corresponding to the duration of the attack. On the other hand, robust BFT protocols maintain good performance when malicious attacks occur. Therefore, to improve the system's robustness, we use RBFT and Aardvark, two typical robust BFT protocols, to implement the consensus process. The corresponding analysis is as follows.

1) Aardvark: Aardvark is a specific realization of BFT. BFT is the ability of a network to unmistakably reach a consensus despite malicious nodes' attempts to corrupt the process. Blockchain nodes running Aardvark are termed replicas, and only one chosen node is also called the primary. In order to improve the system's robustness, Aardvark uses a hybrid MAC-signature construct to authenticate messages. Specifically, in this article, the digital signature is used to sign transactions and blocks since they will be added to the system's digital ledger for possible inspection in the future. In this way, the system is able to provide the property of nonrepudiation, that is, the ability to prove to a third party that a message is authentic. It is noteworthy that verifying a digital signature is more costly than verifying a MAC, thus Aardvark also uses MACs to guard against denial-of-service attacks in which the system receives a large number of requests with signatures that need to be verified.

Fig. 3. Protocol communication pattern of Aardvark.

For Aardvark, the relationship between blockchain and SDIIoT is shown in Fig. 2. SDIIoT interacts with the blockchain via the Request phase and the Reply phase. Replica communication is the communication within the blockchain system. We take the communication between a specific SDN controller and the blockchain system as an example. As shown in Fig. 3, the whole consensus process mainly comprises five phases: 1) Request; 2) Pre-prepare; 3) Prepare; 4) Commit; and 5) Reply. The detailed operations are as follows.

a) Request: At the beginning of time slot t, SDN controller n sends a REQUEST message to the replica p it believes to be the primary. The message is in the form of ⟨REQUEST, ⟨TXN_y⟩_{σ_n}, n⟩_{μ_{n,p}}, where TXN_y and n denote the transaction with ID y and the SDN controller ID, respectively. At time slot t, SDN controller n sends b_n(t) REQUEST messages to the primary. The local view of SDN controller n is incorporated in the transactions to be synchronized. Each transaction is signed with the private key of SDN controller n, and the REQUEST message is authenticated with a MAC appropriate for verification by primary p. Upon receiving a REQUEST message, primary p verifies the MAC. If it is valid, it proceeds to verify the signature of the transaction. Any failure in verification leads to the discarding of the corresponding message or transaction. This implementation ensures that the primary verifies only the signatures of REQUEST messages whose MACs check out, thus filtering out the possibly expensive verifications of malicious REQUEST messages.

In this article, the control plane information of all domains is shared every time slot to assist the distributed algorithm. As a consequence, each SDN controller proposes its local view in the REQUEST message to the primary at every time slot. Therefore, SDN controller n needs to generate b_n(t) MACs and signatures, and the primary needs to verify the MACs of the REQUEST messages delivered by the N SDN controllers and the signatures of the corresponding transactions. The required CPU cycles for signing one transaction/block, verifying one signature, generating one MAC, and verifying one MAC are denoted by δ, θ, α, and α, respectively. Hence, in this phase, the cost (unit: CPU cycles) at the primary is

    \sum_{n'=1}^{N} b_{n'}(t)\,\alpha + b_{tot}(t)\,\theta.    (2)

The cost at SDN controller n is

    b_n(t)(\alpha + \delta)    (3)

and there is no cost at the nonprimary replicas.
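To make the bookkeeping in (1)–(3) concrete, the following minimal Python sketch computes the batch size of a block and the Request-phase CPU costs at the primary and at each SDN controller. The function names and the numeric per-operation costs are illustrative assumptions, not values or code taken from the article.

```python
import numpy as np

# Per-operation CPU-cycle costs (illustrative placeholders, not the
# simulation parameters used in the article).
ALPHA = 1e4   # generate or verify one MAC
DELTA = 2e5   # sign one transaction/block
THETA = 2e5   # verify one signature

def batch_size(b, g):
    """Eq. (1): total number of valid transactions in the block,
    given per-controller submissions b_n(t) and trust features g_n(t)."""
    b = np.asarray(b, dtype=float)
    g = np.asarray(g, dtype=float)
    return float(np.sum(b * g))

def request_phase_costs(b, g):
    """Eqs. (2)-(3): Request-phase CPU cost (cycles) at the primary and
    at each SDN controller; nonprimary replicas incur no cost."""
    b = np.asarray(b, dtype=float)
    b_tot = batch_size(b, g)
    cost_primary = np.sum(b) * ALPHA + b_tot * THETA   # Eq. (2)
    cost_controllers = b * (ALPHA + DELTA)             # Eq. (3), one entry per controller
    return cost_primary, cost_controllers

# Example: N = 3 controllers with different trust features.
b_n = [120, 80, 100]      # b_n(t): transactions sent by each controller
g_n = [1.0, 0.9, 0.75]    # g_n(t): fraction of correct transactions
print(batch_size(b_n, g_n))
print(request_phase_costs(b_n, g_n))
```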
Upon receiving F + 1 matching PROPAGATE messages, each blockchain node forwards all the transactions to the replicas of the F + 1 protocol instances running locally and enters the next phases, which are the same as those of Aardvark. RBFT adopts the same cryptographic techniques as Aardvark. Similarly, we can calculate the cost for one transaction per protocol instance at primary n as

    \zeta_n(t) = \frac{(b_{tot}(t) + 2N + 4F - 2)\alpha}{b_{tot}(t)} + \frac{(N - 2)\alpha + \theta}{F + 1} + \frac{\left(\sum_{n'=1}^{N} b_{n'}(t) + N\right)\alpha}{b_{tot}(t)(F + 1)}.    (15)

The cost for one transaction per protocol instance at a nonprimary replica n is

    \psi_n(t) = \frac{(b_{tot}(t) + 2N + 4F)\alpha}{b_{tot}(t)} + \frac{(N - 2)\alpha + \theta}{F + 1} + \frac{\left(\sum_{n'=1}^{N} b_{n'}(t) + N\right)\alpha}{b_{tot}(t)(F + 1)}.    (16)

The cost for one transaction at SDN controller n is

    \kappa_n(t) = \frac{b_n(t)\delta + (b_n(t)N + F + 1)\alpha}{b_{tot}(t)}.    (17)

Therefore, the cost for one transaction at physical machine n is

    d_n^{crypto}(t) = \kappa_n(t) + \sum_{f=1}^{F+1} \omega_{f,n}(t)\,\zeta_n(t) + \sum_{f=1}^{F+1} \left(1 - \omega_{f,n}(t)\right)\psi_n(t)    (18)

where ω_{f,n}(t) = 1 means that the primary replica of protocol instance f runs on blockchain node n; otherwise, ω_{f,n}(t) = 0.

Suppose the blockchain system is saturated by REQUEST messages. According to the above analysis, the transactional throughput (unit: transactions per second, TPS) of the blockchain is calculated as

    TPS(t) = \min_{n \in \mathcal{N}} \frac{\xi_n^{crypto}(t)}{d_n^{crypto}(t) - \kappa_n(t)}    (19)

where ξ_n^{crypto}(t) represents the computational resources for cryptographic operations in the MEC containing blockchain node n at time slot t. Besides, since the SDN controller is not part of the blockchain, κ_n(t) is deducted in the denominator. The calculation of d_n^{crypto}(t) depends on the specific implementation of the consensus protocol.

D. Energy-Efficiency Model

Aiming to improve the throughput of the blockchain, each SDN controller utilizes its local MEC to execute the computational tasks pertaining to the blockchain system. In the meantime, it is noteworthy that the cryptographic operations are not the only kind of computational task using the computational resources of the MEC; some noncryptographic operations occupy part of the computational resources as well. Among all the noncryptographic tasks in a domain, assume that L of them are selected to be executed in one time slot. The lth (l ∈ {1, 2, . . . , L}) noncryptographic task in domain n at time slot t is characterized by the tuple (d_{n,l}^{normal}(t), T_{n,l}^{max}(t)), where d_{n,l}^{normal}(t) specifies the required number of CPU cycles to complete the task and T_{n,l}^{max}(t) is the maximum delay (unit: s). Let ξ_{n,l}^{normal}(t) denote the volume of computational resources allocated to the lth noncryptographic task in domain n at time slot t; thus, the execution time of task (d_{n,l}^{normal}(t), T_{n,l}^{max}(t)) can be calculated as

    T_{n,l}(t) = \frac{d_{n,l}^{normal}(t)}{\xi_{n,l}^{normal}(t)}    (20)

where T_{n,l}(t) should satisfy the following constraint:

    T_{n,l}(t) \le T_{n,l}^{max}(t).    (21)

Assume that one CPU cycle comprises ρ clock cycles; thus, ρ ξ_{n,l}^{normal}(t) is the frequency corresponding to the allocated computational resources. The power consumption of the CPU is proportional to e^2 ρ ξ_{n,l}^{normal}(t), where e is the operating voltage. Furthermore, the operating voltage is proportional to the frequency of the CPU; hence, we have the following power model for the lth noncryptographic task in domain n at time slot t:

    q + \eta\rho^3 \left(\xi_{n,l}^{normal}(t)\right)^3    (22)

where q is a fixed value representing the power consumption of the other components (except the CPU), which is independent of the frequency, and η is a proportional factor. Thus, the energy consumption for executing the task (d_{n,l}^{normal}(t), T_{n,l}^{max}(t)) is

    E_l = \frac{d_{n,l}^{normal}(t)}{\xi_{n,l}^{normal}(t)} \left[ q + \rho^3 \eta \left(\xi_{n,l}^{normal}(t)\right)^3 \right].    (23)

It is worth noting that the noncryptographic tasks may be tasks offloaded by industrial devices from the infrastructure plane. However, this article mainly considers the computational resource allocation of the MEC to optimize the energy efficiency in the proposed scenario; therefore, the energy consumption for the corresponding data transmission is not taken into account in the system model. Consequently, we can calculate the total energy consumption of the N MECs as

    E(t) = \sum_{n=1}^{N} \left\{ \frac{d_n^{crypto}(t)\, b_{tot}(t)}{\xi_n^{crypto}(t)} \left[ q + \rho^3 \eta \left(\xi_n^{crypto}(t)\right)^3 \right] + \sum_{l=1}^{L} \frac{d_{n,l}^{normal}(t)}{\xi_{n,l}^{normal}(t)} \left[ q + \rho^3 \eta \left(\xi_{n,l}^{normal}(t)\right)^3 \right] \right\}.    (24)

Aiming to utilize energy efficiently, we consider the system energy efficiency calculated as

    G(t) = \frac{TPS(t)}{E(t)}.    (25)
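As a minimal sketch of how the throughput and energy model fit together, the following Python fragment evaluates (18), (19), (24), and (25) for given per-node costs and resource allocations. The function names are ours and the inputs mirror the symbols above; it is an illustration of the stated model, not the authors' implementation.

```python
import numpy as np

def d_crypto_node(kappa, zeta, psi, omega):
    """Eq. (18): per-transaction cryptographic cost at one physical machine;
    omega is an (F+1)-vector with omega[f] = 1 if the primary of instance f runs here."""
    omega = np.asarray(omega, dtype=float)
    return kappa + float(np.sum(omega * zeta + (1.0 - omega) * psi))

def throughput(xi_crypto, d_crypto, kappa):
    """Eq. (19): TPS is limited by the slowest node; the SDN controller's own
    cost kappa_n(t) is excluded from the denominator."""
    xi_crypto, d_crypto, kappa = map(np.asarray, (xi_crypto, d_crypto, kappa))
    return float(np.min(xi_crypto / (d_crypto - kappa)))

def total_energy(d_crypto, b_tot, xi_crypto, d_normal, xi_normal, q, rho, eta):
    """Eq. (24): energy of the cryptographic workload plus the L noncryptographic
    tasks over the N MECs; d_normal and xi_normal are N x L arrays."""
    xi_crypto = np.asarray(xi_crypto, dtype=float)
    d_crypto = np.asarray(d_crypto, dtype=float)
    d_normal = np.asarray(d_normal, dtype=float)
    xi_normal = np.asarray(xi_normal, dtype=float)
    e_crypto = (d_crypto * b_tot / xi_crypto) * (q + rho**3 * eta * xi_crypto**3)
    e_normal = (d_normal / xi_normal) * (q + rho**3 * eta * xi_normal**3)
    return float(np.sum(e_crypto) + np.sum(e_normal))

def energy_efficiency(tps_value, energy_value):
    """Eq. (25): system energy efficiency G(t) = TPS(t) / E(t)."""
    return tps_value / energy_value
```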
IV. PROBLEM FORMULATION

According to (1), the trust feature of the SDN controller influences the batch size of the block and thus affects the energy efficiency; therefore, we consider the trust feature as one of the states in our problem. We utilize a Markov chain to describe the variation of the trust feature g_n(t). Hence, the corresponding transition probability is denoted by Pr(g_n(t+1) | g_n(t)). In addition, we assume that the variation of d_{n,l}^{normal}(t) also has the Markov property, and ξ_{n,l}^{normal}(t) can affect the value of d_{n,l}^{normal}(t). Furthermore, the computational resources of each MEC are limited by ξ_tot; thus, we have the following constraint:

    \xi_n^{crypto}(t) + \sum_{l=1}^{L} \xi_{n,l}^{normal}(t) \le \xi_{tot}, \quad \forall n.    (26)

Therefore, the corresponding transition probability is denoted by Pr(d_{n,l}^{normal}(t+1) | d_{n,l}^{normal}(t), ξ_{n,l}^{normal}(t), ξ_n^{crypto}(t)). In order to improve the system energy efficiency, we consider the trust features of the SDN controllers and the resource requirements of the noncryptographic tasks as the states of our problem and make sequential decisions according to their Markov property. To do this, a component called the agent is utilized to interact with the environment. Moreover, the agent is equipped with the DRL algorithm to handle the problem. Therefore, at each time slot t, an agent observes state s_t and chooses action a_t, which determines the immediate reward r_t and the next state s_{t+1}. In our case, the agent is located within the MEC, thus each MEC contains an agent. However, it is noteworthy that agent n can only collect the up-to-date states regarding physical machine n at the beginning of each time slot. In order to make decisions that require the whole state of the system, we model our problem as a POMDP.

A. Observation

The state s_t at time slot t is

    s_t = \begin{bmatrix} g_1(t) & g_2(t) & \cdots & g_N(t) \\ d_1^{normal}(t) & d_2^{normal}(t) & \cdots & d_N^{normal}(t) \end{bmatrix}    (27)

where d_n^{normal}(t) is a vector with L elements representing the resource requirements of the noncryptographic tasks in domain n at time slot t.

A specific agent n at time slot t can get the current states of its own domain, whereas d_{m≠n}^{normal}(t) and g_{m≠n}(t) are beyond its reach. Hence, the system state is partially observable, which is the case of a POMDP. We take the observations of domain n to be the corresponding states of the current time slot. In addition, agent n can get the previous states of other domains through the consensus process, as the state information is the control plane information included in the local view; thus, the observations of other domains are represented by the corresponding states of the last time slot. Although these observations are not the up-to-date states that we really need, as training time increases, the agent acquires more information about the model and behaves more and more similarly to the condition in which all current states are known. Therefore, the observation o_t for agent n at time slot t is

    o_t = \begin{bmatrix} g_1(t-1) & \cdots & g_n(t) & \cdots & g_N(t-1) \\ d_1^{normal}(t-1) & \cdots & d_n^{normal}(t) & \cdots & d_N^{normal}(t-1) \end{bmatrix}.    (28)

It is worth noting that transactions regarding the above state information may not pass the corresponding verification due to the trust feature of the SDN controller. As a consequence, these transactions will be discarded and the corresponding information will not be shared through the consensus process. In this case, earlier state information can be used as an alternative. Furthermore, the SDN controller can be considered the central authority of its own domain, thus it should not be as fragile as a general smart device. On this basis, we assume that the SDN controller is well designed in terms of safety, so that only a portion of its transactions is incorrect in the worst case (i.e., g(t) ∈ (0, 1]) and the correctness of the transactions pertaining to the state information can be guaranteed.

B. Action

The agent needs to decide the batch size of the block b_tot(t), the computational resource allocation ξ^{crypto}(t) for cryptographic operations, and the computational resource allocation ξ^{normal}(t) for noncryptographic operations. Therefore, the action a_t for agent n at time slot t is denoted by

    a_t = \begin{bmatrix} b_1(t) & b_2(t) & \cdots & b_N(t) \\ \xi_1^{crypto}(t) & \xi_2^{crypto}(t) & \cdots & \xi_N^{crypto}(t) \\ \xi_1^{normal}(t) & \xi_2^{normal}(t) & \cdots & \xi_N^{normal}(t) \end{bmatrix}    (29)

where ξ_n^{normal}(t) is a vector with L elements representing the computational resources allocated to the noncryptographic tasks in domain n at time slot t.

C. Transition Probability

According to the above discussion, the transition probability that action a_t in state s_t at time slot t leads to state s_{t+1} at time slot t+1 can be denoted by Pr(s_{t+1} | s_t, a_t). The goal of the agent is to choose actions at each time slot that maximize its long-term expected reward, which is determined by the corresponding actions and states. Furthermore, the variation of the states depends on Pr(s_{t+1} | s_t, a_t); therefore, for MDP solutions with fully observable states, the variation of the states is utilized to implicitly estimate Pr(s_{t+1} | s_t, a_t) and thus obtain the optimal actions. However, for the POMDP considered in this article, the states are partially observable and the next state is determined by Pr(s_{t+1} | s_t, a_t) Pr(s_t | o_t). Hence, using the observation alone will not work correctly. We will propose our solution to handle this issue in Section V.

D. Reward

In order to improve the long-term expected system energy efficiency, G(t) should be included in the immediate reward. Additionally, constraints (21) and (26) should be satisfied. We utilize the ReLU function f(x) = max(x, 0) to calculate the following variables with regard to the constraints:

    \aleph_n(t) = \max\left( \xi_n^{crypto}(t) + \sum_{l=1}^{L} \xi_{n,l}^{normal}(t) - \xi_{tot},\; 0 \right), \quad \forall n    (30)

    \beta_{n,l}(t) = \max\left( T_{n,l}(t) - T_{n,l}^{max}(t),\; 0 \right), \quad \forall n, l.    (31)

Then, we form the penalty function P(t) as

    P(t) = \lVert \aleph(t) \rVert_1 + \lVert \beta(t) \rVert_1    (32)

where ℵ(t) and β(t) are the vectors collecting ℵ_n(t) and β_{n,l}(t), respectively, and ∥·∥_1 denotes the L1 norm. Therefore, we calculate the immediate reward as

    r_t = G(t) - \lambda P(t)    (33)

where λ (λ > 0) is the penalty coefficient. P(t) is a measure of the violation of the constraints: it is nonzero when the constraints are violated and zero in the region where the constraints are satisfied. By introducing P(t) into the immediate reward, the agent gradually learns to satisfy the constraints while optimizing the system energy efficiency.
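A compact sketch of the observation assembly in (28) and of the penalties and reward in (30)–(33); the array shapes follow the notation above (N controllers, L noncryptographic tasks per domain), and the code is illustrative rather than the article's implementation.

```python
import numpy as np

def observation(agent, g_now, g_prev, d_now, d_prev):
    """Eq. (28): agent n sees its own domain's current state and the other
    domains' states from the previous time slot (shared via consensus)."""
    g_obs = np.array(g_prev, dtype=float)
    g_obs[agent] = g_now[agent]
    d_obs = np.array(d_prev, dtype=float)   # N x L
    d_obs[agent] = d_now[agent]
    return g_obs, d_obs

def penalty(xi_crypto, xi_normal, xi_tot, T, T_max):
    """Eqs. (30)-(32): L1 penalty for violating the per-MEC resource budget (26)
    and the per-task delay constraint (21). xi_normal, T, T_max are N x L arrays."""
    xi_crypto = np.asarray(xi_crypto, dtype=float)
    xi_normal = np.asarray(xi_normal, dtype=float)
    over_budget = np.maximum(xi_crypto + xi_normal.sum(axis=1) - xi_tot, 0.0)              # (30)
    over_delay = np.maximum(np.asarray(T, dtype=float) - np.asarray(T_max, dtype=float), 0.0)  # (31)
    return float(over_budget.sum() + over_delay.sum())                                     # (32)

def reward(G, P, lam):
    """Eq. (33): immediate reward r_t = G(t) - lambda * P(t)."""
    return G - lam * P
```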
It is noteworthy that each agent makes decisions about the whole system, but only the actions for its own domain are actually executed. In other words, each agent actually makes decisions for its own domain based on the global states. Moreover, the different domains' contributions to the immediate reward are shared through the consensus process, thus all agents obtain the same immediate reward, which pushes them in the same direction of optimization. As a consequence, the distributed agents work cooperatively like a single entity when the DRL algorithm converges.

V. DRL-BASED OPTIMIZATION FRAMEWORK

Several methods can approximate the optimal POMDP policy on the condition that the model of the environment is known, which may not be an appropriate assumption in reality [36]. To be more practical, we utilize DRL to solve the POMDP problem without knowing the explicit model of the environment.

A. Q-Learning

Reinforcement learning (RL) is a branch of machine learning that predominantly seeks to solve MDPs whose system dynamics Pr(s_{t+1} | s_t, a_t) are unknown. The goal of the agent is to select actions that maximize long-term rewards. Q-learning is one of the traditional RL algorithms and adopts a model-free learning method for estimating the long-term expected reward known as the Q-value. The Q-value is also called the action–state value function. A specific Q-value Q_π(s_t, a_t) corresponding to the policy π is defined as the long-term expected reward from state s_t after executing action a_t and following the policy π thereafter:

    Q_\pi(s_t, a_t) = E_\pi\left[ \sum_{\tau=0}^{\infty} \gamma^{\tau} r_{t+\tau} \,\middle|\, s_t, a_t \right]    (34)

where γ (0 ≤ γ ≤ 1) is the discount factor representing the relative importance of future rewards versus the immediate reward. Q-values are learned iteratively by updating the current Q-value estimate as follows:

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]    (35)

where α ≥ 0 is the learning rate. Generally, Q-learning uses a Q-table to store Q(s, a). At a certain time slot, the Q-value corresponding to the state and action of that time slot is updated according to (35).

B. Deep Q Network

For problems featuring either a high-dimensional state space S or a high-dimensional action space A, which is often the case in reality, it is impractical to maintain a separate estimate for each pair in S × A. As an alternative, a model (also known as an approximator), usually a nonlinear one, is used to approximate the Q-values. In the case of DQN, the approximator is a neural network (NN) parameterized by weights and biases collectively denoted by ν. Consequently, Q-values are expressed as Q(s, a | ν). Due to the adoption of an NN, DQN is able to handle large-scale problems, which largely broadens the application of RL [31]. Instead of updating individual Q-values, updates are now made to the parameters of the NN to minimize a differentiable loss function

    L(\nu_t) = \left( r_t + \gamma \max_{a'} Q(s_{t+1}, a' | \nu_t) - Q(s_t, a_t | \nu_t) \right)^2    (36)

    \nu_{t+1} = \nu_t - \alpha \nabla_{\nu} L(\nu_t).    (37)

In (36), it is notable that Q(s_t, a_t | ν_t) represents the current estimate of the following cumulative reward from state s_t after executing action a_t and following the optimal policy thereafter:

    R_t = r_t + \sum_{\tau=1}^{\infty} \gamma^{\tau} r_{t+\tau}.    (38)

DQN assumes that the action corresponding to the maximal Q-value will be executed at the next state s_{t+1}; thus, the following expression can be considered an alternative representation of the above cumulative reward:

    R'_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a' | \nu_t).    (39)

Due to the use of the current Q network, R'_t is also an estimate of R_t; however, compared to Q(s_t, a_t | ν_t), R'_t is closer to the target R_t since it includes the actual immediate reward r_t. Therefore, DQN uses the mean-square error (MSE) between R'_t and Q(s_t, a_t | ν_t) depicted in (36) as the loss function and applies (37) to update the Q network.

Since the same NN generates the next-state target Q-values Q(s_{t+1}, a' | ν_t) used in updating the current Q-values, the learning process with such an update can be unstable or even diverge [37]. DQN ingeniously utilizes two mechanisms to restore learning stability. First, transitions (s_t, a_t, r_t, s_{t+1}) are recorded in an experience replay memory and then sampled uniformly at training time. Second, a separate target network Q̂ generates the target Q-values for the update, decoupling the feedback resulting from the network generating its own targets. Q̂ is identical to the evaluation network Q except that its parameters ν− are updated less frequently than those of the evaluation network to match ν. As a result, the loss function in (36) is changed into

    L(\nu_t) = \left( y_t - Q(s_t, a_t | \nu_t) \right)^2    (40)

where y_t = r_t + γ max_{a'} Q̂(s_{t+1}, a' | ν_t^-) is the stable update target given by the target network Q̂. Learning processes with such an update have been empirically shown to be tractable and stable [31].

Despite its wide application in many fields, DQN cannot be directly used to handle our problem due to its partial observability.
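For illustration only, the following sketch shows the target-network update of (36)–(40) with a toy linear Q-approximator standing in for the deep network; the article's actual solution combines DRQN with NAFs rather than this plain DQN, and all names and numbers below are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

def phi(s, a):
    """Feature vector of a toy linear approximator Q(s, a | nu) = phi(s, a) . nu."""
    return np.concatenate([s, a, [1.0]])

def q_value(nu, s, a):
    return float(phi(s, a) @ nu)

def dqn_step(nu, nu_target, batch, candidate_actions, gamma=0.99, lr=1e-3):
    """One gradient step on the loss (40): the target y_t is built from the slowly
    updated parameters nu_target, while the evaluation parameters nu are trained."""
    grad = np.zeros_like(nu)
    for s, a, r, s_next in batch:
        y = r + gamma * max(q_value(nu_target, s_next, a2) for a2 in candidate_actions)
        td_error = q_value(nu, s, a) - y
        grad += 2.0 * td_error * phi(s, a)        # gradient of (Q(s,a|nu) - y)^2
    return nu - lr * grad / len(batch)

# Replay memory of (s_t, a_t, r_t, s_{t+1}) transitions, sampled uniformly.
actions = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
memory = [(rng.normal(size=STATE_DIM), actions[rng.integers(3)], rng.normal(),
           rng.normal(size=STATE_DIM)) for _ in range(100)]

nu = np.zeros(STATE_DIM + ACTION_DIM + 1)
nu_target = nu.copy()
for step in range(500):
    idx = rng.integers(len(memory), size=32)      # uniform mini-batch
    nu = dqn_step(nu, nu_target, [memory[i] for i in idx], actions)
    if step % 50 == 0:
        nu_target = nu.copy()                     # infrequent target-network sync
```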
TABLE IV
SIMULATION PARAMETERS
    \begin{pmatrix}
    0.2   & 0.14  & 0.11  & 0.1   & 0.1   & 0.04  & 0.06  & 0.02  & 0.08  & 0.15  \\
    0.16  & 0.18  & 0.14  & 0.13  & 0.08  & 0.03  & 0.04  & 0.03  & 0.1   & 0.11  \\
    0.11  & 0.16  & 0.16  & 0.14  & 0.13  & 0.1   & 0.06  & 0.039 & 0.031 & 0.07  \\
    0.1   & 0.11  & 0.16  & 0.2   & 0.11  & 0.12  & 0.05  & 0.08  & 0.026 & 0.044 \\
    0.06  & 0.07  & 0.095 & 0.145 & 0.24  & 0.14  & 0.09  & 0.07  & 0.04  & 0.05  \\
    0.08  & 0.032 & 0.062 & 0.14  & 0.12  & 0.19  & 0.156 & 0.1   & 0.1   & 0.02  \\
    0.02  & 0.05  & 0.06  & 0.1   & 0.13  & 0.14  & 0.16  & 0.16  & 0.11  & 0.07  \\
    0.018 & 0.03  & 0.044 & 0.049 & 0.095 & 0.146 & 0.29  & 0.137 & 0.12  & 0.071 \\
    0.027 & 0.03  & 0.035 & 0.037 & 0.101 & 0.137 & 0.31  & 0.148 & 0.114 & 0.061 \\
    0.025 & 0.08  & 0.031 & 0.026 & 0.112 & 0.143 & 0.262 & 0.146 & 0.121 & 0.054
    \end{pmatrix}    (46)
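The matrix in (46) is row stochastic (each row sums to one), so it can be read as a ten-state Markov transition matrix. Assuming, purely for illustration, that its ten states index discretized levels of a quantity such as the trust feature g_n(t), a chain governed by it can be simulated as in the following sketch.

```python
import numpy as np

# Row-stochastic transition matrix of (46); each row gives Pr(next state | current state).
P = np.array([
    [0.2,   0.14,  0.11,  0.1,   0.1,   0.04,  0.06,  0.02,  0.08,  0.15],
    [0.16,  0.18,  0.14,  0.13,  0.08,  0.03,  0.04,  0.03,  0.1,   0.11],
    [0.11,  0.16,  0.16,  0.14,  0.13,  0.1,   0.06,  0.039, 0.031, 0.07],
    [0.1,   0.11,  0.16,  0.2,   0.11,  0.12,  0.05,  0.08,  0.026, 0.044],
    [0.06,  0.07,  0.095, 0.145, 0.24,  0.14,  0.09,  0.07,  0.04,  0.05],
    [0.08,  0.032, 0.062, 0.14,  0.12,  0.19,  0.156, 0.1,   0.1,   0.02],
    [0.02,  0.05,  0.06,  0.1,   0.13,  0.14,  0.16,  0.16,  0.11,  0.07],
    [0.018, 0.03,  0.044, 0.049, 0.095, 0.146, 0.29,  0.137, 0.12,  0.071],
    [0.027, 0.03,  0.035, 0.037, 0.101, 0.137, 0.31,  0.148, 0.114, 0.061],
    [0.025, 0.08,  0.031, 0.026, 0.112, 0.143, 0.262, 0.146, 0.121, 0.054],
])

rng = np.random.default_rng(1)

def step(state):
    """Draw the next state index from the row of the current state."""
    p = P[state] / P[state].sum()      # guard against floating-point drift
    return int(rng.choice(len(P), p=p))

def simulate(state0, horizon):
    """Simulate a short trajectory of the chain (state indices 0..9)."""
    traj = [state0]
    for _ in range(horizon):
        traj.append(step(traj[-1]))
    return traj

print(simulate(0, 10))
```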
Fig. 11. Total computational resources of MEC in Aardvark versus time slot.
Fig. 12. Delay of the noncryptographic task in Aardvark versus time slot.
Fig. 13. Energy efficiency of PBFT, Aardvark, and RBFT versus the total number of transactions sent by N SDN controllers.
Fig. 14. Energy efficiency of PBFT, Aardvark, and RBFT versus computational resources for cryptographic operations in each MEC.

As another important hyperparameter, the mini-batch size controls the accuracy of the estimate of the loss-function gradient when performing DRL. A larger mini-batch size yields a more accurate gradient estimate, thus stabilizing and accelerating the learning process. Specifically, for PBFT, Aardvark, and RBFT in this article, the best mini-batch sizes are 64, 64, and 128, respectively. Higher values bring no improvement in convergence speed but require more memory space.

To show the influence of the penalty function on the computational resources, we take Aardvark with ten physical machines (i.e., ten MECs) as an example and depict the relationship between the total computational resources of a specific MEC and the time slots in Fig. 11. In the beginning, the value of the total computational resources varies over a wide range to explore the action space. The agent learns from the penalty [nonzero P(t)] it receives through the reward and tries to avoid the region of the action space that leads to the violation of constraint (26). Besides, the agent learns to achieve higher G(t) as well. As the learning algorithm progresses, the value of the total computational resources varies in a small range to achieve the best reward, resulting in the convergence of the algorithm and the satisfaction of the corresponding constraint. In addition, Fig. 12 illustrates the delay performance of a noncryptographic task with a maximum delay of 10 ms. As we can see, with the convergence of the learning algorithm, the actual delay of this kind of task is kept below the maximum delay. Moreover, the violation of constraint (21) results in a relatively wider exploration range of the delay, hence the agent takes more time slots to satisfy the delay constraint. As for the other MECs and protocols, the performance curves are similar to Figs. 11 and 12. Due to space limitations, we only give the above two figures as a typical example.

Aiming to show the effect of the computational resources for cryptographic operations and the number of transactions sent by the SDN controllers, we set N = 10, fix b = [b_1, b_2, . . . , b_N] and ξ^crypto = [ξ_1^crypto, ξ_2^crypto, . . . , ξ_N^crypto], respectively, and compare the corresponding performance with our proposed DRL in Figs. 13 and 14. As is shown, different ranges of b and ξ^crypto result in distinct trends of the energy efficiency. Hence, a one-size-fits-all parameter setting does not exist. Besides, by tracking the variation of the environment, our proposed DRL achieves higher energy efficiency than the two fixed schemes, and the fixed ξ^crypto scheme suffers more severe performance degradation.

As with the general performance displayed in Fig. 8, Figs. 15–17 illustrate the relationship between the energy efficiency and the number of blockchain nodes for PBFT,
Aardvark, and RBFT, respectively. According to the corresponding consensus procedures, more blockchain nodes beget more message exchanges, so more computational resources are used for the same amount of transactions. Therefore, the energy efficiency decreases as the number of blockchain nodes increases. As shown in the above figures, our proposed DRL achieves performance similar to the ideal situation (i.e., centralized DRL) as the number of blockchain nodes varies. In addition to the proposed DRL

Fig. 15. Energy efficiency of PBFT versus the number of blockchain nodes.
Fig. 16. Energy efficiency of Aardvark versus the number of blockchain nodes.
Fig. 17. Energy efficiency of RBFT versus the number of blockchain nodes.

VII. CONCLUSION

In this article, we integrated permissioned blockchain into distributed SDIIoT to reach consensus on the global view and proposed an optimization framework to optimize the system energy efficiency. In order to fulfill the decentralization feature of blockchain, we described our problem as a POMDP, which can be implemented in a truly distributed manner. Then, we proposed a novel DRL approach to solve the POMDP. The simulation results showed that the proposed algorithm could achieve the goal of improving the energy efficiency of Aardvark, RBFT, and PBFT with limited performance reduction compared to the ideal situation. For future work, we will consider more consensus protocols and apply similar schemes in specific edge computing cases such as video transcoding.

REFERENCES

[1] J. Li, F. R. Yu, G. Deng, C. Luo, Z. Ming, and Q. Yan, "Industrial Internet: A survey on the enabling technologies, applications, and challenges," IEEE Commun. Surveys Tuts., vol. 19, no. 3, pp. 1504–1526, 3rd Quart., 2017.
[2] X. Li, D. Li, J. Wan, C. Liu, and M. Imran, "Adaptive transmission optimization in SDN-based industrial Internet of Things with edge computing," IEEE Internet Things J., vol. 5, no. 3, pp. 1351–1360, Jun. 2018.
[3] D. Ongaro and J. Ousterhout, "In search of an understandable consensus algorithm," in Proc. USENIX Annu. Tech. Conf., Philadelphia, PA, USA, 2014, pp. 305–320.
[4] S. Nakamoto, "Bitcoin: A peer-to-peer electronic cash system," Rep., 2008. [Online]. Available: https://bitcoin.org/bitcoin.pdf
[5] Z. Zheng, S. Xie, H. Dai, X. Chen, and H. Wang, "An overview of blockchain technology: Architecture, consensus, and future trends," in Proc. IEEE Int. Congr. Big Data (BigData Congress), Honolulu, HI, USA, Jun. 2017, pp. 557–564.
[6] F. R. Yu, J. Liu, Y. He, P. Si, and Y. Zhang, "Virtualization for distributed ledger technology (vDLT)," IEEE Access, vol. 6, pp. 25019–25028, 2018.
[7] C. Wang, Y. He, F. R. Yu, Q. Chen, and L. Tang, "Integration of networking, caching, and computing in wireless systems: A survey, some research issues, and challenges," IEEE Commun. Surveys Tuts., vol. 20, no. 1, pp. 7–38, 1st Quart., 2018.
[8] S. Song and J. Chung, "Sliced NFV service chaining in mobile edge clouds," in Proc. 19th Asia–Pac. Netw. Oper. Manag. Symp. (APNOMS), Seoul, South Korea, Sep. 2017, pp. 292–294.
[9] H. Truong and M. Karan, "Analytics of performance and data quality for mobile edge cloud applications," in Proc. IEEE 11th Int. Conf. Cloud Comput. (CLOUD), San Francisco, CA, USA, Jul. 2018, pp. 660–667.
[10] H. Liu, F. Eldarrat, H. Alqahtani, A. Reznik, X. de Foy, and Y. Zhang, "Mobile edge cloud system: Architectures, challenges, and approaches," IEEE Syst. J., vol. 12, no. 3, pp. 2495–2508, Sep. 2018.
[11] J. Luo, F. R. Yu, Q. Chen, and L. Tan, "Adaptive video streaming with edge caching and video transcoding over software-defined mobile networks: A deep reinforcement learning approach," IEEE Trans. Wireless Commun., early access, doi: 10.1109/TWC.2019.2955129.
[12] J. Wan et al., "Toward dynamic resources management for IoT-based manufacturing," IEEE Commun. Mag., vol. 56, no. 2, pp. 52–59, Feb. 2018.
[13] M. Moness, A. M. Moustafa, A. H. Muhammad, and A. A. Younis, "Hybrid controller for a software-defined architecture of industrial Internet lab-scale process," in Proc. 12th Int. Conf. Comput. Eng. Syst. (ICCES), Cairo, Egypt, Dec. 2017, pp. 266–271.
number of blockchain nodes. In addition to the proposed DRL (ICCES), Cairo, Egypt, Dec. 2017, pp. 266–271.
Authorized licensed use limited to: UNIVERSITY OF BIRMINGHAM. Downloaded on June 14,2020 at 16:05:49 UTC from IEEE Xplore. Restrictions apply.
5480 IEEE INTERNET OF THINGS JOURNAL, VOL. 7, NO. 6, JUNE 2020
[14] R. Chaudhary, G. S. Aujla, S. Garg, N. Kumar, and J. J. P. C. Rodrigues, "SDN-enabled multi-attribute-based secure communication for smart grid in IIoT environment," IEEE Trans. Ind. Informat., vol. 14, no. 6, pp. 2629–2640, Jun. 2018.
[15] I. Bedhief, L. Foschini, P. Bellavista, M. Kassar, and T. Aguili, "Toward self-adaptive software defined fog networking architecture for IIoT and industry 4.0," in Proc. IEEE 24th Int. Workshop Comput. Aided Model. Design Commun. Links Netw. (CAMAD), Limassol, Cyprus, Sep. 2019, pp. 1–5.
[16] F. Bannour, S. Souihi, and A. Mellouk, "Distributed SDN control: Survey, taxonomy, and challenges," IEEE Commun. Surveys Tuts., vol. 20, no. 1, pp. 333–354, 1st Quart., 2018.
[17] A. Tootoonchian and Y. Ganjali, "HyperFlow: A distributed control plane for OpenFlow," in Proc. Internet Netw. Manag. Conf. Res. Enterprise Netw., Apr. 2010, pp. 1–6.
[18] T. Koponen et al., "Onix: A distributed control platform for large-scale production networks," in Proc. USENIX Symp. Operating Syst. Design Implement., Vancouver, BC, Canada, Oct. 2010, pp. 351–364.
[19] P. Berde et al., "ONOS: Towards an open, distributed SDN OS," in Proc. ACM SIGCOMM Workshop Hot Topics Softw. Defined Netw., Aug. 2014, pp. 1–6.
[20] P. Hunt et al., "ZooKeeper: Wait-free coordination for Internet-scale systems," in Proc. USENIX Annu. Tech. Conf., 2010, p. 11.
[21] S. Yeganeh and Y. Ganjali, "Kandoo: A framework for efficient and scalable offloading of control applications," in Proc. 1st ACM Int. Workshop Hot Topics Softw. Defined Netw., Helsinki, Finland, Aug. 2012, pp. 19–24.
[22] S. Jain et al., "B4: Experience with a globally-deployed software defined WAN," in Proc. ACM SIGCOMM Conf., Hong Kong, China, Aug. 2013, pp. 3–14.
[23] A. Kumar et al., "BwE: Flexible, hierarchical bandwidth allocation for WAN distributed computing," in Proc. ACM SIGCOMM Special Interest Group Data Commun., London, U.K., Aug. 2015, pp. 1–14.
[24] C. Qiu, F. R. Yu, H. Yao, C. Jiang, F. Xu, and C. Zhao, "Blockchain-based software-defined industrial Internet of Things: A dueling deep Q-learning approach," IEEE Internet Things J., vol. 6, no. 3, pp. 4627–4639, Jun. 2019.
[25] M. Liu, F. R. Yu, Y. Teng, V. Leung, and M. Song, "Performance optimization for blockchain-enabled Industrial Internet of Things (IIoT) systems: A deep reinforcement learning approach," IEEE Trans. Ind. Informat., vol. 15, no. 6, pp. 3559–3570, Jun. 2019.
[26] G. Wood, "Ethereum: A secure decentralised generalised transaction ledger," Ethereum Project Yellow Paper, vol. 151, pp. 1–32, Jan. 2014.
[27] M. Castro and B. Liskov, "Practical Byzantine fault tolerance and proactive recovery," ACM Trans. Comput. Syst., vol. 20, no. 4, pp. 398–461, Nov. 2002.
[28] A. Clement, E. L. Wong, L. Alvisi, M. Dahlin, and M. Marchetti, "Making Byzantine fault tolerant systems tolerate Byzantine faults," in Proc. 6th USENIX Symp. Netw. Syst. Design Implement., Berkeley, CA, USA, 2009, pp. 153–168.
[29] P. Aublin, S. B. Mokhtar, and V. Quéma, "RBFT: Redundant Byzantine fault tolerance," in Proc. IEEE 33rd Int. Conf. Distrib. Comput. Syst., Philadelphia, PA, USA, Jul. 2013, pp. 297–306.
[30] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong, "Zyzzyva: Speculative Byzantine fault tolerance," SIGOPS Oper. Syst. Rev., vol. 41, no. 6, pp. 45–58, Oct. 2007.
[31] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529–533, Feb. 2015.
[32] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," Jan. 2017. [Online]. Available: arXiv:1507.06527.
[33] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, "Continuous deep Q-learning with model-based acceleration," Mar. 2016. [Online]. Available: arXiv:1603.00748.
[34] J. Wang, L. Zhao, J. Liu, and N. Kato, "Smart resource allocation for mobile edge computing: A deep reinforcement learning approach," IEEE Trans. Emerg. Topics Comput., early access, doi: 10.1109/TETC.2019.2902661.
[35] Z. Md. Fadlullah, F. Tang, B. Mao, J. Liu, and N. Kato, "On intelligent traffic control for large-scale heterogeneous networks: A value matrix-based deep learning approach," IEEE Commun. Lett., vol. 22, no. 12, pp. 2479–2482, Dec. 2018.
[36] N. A. Vien, T. P. Le, and T. Chung, "Deep hierarchical reinforcement learning algorithm in partially observable Markov decision processes," IEEE Access, vol. 6, pp. 49089–49102, 2018.
[37] J. N. Tsitsiklis and B. V. Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Autom. Control, vol. 42, no. 5, pp. 674–690, May 1997.
[38] D. Bernstein, "The Poly1305-AES message-authentication code," in Proc. 12th Int. Conf. Fast Softw. Encryption, Paris, France, Feb. 2005, pp. 32–49.
[39] D. J. Bernstein, N. Duif, T. Lange, P. Schwabe, and B.-Y. Yang, "High-speed high-security signatures," J. Cryptograph. Eng., vol. 2, no. 2, pp. 77–89, Sep. 2012.

Jia Luo received the M.S. degree from Chongqing University of Posts and Telecommunications, Chongqing, China, in 2014, where he is currently pursuing the Ph.D. degree with the School of Communication and Information Engineering.
From April 2018 to April 2019, he was a visiting Ph.D. student with Carleton University, Ottawa, ON, Canada. His current research interests include blockchain, SDN, mobile edge computing, and deep reinforcement learning.

Qianbin Chen (Senior Member, IEEE) received the Ph.D. degree in communication and information system from the University of Electronic Science and Technology of China, Chengdu, China, in 2002.
He is currently a Professor with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China, where he is the Director of the Chongqing Key Laboratory of Mobile Communication Technology. He has authored or coauthored over 100 papers in journals and peer-reviewed conference proceedings, and has coauthored seven books. He holds 47 granted national patents.

F. Richard Yu (Fellow, IEEE) received the Ph.D. degree in electrical engineering from the University of British Columbia, Vancouver, BC, Canada, in 2003.
From 2002 to 2006, he was with Ericsson, Lund, Sweden, and a startup in California, USA. He joined Carleton University, Ottawa, ON, Canada, in 2007, where he is currently a Professor. His research interests include wireless cyber–physical systems, connected/autonomous vehicles, security, distributed ledger technology, and deep learning.
Prof. Yu received the IEEE Outstanding Service Award in 2016; the IEEE Outstanding Leadership Award in 2013; the Carleton Research Achievement Award in 2012; the Ontario Early Researcher Award (formerly, Premier's Research Excellence Award) in 2011; the Excellent Contribution Award at IEEE/IFIP TrustCom 2010; the Leadership Opportunity Fund Award from the Canada Foundation for Innovation in 2009; and the Best Paper Awards at IEEE ICNC 2018, VTC 2017 Spring, ICC 2014, Globecom 2012, IEEE/IFIP TrustCom 2009, and the International Conference on Networking 2005. He serves on the editorial boards of several journals, including as the Co-Editor-in-Chief for Ad Hoc and Sensor Wireless Networks and the Lead Series Editor for the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, the IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING, and IEEE COMMUNICATIONS SURVEYS & TUTORIALS. He has served as the Technical Program Committee Co-Chair of numerous conferences. He is a Registered Professional Engineer in the Province of Ontario, Canada, and a Fellow of the Institution of Engineering and Technology. He is a Distinguished Lecturer, the Vice President (Membership), and an Elected Member of the Board of Governors of the IEEE Vehicular Technology Society.

Lun Tang received the Ph.D. degree in communication and information system from Chongqing University, Chongqing, China.
He is currently a Professor with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing. His current research interests include 5G cellular networks, network slicing, and machine learning.