Communication-Efficient Federated Learning for Wireless Edge Intelligence in IoT
Abstract—The rapidly expanding number of Internet of Things (IoT) devices is generating huge quantities of data, but public concern over data privacy means users are apprehensive to send data to a central server for machine learning (ML) purposes. The easily changed behaviors of edge infrastructure that software-defined networking (SDN) provides make it possible to collate IoT data at edge servers and gateways, where federated learning (FL) can be performed: building a central model without uploading data to the server. FedAvg is an FL algorithm which has been the subject of much study; however, it suffers from a large number of rounds to convergence with non-independent identically distributed (non-IID) client data sets and high communication costs per round. We propose adapting FedAvg to use a distributed form of Adam optimization, greatly reducing the number of rounds to convergence, along with novel compression techniques, to produce communication-efficient FedAvg (CE-FedAvg). We perform extensive experiments with the MNIST/CIFAR-10 data sets, IID/non-IID client data, varying numbers of clients, client participation rates, and compression rates. These show that CE-FedAvg can converge to a target accuracy in up to 6× fewer rounds than similarly compressed FedAvg, while uploading up to 3× less data, and is more robust to aggressive compression. Experiments on an edge-computing-like testbed using Raspberry Pi clients also show that CE-FedAvg is able to reach a target accuracy in up to 1.7× less real time than FedAvg.

Index Terms—Compression, distributed computing, edge computing, federated learning (FL), Internet of Things (IoT).

Manuscript received September 23, 2019; revised November 15, 2019; accepted November 21, 2019. Date of publication November 28, 2019; date of current version July 10, 2020. This work was supported by EPSRC DTP Studentship. (Corresponding authors: Jia Hu; Geyong Min.) The authors are with the College of Engineering, Maths and Physical Science, University of Exeter, Exeter EX4 4QJ, U.K. (e-mail: jm729@exeter.ac.uk; j.hu@exeter.ac.uk; g.min@exeter.ac.uk). Digital Object Identifier 10.1109/JIOT.2019.2956615

I. INTRODUCTION

THERE are over 8 billion Internet of Things (IoT) devices worldwide, as of 2019 [1]. These are typically low-powered, embedded devices used for data collection. The expansion of the number of IoT devices has therefore led to an explosion in the quantity of data available for organizations to use for machine learning (ML) purposes. These organizations can use this data for insights, consumer products, and scientific research. However, there is growing public concern over the distribution of private data. Many IoT devices, such as security cameras, fitness and health trackers, and smartphones, gather sensitive data that users would not wish to share with a central server.

The combination of increasing IoT devices and data, along with user privacy concerns, presents a series of opportunities and challenges for ML. There is potential to use the large number of IoT devices to perform distributed ML, rather than training in a central server, distributing expensive training and inference while keeping private data on user devices. However, distributed ML schemes are typically designed for the datacenter, assuming high-performance computing and networking hardware, in contrast to the highly heterogeneous computing and communication capabilities of low-powered IoT devices. Also, IoT devices collect data from different users, so the distribution over devices can be highly non-independent identically distributed (non-IID), making it difficult to create ML models with good performance.

Software-defined networking (SDN) has the potential to help alleviate some of these distributed ML problems. An edge network architecture typically involves IoT devices connected to IoT gateways. Gateways and similar devices often possess much more computing and storage capacity than IoT devices and are located at the network edge, closer to user devices, meaning data do not need to be centrally collected. SDN could be used to easily alter the behavior of IoT gateways: collecting data from a group of local IoT devices and performing distributed ML, or delivering these data securely to nearby edge servers.

To address the issue of distributed ML, the concept of federated learning (FL) [2], collaboratively training a model across devices without sharing their data, was introduced. McMahan et al. [3] proposed an implementation of FL with their FedAvg algorithm, designed for user devices such as smartphones. In FedAvg, clients independently train deep neural networks (DNNs) on their local data and periodically average them. FedAvg has been shown to work in real-world settings by Google with their GBoard [4] next-word-prediction and emoji-prediction [5] software. FedAvg can be used in the IoT/SDN framework described above to provide distributed ML at the network edge.

Modern DNNs contain a huge number of weights, in the order of tens to hundreds of millions. As FedAvg requires uploading and downloading these models between the clients and the central server, several works have been published on reducing the amount of communication performed by FedAvg to reduce bottlenecks and help ease network use.

A standard method of training DNNs is minibatch stochastic gradient descent (mb-SGD): the model weights are updated with their gradients multiplied by a fixed learning rate. Most works on FL use mb-SGD for their DNNs; however, there are more sophisticated optimization techniques based on mb-SGD, such as AdaGrad [6] and RMSProp [7].
Adam [8] is one popular technique that features both per-parameter learning rates and momentum. It has been shown to be a very efficient optimizer for many tasks and has the potential to improve the convergence rate of FedAvg.
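For reference, the standard Adam update for a single weight θ, as given in [8], is sketched below. This is only the baseline, per-client rule with the usual notation and default hyperparameters; the distributed form used in CE-FedAvg (Algorithm 1) additionally synchronizes m and v across clients and is not reproduced here.

% Standard Adam update rule [8] for one weight \theta.
% g_t is the minibatch gradient; \alpha, \beta_1, \beta_2, \epsilon are hyperparameters
% (commonly \alpha = 10^{-3}, \beta_1 = 0.9, \beta_2 = 0.999, \epsilon = 10^{-8}).
\begin{align}
  m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t            && \text{(first moment / momentum)} \\
  v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2          && \text{(second moment / per-parameter scale)} \\
  \hat{m}_t &= \frac{m_t}{1 - \beta_1^{\,t}}, \qquad
  \hat{v}_t = \frac{v_t}{1 - \beta_2^{\,t}}               && \text{(bias correction)} \\
  \theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{align}

The per-parameter scaling by the second moment is what gives each weight its own effective learning rate, which is the property exploited later to reduce rounds to convergence.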
FedAvg, therefore, has the following problems when considered in the IoT edge-computing scenario. First is the high communication cost per round, due to the large models being sent between the clients and the server; second is the high number of rounds to convergence, especially in non-IID settings. This article proposes communication-efficient FedAvg (CE-FedAvg), which reduces the number of rounds to convergence and the total data uploaded per round over FedAvg. We, therefore, make the following contributions.

1) We propose the CE-FedAvg algorithm, which is composed of two parts: a) distributed Adam optimization and b) compression of uploaded models. CE-FedAvg reduces the number of communication rounds taken to reach a target accuracy, and the total data uploaded per round, compared to uncompressed FedAvg.
2) We develop two novel schemes for the quantization of communicated weights and moment values of Adam (uniform and exponential quantization). These two schemes are used alongside sparsification and Golomb encoding (GE) for the compression in CE-FedAvg.
3) We perform extensive simulated experiments of CE-FedAvg using the MNIST and CIFAR-10 data sets, three DNN architectures, and varying numbers of clients and client participation rates. These show that CE-FedAvg is able to reach a target accuracy in far fewer rounds than FedAvg in non-IID scenarios, and that it is more robust to aggressive communication reduction.
4) We perform further experimentation on an edge-computing-like testbed using Raspberry Pi (RPi) clients, showing that CE-FedAvg reduces the real time to convergence over FedAvg.

The remainder of this article is structured as follows. We outline the related work in the fields of FL and ML for the IoT. We then describe our proposed algorithm, CE-FedAvg, in detail. After that, we outline the simulation and testbed experiments we ran comparing CE-FedAvg to FedAvg. We finally give our conclusions in the last section.

II. RELATED WORK

A. Federated Learning

McMahan et al. [3] proposed the original FedAvg algorithm to train an aggregate model without uploading client data to a server. FedAvg drastically reduces total communicated data compared to datacentre-style distributed SGD (where each worker performs a single step of SGD before aggregation).

FL considers clients with non-IID data distributions, which is an obstacle to training a good central model. Zhao et al. [9] proposed sharing a small amount of data between non-IID clients to reduce the difference between clients' distributions, increasing the maximum accuracy their FedAvg models were able to attain.

Devices participating in FL are assumed to have highly heterogeneous computing and networking resources. Wang et al. [10] addressed this with a control algorithm that dynamically changes the number of local SGD updates that clients perform before uploading their models, based on a tradeoff between the computing power and networking bandwidth available to the client.

Several papers have been written on the task of reducing the amount of data communicated during FedAvg. Konečný et al. [11] introduced the concepts of structured and sketched model updates, and were able to significantly reduce the quantity of uploaded data during training. Similarly, Sattler et al. [12] demonstrated their sparse binary compression system against uncompressed FedAvg, and were able to achieve better accuracy while communicating much less data. Lin et al. proposed deep gradient compression [13], a system applicable to FedAvg, where gradient updates are accumulated at clients until their magnitude is greater than a threshold. Sattler et al. [14] proposed sparse ternary compression to compress weight updates during FedAvg, including using Golomb encoding [15] to compress the indices of weights in sparse matrices produced by sparsification.

As well as compression of weights, other techniques to reduce communication during FedAvg have been proposed. Chen et al. [16] aggregated shallow layers more frequently than deep layers (they assert that shallow layers learn general features and deep layers learn more specific features). Their scheme was able to converge to a target accuracy with up to 7× less communication than FedAvg. Liu et al. [17] proposed having clients aggregate with intermediate servers, which then aggregate with a main server, with two given frequencies. Their experiments showed that increasing aggregation frequency causes the global model to converge faster (as would be expected), and that the global aggregation frequency has the biggest impact on the convergence rate.

Leroy et al. [18] proposed adapting the Adam optimizer to the FedAvg algorithm. In their system, clients perform SGD using all their local data and send their weight updates to the server. The server holds the first and second moment values for each weight, which it uses to compute the central model weight updates using the Adam update rule. However, the way we use Adam differs from [18]: our method uses Adam optimization at clients, and the Adam values are also aggregated at the server, as opposed to using Adam solely at the server. We also propose compression schemes for our version, which Leroy et al. do not.

B. Internet of Things and Machine Learning

Some works have been published combining deep learning and the IoT. One seminal work by Liu et al. [19] created a system for identifying foods, in which image preprocessing and segmentation were performed on edge devices, and classification was done on a central server, reducing the latency at inference time. Li et al. [20] put forward a similar idea where an ML model was trained in the cloud, and initial layers of the model were then distributed to edge servers. At inference time, the edge servers compute the first layers before sending the result to the cloud for the final inference, resulting in less data sent to the cloud than sending the raw input.
For training near the network edge, Lyu et al. [21] created fog privacy-preserving deep learning (FPPDL), which combines privacy-preserving techniques with intermediate-layer aggregation of IoT data at "Fog" nodes before aggregation with a central server.

ML has been applied to a large variety of edge-like devices. Reina et al. [22] treated unmanned aerial vehicles (UAVs) as edge devices, and solved the multiobjective optimization problem for their area coverage using a multisubpopulation genetic algorithm.

Pathinarupothi et al. [23] developed a unique IoT-based system to provide health alerts to doctors for patients. Patients were fitted with multiple physiological sensors, which could be connected to the Internet using the patient's smartphone as a gateway. The gateways had software to determine if doctors need to be alerted about patients. This system shows the ability for smartphones to be used as intelligent, programmable gateways for nearby IoT devices, which could be applicable to the scheme proposed in this article.
Other authors have investigated the resource consumption of DNNs on IoT and edge devices. Chandakkar et al. [24] developed a system for retraining and pruning networks at the edge as nodes receive data over time. Similarly, DeepIoT [25] was created to reduce the size of DNNs for inference by using a second network to determine the best weights to drop from the original network. Guo et al. [26] proposed a novel approach that involved training a DNN and using automata to gradually prune the network weights over time. They were able to slightly improve the MNIST performance of dense networks with this method while pruning > 40% of weights.

Most current work exploring ML and the IoT focuses on centralized training and using IoT/edge computing for inference, or on decreasing the size of DNNs for these devices. This article, however, investigates decentralized training using full DNNs.

III. CE-FEDAVG

The FedAvg algorithm has a single master model that is an aggregate of the client models. For each round of communication, the server selects a subset of clients and pushes the master model to these clients. Each client then performs a predetermined number of rounds of gradient descent using the client's local data and pushes their model weight deltas to the server; the server then averages these updates to become the new master model.

FedAvg features a fixed learning rate across devices, as per normal mb-SGD. As shown below, this can result in a very large number of rounds required to converge to a given accuracy. mb-SGD can result in low convergence rates in later rounds of training because some weights need finer "tuning" than others. Adam optimization is a popular alternative to standard mb-SGD. It stores two values for each model weight: m (the first moment) and v (the second moment), which are used along with gradients computed by backpropagation and global decaying learning rates to update the model weights for each minibatch. Adam reduces both the problems of weights requiring different degrees of tuning (having adaptive rates for each weight) and of local minima (via momentum).

Alongside this, in FedAvg, the entire DNN model is sent from clients to the server each round. Modern DNNs have a huge number of weights, and the upload bandwidth of edge clients is typically far lower than the download bandwidth. Therefore, uploading models to the server is a significant potential bottleneck of the system. Previous works have shown that these uploaded weight tensors can be compressed without significantly harming the performance of the model [11]–[14]. However, these schemes typically increase the number of rounds FedAvg takes to converge, albeit with lower total data uploaded by clients.
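To make the preceding discussion concrete, the following is a generic sketch of this style of upload compression: magnitude-based sparsification of an update tensor followed by simple uniform quantization of the surviving values. It is written by analogy only; it is neither the exact scheme of [11]–[14] nor the UniQ/ExpQ/Golomb-encoded pipeline used by CE-FedAvg itself (Algorithms 1–3), and the function names and parameters are invented for illustration.

import numpy as np

def sparsify_and_quantize(update, s=0.6, bits=8):
    """Keep the largest-magnitude (1 - s) fraction of an update tensor and
    uniformly quantize the survivors to `bits` bits. Illustrative only."""
    flat = update.ravel()
    k = max(1, int(round((1.0 - s) * flat.size)))      # number of values kept
    idx = np.argpartition(np.abs(flat), -k)[-k:]       # flat indices of the top-k entries
    vals = flat[idx]
    lo, hi = vals.min(), vals.max()
    scale = (hi - lo) / (2 ** bits - 1) or 1.0         # avoid divide-by-zero when hi == lo
    q = np.round((vals - lo) / scale).astype(np.uint8 if bits <= 8 else np.uint16)
    return idx, q, lo, scale                           # what a client would upload

def dequantize(idx, q, lo, scale, shape):
    """Server-side reconstruction of the sparse update as a dense tensor."""
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[idx] = q.astype(np.float32) * scale + lo
    return flat.reshape(shape)

# Example: compress a stand-in 200x100 weight-update tensor at sparsity s = 0.9.
w = np.random.randn(200, 100).astype(np.float32)
idx, q, lo, scale = sparsify_and_quantize(w, s=0.9)
w_hat = dequantize(idx, q, lo, scale, w.shape)

The indices of the kept values must also be transmitted, which is why schemes such as [14] (and CE-FedAvg) additionally encode them compactly, for example with Golomb encoding.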
To address the above problems, we propose CE-FedAvg: a scheme that both reduces the number of rounds taken to converge to a given accuracy and decreases the total data uploaded during training over FedAvg. Fig. 1 shows the operation of one round of CE-FedAvg and the interaction between IoT devices and edge servers. Algorithm 1 gives the details of CE-FedAvg; the UniQ, UniDQ, ExpQ, and ExpDQ functions are shown in Algorithms 2 and 3.

Fig. 1. One communication round of CE-FedAvg. 1. Clients download the global model weights and first and second moments (ω, m, v). 2. Clients perform training on their IoT-derived data sets. 3. Clients compress their models. 4. Clients upload their compressed parameters and indices (ω′, m′, v′, i). 5. The server decompresses the model weights and moments. 6. The server aggregates all client models before starting a new round of training.

In CE-FedAvg, the server first initializes the global model with random weights and 0 values for the first and second Adam moments (line 3). Each communication round, the server selects a subset of clients (line 5) and sends the weights and moments to them. In our algorithm, clients are selected at random, but in reality, clients would be selected based on their power/communication properties at the time, as in [4]. These clients then perform their local Adam SGD (lines 7 and 8) and upload their data to the server. The server dequantizes and reconstructs the sparse updates from the clients (lines 10–17), averages the updates (lines 20–22), and starts the next round. Each client updates by replacing their current model weights and moments with those downloaded from the server.
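The round structure just described can be summarized with the following minimal, uncompressed sketch of one CE-FedAvg round. It assumes a hypothetical client interface run_adam_epochs(...) and FedAvg-style weighting of updates by local data-set size, and it omits the quantization/sparsification steps entirely; Algorithm 1 is the authoritative definition.

import numpy as np

def ce_fedavg_round(global_w, global_m, global_v, clients, C=0.5, E=1):
    """One simplified CE-FedAvg communication round (no compression).

    global_w/global_m/global_v: lists of numpy arrays (weights, 1st/2nd Adam moments).
    clients: objects exposing run_adam_epochs(w, m, v, epochs) -> (w, m, v, n_samples);
             this interface is assumed for the sketch, not taken from the paper.
    """
    # 1. Select a random fraction C of clients, as in the simulations.
    selected = np.random.choice(clients, size=max(1, int(C * len(clients))), replace=False)

    results = []
    for client in selected:
        # 2.-4. Each client downloads (w, m, v), trains locally with Adam for E epochs,
        #       and uploads its new weights and moments.
        results.append(client.run_adam_epochs(global_w, global_m, global_v, epochs=E))

    # 5.-6. The server aggregates weights AND both Adam moments,
    #       weighting each client by its number of local samples.
    total = sum(n for (_, _, _, n) in results)
    new_w = [sum(n * w[i] for (w, _, _, n) in results) / total for i in range(len(global_w))]
    new_m = [sum(n * m[i] for (_, m, _, n) in results) / total for i in range(len(global_m))]
    new_v = [sum(n * v[i] for (_, _, v, n) in results) / total for i in range(len(global_v))]
    return new_w, new_m, new_v

The key difference from plain FedAvg is that the server state includes the Adam moments, so clients resume training in the next round with aggregated momentum and per-parameter scaling rather than a fixed learning rate.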
The data uploaded by a CE-FedAvg client in a single round is between 2× and 2.6× that of FedAvg for 0.5 ≤ s < 1.0, as opposed to 3× if there was no compression. However, despite communicating 2–2.6× more data than FedAvg per client per round, due to the reduction in rounds achieved by CE-FedAvg, clients still upload less total data than FedAvg in many cases.

V. EXPERIMENTS

A. Simulation Setup

A series of experiments were conducted to evaluate CE-FedAvg against FedAvg. The experiments were image classification tasks using the MNIST [27] and CIFAR10 [28] data sets. Models were implemented using TensorFlow [29].

MNIST: 28 × 28 gray-scale images of handwritten digits in 10 classes. Two models were trained on this data set. The first (MNIST-2NN) had two fully connected layers of 200 neurons with ReLU activation, and a softmax output layer. The second (MNIST-CNN) was a convolutional network consisting of: two 5 × 5 convolutional layers with 32 and 64 output neurons, respectively, each followed by 2 × 2 max pooling and ReLU activation; a fully connected layer with 512 neurons and ReLU activation; and a softmax output layer.

CIFAR10: 32 × 32 color images of objects in ten classes. One network (CIFAR-CNN) was trained on this data set with: two 3 × 3 convolutional layers with 32 output neurons and L2 regularization, each followed by batch normalization; a dropout layer with d = 0.2; two 3 × 3 convolutional layers with 64 output neurons, L2 regularization, and batch normalization; a second dropout layer with d = 0.3; two final 3 × 3 convolutional layers with L2 regularization and batch normalization; a final d = 0.4 dropout layer; and a softmax output layer.
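As a concrete reading of the MNIST-CNN description above, the following Keras definition is one possible realization. Padding, strides, and other unstated hyperparameters are assumptions made for illustration, and the authors' exact implementation may differ.

import tensorflow as tf

def build_mnist_cnn():
    """One possible Keras realization of the MNIST-CNN described above.
    Padding/stride choices are assumptions not specified in the text."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 5, padding="same",
                               input_shape=(28, 28, 1)),   # 5x5 conv, 32 outputs
        tf.keras.layers.MaxPooling2D(2),                   # 2x2 max pooling
        tf.keras.layers.ReLU(),
        tf.keras.layers.Conv2D(64, 5, padding="same"),     # 5x5 conv, 64 outputs
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),     # fully connected, 512 neurons
        tf.keras.layers.Dense(10, activation="softmax"),   # softmax output layer
    ])

model = build_mnist_cnn()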
The networks were trained on the data sets using differing numbers of clients, classes per client, client participation rates, compression rates, and either FedAvg or CE-FedAvg, until a given target accuracy was achieved. The models had the following target accuracies: MNIST-2NN, 97%; MNIST-CNN, 99%; and CIFAR-CNN, 60%. For each setting, values of E were tested to find the best for that setting, and when using FedAvg, different learning rates were also tested.

The entire data set was split across all clients in each case. Therefore, increasing the number of clients resulted in fewer samples per worker. To produce IID data, the data sets were shuffled and each client was given an equal portion of the data. For non-IID data, the data sets were sorted by class, divided into slices, and each client was given two slices. This resulted in most clients having data from only two classes. The test data for each data set was taken from the official test sets.
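A minimal sketch of this non-IID partitioning (sort by label, cut into 2W contiguous shards, hand each of the W clients two shards) is shown below. The exact shard sizes and shuffling details used in the experiments are not stated in the text, so this is only an illustration.

import numpy as np

def non_iid_partition(labels, num_clients, shards_per_client=2, seed=0):
    """Sort examples by class, split them into num_clients * shards_per_client
    contiguous shards, and give each client `shards_per_client` random shards.
    Returns a list of index arrays, one per client. Illustrative only."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels)                              # example indices sorted by class
    shards = np.array_split(order, num_clients * shards_per_client)
    shard_ids = rng.permutation(len(shards))                # deal shards out at random
    return [np.concatenate([shards[j] for j in
                            shard_ids[i * shards_per_client:(i + 1) * shards_per_client]])
            for i in range(num_clients)]

# Example: 10 clients, most of which end up with data from only two classes.
labels = np.repeat(np.arange(10), 100)                      # stand-in for MNIST labels
client_indices = non_iid_partition(labels, num_clients=10)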
B. Simulation Results

We ran experiments using FedAvg and CE-FedAvg to reach a given target accuracy as described above. The networks were run with IID (Y = 10) and non-IID (Y = 2) data partitioning, where Y is the number of classes per worker. For FedAvg, multiple global learning rates were also trialed for each scenario. Tables I–III show the average number of rounds to reach a given target accuracy for the best parameter setup in each case. "NC" is used where no set of parameters could be found to get the algorithm to converge to the target. Sparsity rates of s = {0.6, 0.9, 0.95} provide approximately {7, 25, 53}× compression for FedAvg, and approximately {9, 33, 68}× compression for CE-FedAvg, resulting in a FedAvg:CE-FedAvg uploaded data ratio per round of ≈ 1:2.3.

TABLE I
Rounds Required to Reach Target Test Accuracies for FedAvg (Gray) and CE-FedAvg (White), With Upload Sparsity s = 0.6

Table I shows that more rounds are required to converge in non-IID scenarios than IID, as is consistent with other FL works. As clients' distributions are very different in the non-IID scenarios, their models diverge more between aggregations, harming the global model. Table I also shows that, with moderate compression, CE-FedAvg was able to reach the target in significantly fewer rounds in the non-IID scenarios. This may be because of CE-FedAvg's adaptive learning rates: Adam was able to make more fine-tuned changes to the worker models between aggregations, so these models did not diverge as much. CE-FedAvg did comparatively better with more workers/decreasing data set size per worker. CE-FedAvg reduced the rounds taken by ≥ 2.3× in ten cases, including all the MNIST-CNN non-IID cases. In the MNIST-CNN, W = 40, C = 0.5, Y = 2 case, CE-FedAvg reduced the number of rounds by 5.3× over FedAvg.

Tables II and III show that higher compression rates increase the number of rounds to reach the target, especially in the non-IID cases. Again, in all non-IID cases, CE-FedAvg is able to reach the target far faster than FedAvg. For s = 0.9, CE-FedAvg reached the target in ≥ 2.3× fewer rounds in 7 cases. For s = 0.95, CE-FedAvg took ≥ 2.3× fewer rounds in 5 cases. Taking the same MNIST-CNN case, with s = 0.9, CE-FedAvg reaches the target in 6× fewer rounds, and 4.3× fewer for s = 0.95.

For all non-IID CIFAR-10 s = 0.9, 0.95 cases, compressed FedAvg could not converge within 1000 rounds. The CIFAR-10 problem is much more complex than MNIST. It is likely that FedAvg could not converge due to its fixed learning rate.
TABLE II
Rounds Required to Reach Target Test Accuracies for FedAvg (Gray) and CE-FedAvg (White), With Upload Sparsity s = 0.9. NC Denotes Cases Unable to Converge

TABLE III
Rounds Required to Reach Target Test Accuracies for FedAvg (Gray) and CE-FedAvg (White), With Upload Sparsity s = 0.95. NC Denotes Cases Unable to Converge
Fig. 3. Top: Compressed uploaded data per client per round for FedAvg and
CE-FedAvg with different sparsities, for the MNIST-CNN W = 40, C = 1.0,
and Y = 2 scenario. Bottom: Total uploaded data during training for the same
scenario.
[18] D. Leroy, A. Coucke, T. Lavril, T. Gisselbrecht, and J. Dureau, "Federated learning for keyword spotting," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2019, pp. 6341–6345.
[19] C. Liu et al., "A new deep learning-based food recognition system for dietary assessment on an edge computing service infrastructure," IEEE Trans. Services Comput., vol. 11, no. 2, pp. 249–261, Mar./Apr. 2018.
[20] H. Li, K. Ota, and M. Dong, "Learning IoT in edge: Deep learning for the Internet of Things with edge computing," IEEE Netw., vol. 32, no. 1, pp. 96–101, Jan./Feb. 2018.
[21] L. Lyu, J. C. Bezdek, X. He, and J. Jin, "Fog-embedded deep learning for the Internet of Things," IEEE Trans. Ind. Informat., vol. 15, no. 7, pp. 4206–4215, Jul. 2019.
[22] D. G. Reina, H. Tawfik, and S. L. Toral, "Multi-subpopulation evolutionary algorithms for coverage deployment of UAV-networks," Ad Hoc Netw., vol. 68, pp. 16–32, Jan. 2018.
[23] R. K. Pathinarupothi, P. Durga, and E. S. Rangan, "IoT-based smart edge for global health: Remote monitoring with severity detection and alerts transmission," IEEE Internet Things J., vol. 6, no. 2, pp. 2449–2462, Apr. 2019.
[24] P. S. Chandakkar, Y. Li, P. L. K. Ding, and B. Li, "Strategies for re-training a pruned neural network in an edge computing paradigm," in Proc. IEEE Int. Conf. Edge Comput., Honolulu, HI, USA, 2017, pp. 244–247.
[25] S. Yao, Y. Zhao, A. Zhang, L. Su, and T. Abdelzaher, "DeepIoT: Compressing deep neural network structures for sensing systems with a compressor-critic framework," in Proc. 15th ACM Conf. Embedded Netw. Sensor Syst., 2017, pp. 1–14.
[26] H. Guo, S. Li, B. Li, Y. Ma, and X. Ren, "A new learning automata-based pruning method to train deep neural networks," IEEE Internet Things J., vol. 5, no. 5, pp. 3263–3269, Oct. 2018.
[27] Y. LeCun and C. Cortes. (2010). MNIST Handwritten Digit Database. [Online]. Available: http://yann.lecun.com/exdb/mnist/
[28] A. Krizhevsky, "Learning multiple layers of features from tiny images," Univ. Toronto, Toronto, ON, Canada, Rep. TR-2009, 2009.
[29] M. Abadi et al. (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. [Online]. Available: tensorflow.org

Jed Mills received the B.Sc. degree in natural science from the University of Exeter, Exeter, U.K., in 2018, where he is currently pursuing the Ph.D. degree in computer science with the College of Engineering, Maths and Physical Science. His research interests are in machine learning, distributed machine learning, and mobile edge computing.

Jia Hu received the B.Eng. and M.Eng. degrees in electronic engineering from the Huazhong University of Science and Technology, Wuhan, China, in 2006 and 2004, respectively, and the Ph.D. degree in computer science from the University of Bradford, Bradford, U.K., in 2010. He is a Lecturer of computer science with the University of Exeter, Exeter, U.K. His research interests include cloud and edge computing, resource optimization, applied machine learning, and network security.

Geyong Min received the B.Sc. degree in computer science from the Huazhong University of Science and Technology, Wuhan, China, in 1995, and the Ph.D. degree in computing science from the University of Glasgow, U.K., in 2003. He is a Professor of high performance computing and networking with the Department of Computer Science, College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, U.K. His research interests include future Internet, computer networks, wireless communications, multimedia systems, information security, high-performance computing, ubiquitous computing, modeling, and performance engineering.