Digital Signal Processing Using Deep Neural Networks: Brian Shevitski, Yijing Watkins, Nicole Man and Michael Girard
Abstract
Currently there is great interest in the utility of deep neural networks (DNNs) for the
physical layer of radio frequency (RF) communications. In this manuscript, we describe a
custom DNN specially designed to solve problems in the RF domain. Our model leverages the
mechanisms of feature extraction and attention through the combination of an autoencoder
convolutional network with a transformer network, to accomplish several important
communications network and digital signals processing (DSP) tasks. We also present a new open
dataset and physical data augmentation model that enables training of DNNs that can perform
automatic modulation classification, infer and correct transmission channel effects, and directly
demodulate baseband RF signals.
Engineering the fifth generation (5G) of mobile communications networks is uniquely
challenging due to the complexity of the typical modern wireless radio frequency (RF)
environment. Due to the ubiquity of mobile and wireless devices, a typical 5G system needs to
handle a huge volume of complex, heterogeneous data, presenting new obstacles in the allocation
and management of network resources1.
Currently, there is great interest in the feasibility of embedding machine learning (ML)
directly into a communications network to combat issues that arise in a crowded and diverse RF
environment. Deep neural networks (DNNs) are currently the dominant ML architecture and
have revolutionized ML model performance in the last decade across many domains including
computer vision (CV)2–7, natural language processing (NLP)8–11, and content recommendation12–
14.
DNNs have been applied with varying degrees of success to several tasks in the physical
layer of the RF communications domain15,16. Previous studies have primarily focused on
automatic modulation classification using a variety of different DNN architectures 17. Common
ML models used for image classification in the CV domain have also proven effective for
modulation classification in the RF domain, such as standard convolutional neural networks
(CNNs)18 and residual neural networks (ResNets)17. Limited research has been conducted on the
efficacy of model architectures that are more specialized to the RF domain by using long short-
term memory networks (LSTMs) for modulation type classification19,20. The LSTM architecture
is “time-aware” and has proven invaluable in the analysis of time-series data across a number of
domains21–23.
Previous studies have shown the viability of using DNNs for digital RF signal processing.
Signal demodulation using several different ML architectures has been demonstrated under a
limited range of conditions24–29. Completely learned DNN end-to-end communications systems
have been demonstrated30,31, but these systems can be difficult to train and may require complex
protocol schemes to enable assured communication links. In general, research combining ML
with RF is still nascent, both in assessing what tasks ML can accomplish or improve upon and in
developing tools aligned with the RF modality.
Supervised deep learning generally requires copious amounts of accurately labeled data
in order to train models that perform at a high level. While an open dataset of RF signals does
exist32, a majority of the effects that we are seeking to quantify are either not present or not
labeled in the most commonly used open source RF data set. With this in mind, we designed our
own dataset and physical transmission channel model in PyTorch for use in training models.
Our custom dataset is generated using the framework outlined in Fig. 1(a) which
schematically shows how data moves through a generic RF communications system. First, a
stream of random bits is generated at a virtual transmitter (Tx), and a non-return-to-zero (NRZ),
time-domain encoding is created. The NRZ encoding is mapped onto one of 13 distinct
constellations to create a complex baseband digital modulation signal. The baseband signal is
then passed through a root-raised cosine (RRC) filter for pulse shaping. This final, filtered,
baseband signal at the virtual transmitter is denoted Tx and is kept for future steps in the analysis
pipeline.
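The first steps of this pipeline can be sketched in a few lines of plain Python. The sketch below is illustrative only: it assumes a QPSK constellation in which each pair of NRZ levels sets the signs of the in-phase and quadrature components, the function names and the 1/sqrt(2) normalization are hypothetical choices rather than the dataset's actual implementation, and pulse shaping is omitted.

```python
import math
import random

def bits_to_nrz(bits):
    """Map bits {0, 1} to non-return-to-zero levels {-1.0, +1.0}."""
    return [2.0 * b - 1.0 for b in bits]

def nrz_to_qpsk(nrz):
    """Pair consecutive NRZ levels into unit-energy QPSK symbols (I + jQ)."""
    if len(nrz) % 2 != 0:
        raise ValueError("QPSK needs an even number of bits")
    scale = 1.0 / math.sqrt(2.0)  # assumed normalization to unit symbol energy
    return [complex(nrz[i], nrz[i + 1]) * scale for i in range(0, len(nrz), 2)]

rng = random.Random(0)  # fixed seed so the example is repeatable
bits = [rng.randint(0, 1) for _ in range(16)]
symbols = nrz_to_qpsk(bits_to_nrz(bits))
```

Each of the other constellations would replace `nrz_to_qpsk` with its own bit-to-symbol mapping while leaving the rest of the chain unchanged.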
Fig. 1(b) shows examples of Tx and Rx data for four different modulation types. Each
data example is generated as needed and is accompanied by a suite of important labels and
metadata such as message bits, modulation type, transmission channel parameters, number of
time samples per message symbol, etc. The custom design of the dataset and transmission
channel allows us to perform a wide variety of communications network and digital signals
processing tasks using supervised deep learning models.
Fig. 2 Schematic of the hybrid autoencoder/transformer model. The encoder takes a raw digital
baseband signal, perturbed by a transmission channel, from a receiver (Rx) as input and
compresses it into a latent representation, z. The decoder transforms z back to the original
shape of the input. The output of the decoder (Tx’) is used to reconstruct the unperturbed
baseband signal originally sent by the transmitter (Tx) by minimizing the loss between Tx and
Tx’. The latent vector is used as input for a transformer network followed by fully connected
layers. The final output of the fully connected classifier layers (C) is trained by minimizing the
loss between the classifier output and known labels for the data. Once trained, the output of the
classifier is used to infer various properties about the underlying data and the transmission
channel.
The hybrid model architecture is composed of an encoder/decoder pair (E/D) and a
classifier network (C). The encoder network consists of convolutional neural network (CNN)
layers which map an input baseband RF signal (Rx) onto a latent space representation (Z). The
decoder network (D), another series of CNN layers, transforms Z into a reconstructed baseband
RF signal. The output of D (Tx’) is a denoised and reconstructed RF signal. The latent
representation is also used as input for C, the output of which is the predicted label of Rx for the
network task at hand.
Our classifier network uses a transformer (we use the reformer implementation35 of the
transformer architecture) whose output is flattened and fed into fully connected neural network
layers. The transformer, which has enabled great performance increases in the domain of NLP 8,9,
performs an all-to-all comparison of elements which implies a utility in the analysis of sequences
or other data with long-range correlation, e.g. time series data33,34. To our knowledge, our proposed
hybrid model is the first to combine an autoencoder convolutional network with a transformer
network to attempt to solve various tasks in the RF domain.
Our custom synthetic dataset, coupled with a physical transmission channel model, allows
us to use Rx baseband data as input for a DNN model and to use a wide variety of important signal
and channel parameters, as well as the original Tx signal itself, as targets or labels (model outputs).
Fig. 2 schematically outlines how the hybrid model is trained to perform the various RF
communications network tasks presented throughout this manuscript. The Rx signal is used as
input for the autoencoder network (a reasonable assumption for how a real-world ML
communications system would function) and the Tx signal is used as the target for the decoder
reconstruction loss function. The classifier network loss function uses the various signal and
channel parameters as targets, depending on the task the model is being trained to perform. For
more details on model training and loss functions, see Methods.
Previous studies have been primarily focused on the task of automatic modulation
classification17,19,20,36. As an initial proof of concept of the hybrid model’s capabilities, we train the
model to predict the modulation type label of our simulated baseband RF signals under a
particularly harsh transmission channel. For this experiment, we allow the phase offset to vary
between 0 and 360 degrees, the frequency offset to vary up to 1% of the data rate, and the
dimensionless Rayleigh fading parameter to vary between 0.1 and 1.0 (see Methods), with all
parameters randomly generated for each data example. The results of this experiment are shown
in Fig. 3. Because of our custom end-to-end data generation and channel simulation, we can easily
measure the performance of the model for a given signal-to-noise ratio. Figs. 3(a) and 3(b) show
confusion matrices at energy per symbol to noise power spectral density ratios of Es/N0 = 0 and
10 dB, respectively. The curve in Fig. 3(c) shows the mean model accuracy, averaged across all
modulation types at each noise level. For more details on data generation, transmission channel
simulation, and model training, see Methods.
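To make the noise axis concrete, the following minimal sketch adds complex white Gaussian noise to a baseband signal at a requested Es/N0. It rests on assumptions that are not taken from the source: the energy per symbol is the mean sample power times the number of samples per symbol, and the noise variance per complex sample equals N0 (unit sampling bandwidth). The function name `add_awgn` is hypothetical.

```python
import math
import random

def add_awgn(samples, es_n0_db, samples_per_symbol, rng):
    """Add complex white Gaussian noise at a target Es/N0 given in dB."""
    mean_power = sum(abs(s) ** 2 for s in samples) / len(samples)
    energy_per_symbol = mean_power * samples_per_symbol  # assumed definition of Es
    n0 = energy_per_symbol / (10.0 ** (es_n0_db / 10.0))
    sigma = math.sqrt(n0 / 2.0)  # standard deviation of each real component
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for s in samples]

rng = random.Random(1)
clean = [complex(1.0, 0.0)] * 20000
noisy = add_awgn(clean, es_n0_db=10.0, samples_per_symbol=8, rng=rng)
# Empirical noise power; under the assumptions above it should be near N0 = 0.8.
noise_power = sum(abs(n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean)
```

Under this convention the per-sample signal-to-noise ratio is lower than Es/N0 by a factor equal to the number of samples per symbol.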
Fig. 3 Confusion matrices showing performance of the hybrid model at automatic modulation
type classification for Es/N0 of 10 dB (a) and 0 dB (b). The curve in (c) shows the mean
accuracy of the hybrid model across all classes for Es/N0 values between −30 and 40 dB. For high
signal-to-noise ratios, the hybrid model has a mean accuracy of 98.7% when identifying the
modulation type of an unknown signal, as shown in (c).
Next, we turn our attention to a more novel DSP task using the hybrid model: directly
inferring transmission channel parameters. To perform this task, we limit the input data strictly to
messages of the quadrature phase-shift keying (QPSK) modulation type (a common digital
modulation scheme used in many applications, including WiFi and Bluetooth®). We use a slightly
less harsh transmission channel (denoted medium) than for the modulation classification task.
Because of the four-fold rotational symmetry of the QPSK constellation, phase shifts of more than
45 degrees in magnitude are unresolvable from observations of the baseband signal alone. For this
reason, we eliminate Rayleigh fading from the channel entirely (fading introduces a random phase
shift to the baseband signal) and limit the carrier phase offset to a magnitude of at most 45 degrees.
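The symmetry argument can be checked numerically. The sketch below builds an ideal QPSK constellation, confirms that a 90 degree rotation maps it onto itself while a 30 degree rotation does not, and folds an arbitrary phase offset into the interval that remains distinguishable. The helper names and the half-open interval [-45, 45) are illustrative choices, not part of the original analysis.

```python
import cmath
import math

# Ideal QPSK points at 45, 135, 225 and 315 degrees on the unit circle.
QPSK = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]

def rotate(points, degrees):
    """Rotate every constellation point by the given angle."""
    phase = cmath.exp(1j * math.radians(degrees))
    return [p * phase for p in points]

def same_constellation(a, b, tol=1e-9):
    """True if every point of a coincides with some point of b."""
    return all(any(abs(p - q) < tol for q in b) for p in a)

def fold_phase(degrees):
    """Fold a phase offset into the resolvable interval [-45, 45) degrees."""
    return (degrees + 45.0) % 90.0 - 45.0

rotated_90_matches = same_constellation(rotate(QPSK, 90.0), QPSK)
rotated_30_matches = same_constellation(rotate(QPSK, 30.0), QPSK)
```

A 100 degree offset, for example, is indistinguishable from a 10 degree offset when only the constellation geometry is observed.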
Results of the channel parameter regression are shown in Fig. 4. The hybrid model can
simultaneously and explicitly regress the local oscillator phase offset (Fig. 4(a)), local oscillator
frequency offset (Fig. 4(b)), and signal-to-noise level (Fig. 4(c)) for a wide range of these
parameters. To our knowledge, this is the first demonstration of the use of a DNN to directly infer
transmission channel properties. More experimental details, including information on model
dimensions, initialization, and training, can be found in Methods.
The results in Fig. 4 clearly show that the hybrid model can explicitly regress channel
parameters. These parameters can then be used to reconstruct the original transmitted signal from
the data at Rx using either a physical model or traditional analog electronics and signal processing
methods. One key feature of the hybrid model is the encoder/decoder pair and the ability to denoise
and reconstruct the original Tx signal. This implies that the model can implicitly perform the
previously mentioned two-step process (learn channel parameters from Rx signal, then use them
to transform Rx signal into Tx signal) in a single step (directly learn original Tx signal). Fig. 5
shows example Rx, Tx, and reconstruction signals from the regression model discussed previously
that illustrate this property. Figs. 5(a) and 5(b) show the Rx (green lines), Tx (blue/orange lines), and Tx
reconstruction (dashed black lines) for the in-phase and quadrature components, respectively, of
an example QPSK signal. Fig. 5(c) shows the error between Tx and Tx reconstruction for both
components, scaled by the RMS amplitude of Tx, with a mean error of less than 8 percent (in these
units).
Fig. 4 Regression of transmission channel parameters using hybrid model. Each training or
validation example consists of a series of random bits, which is used to generate a baseband
QPSK signal. This signal (Tx) is then put through a physical transmission channel model,
resulting in a perturbed version of the original data (Rx). The hybrid model can accurately
regress the relative phase offset (a) and the frequency offset, expressed as a percentage of the
data rate, RD, (b) between Tx and Rx, as well as the amount of additive white Gaussian noise
(AWGN), expressed as the energy per symbol to noise power spectral density ratio, Es/N0,
added to the signal from the transmission channel (c).
Fig. 5 Reconstruction of a baseband QPSK signal using decoder layers of the hybrid model. As
shown in Fig. 2, the hybrid model reconstructs the original transmitted signal (Tx) using the
received signal (Rx) which has been degraded by the transmission channel. The in-phase (a) and
quadrature (b) components for the Rx, Tx, and Tx reconstruction are shown. The residuals
shown in (c), defined as the absolute difference between Tx and Tx reconstruction expressed in
units of root mean square Tx amplitude, have a mean value of 7 percent.
Fig. 6 shows the performance of a hybrid model trained to classify the total number of
symbols in a message. For this task, we generate messages from all 13 modulation types, with a
fixed total length (512 samples), and a variable number of symbols (between 16 and 32 symbols
per message) and propagate the Tx signal through the harsh transmission channel used in the
modulation classification task and discussed in Methods. Figs. 6(a) and 6(b) show the confusion
matrices at unnormalized signal-to-noise ratios of 0 and −10 dB, respectively, with a logarithmic
color scale. The curve in Fig. 6(c) shows the mean accuracy across all classes at each SNR. We
note that as the noise level increases and the model’s performance begins to degrade, erroneous
predictions generally differ from the true value by one symbol, an intuitive result.
Fig. 6 Confusion matrices showing performance of the hybrid model at inferring the number of
symbols in a message for signal-to-noise ratios of 0 dB (a) and −10 dB (b), averaged over all 13
modulation types. The curve in (c) shows the model’s accuracy (averaged over all possible
modulation types and symbol numbers) at SNR values between −40 and 40 dB. For high
signal-to-noise ratios, the hybrid model has a mean accuracy of 94.0% when inferring how many symbols
are in an average message, as shown in (c).
The final demonstration of the potential for DNNs in the digital signals processing realm
is a model trained to demodulate digital messages directly from the Rx baseband data. For this
task, we assume a mild channel that approximates a realistic RF communications setup in the
presence of light to moderate Rayleigh fading. We assume that the Tx and Rx RF oscillators are
tuned to one another (within a tolerance typical of average RF consumer equipment), and that the
receiver uses a phase-locked Costas loop to limit the phase shift of the Rx signal. For more details
on the mild channel, see Methods.
Fig. 7 shows the hybrid model’s accuracy in identifying the symbols of a message when
performing direct demodulation of BPSK, QPSK, and 16-QAM modulated input data. The
performance of the model is excellent for all modulation schemes with over 99% accuracy for
messages with low noise. In practice, messages sent in a Rayleigh channel are usually corrected at
the receiver using equalization and are specially designed using forward error correction (FEC)
encoding and interleaving to combat the deleterious effects of fading. In contrast, our model
assumes only a phase-locked loop to correct phase and frequency offsets, with no other channel
correction. Integration of FEC codes into a DNN demodulator is a ripe area
for future research and could increase model performance to levels that are competitive with
commercial communications systems. Nevertheless, our results indicate that direct demodulation
of baseband RF signals using DNNs is feasible.
In conclusion, we have presented a powerful new dataset for digital signals processing
machine learning tasks in the RF domain. We have also provided a physical transmission channel
model that enables the training of models that can accomplish novel tasks. We describe
a new deep learning model, specially inspired by and designed for the RF domain, which combines
an autoencoder convolutional network with a transformer network. We show this hybrid model
can efficiently accomplish a variety of RF signal processing tasks, namely, automatic modulation
classification, regression of transmission channel parameters, signal denoising and reconstruction,
classification of message properties, and direct demodulation of messages.
Fig. 7 Accuracy of hybrid model when demodulating signals for three different modulation types
under a channel with Rayleigh fading and additive white Gaussian noise (AWGN). The
percentages in the legend indicate the model accuracy in the asymptotic limit of high signal-to-
noise ratios (expressed here as energy per symbol to noise power spectral density ratio, Es/N0)
for each modulation type.
Methods
Data Generation
All code was written in Python using the PyTorch deep learning framework. Baseband RF data
examples are created by the following algorithm: First, an oversampling value (the number of
time samples per symbol) is randomly sampled within a predefined range. This oversampling
value (along with the predefined shape of the final data vector) is used to determine the number
of symbols in the message represented by the data example. Next, a modulation type is selected,
and a random, complex-valued message representation is generated from the symbols in that
modulation’s constellation. Then, an excess bandwidth value for the root-raised-cosine (RRC)
filter is randomly sampled (within a predefined range) and the data example is passed through
the pulse-shaping, low-pass RRC filter. For all data used in this study, the data rate was used as
the base frequency unit.
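The generation steps above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the released dataset code: the oversampling choices (8, 16, 32), the excess-bandwidth range (0.2 to 0.5), the 8-symbol filter span, the unit-energy tap normalization, and the function names `rrc_taps` and `pulse_shape` are all hypothetical, and only QPSK is generated.

```python
import math
import random

def rrc_taps(beta, samples_per_symbol, span_symbols=8):
    """Root-raised-cosine impulse response; time is measured in symbol periods."""
    half = span_symbols * samples_per_symbol // 2
    taps = []
    for n in range(-half, half + 1):
        t = n / samples_per_symbol
        if n == 0:
            h = 1.0 - beta + 4.0 * beta / math.pi
        elif beta > 0 and abs(abs(4.0 * beta * t) - 1.0) < 1e-9:
            # Removable singularity at |t| = 1 / (4 * beta).
            h = (beta / math.sqrt(2.0)) * (
                (1.0 + 2.0 / math.pi) * math.sin(math.pi / (4.0 * beta))
                + (1.0 - 2.0 / math.pi) * math.cos(math.pi / (4.0 * beta)))
        else:
            num = (math.sin(math.pi * t * (1.0 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1.0 + beta)))
            h = num / (math.pi * t * (1.0 - (4.0 * beta * t) ** 2))
        taps.append(h)
    norm = math.sqrt(sum(h * h for h in taps))  # assumed unit-energy scaling
    return [h / norm for h in taps]

def pulse_shape(symbols, samples_per_symbol, taps):
    """Zero-stuff symbols to the sample rate, then filter (same-length output)."""
    up = [0j] * (len(symbols) * samples_per_symbol)
    for i, s in enumerate(symbols):
        up[i * samples_per_symbol] = s
    delay = len(taps) // 2  # compensate the filter's group delay
    out = []
    for n in range(len(up)):
        acc = 0j
        for k, h in enumerate(taps):
            m = n + delay - k
            if 0 <= m < len(up):
                acc += h * up[m]
        out.append(acc)
    return out

rng = random.Random(0)
n_samples = 512                                  # fixed length of the data vector
samples_per_symbol = rng.choice([8, 16, 32])     # assumed oversampling choices
n_symbols = n_samples // samples_per_symbol      # symbols follow from oversampling
beta = rng.uniform(0.2, 0.5)                     # assumed excess-bandwidth range
qpsk = [complex(i, q) / math.sqrt(2.0) for i in (-1, 1) for q in (-1, 1)]
symbols = [rng.choice(qpsk) for _ in range(n_symbols)]
tx = pulse_shape(symbols, samples_per_symbol, rrc_taps(beta, samples_per_symbol))
```

Restricting the oversampling value to divisors of 512 keeps the symbol count an integer in this sketch; the source does not state how non-integer cases are handled.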
Carrier phase and frequency offset: Consider a transmitter (Tx) and receiver (Rx) with
two independent local RF oscillators. If the two oscillators have a frequency offset of f and a
phase offset, the in-phase and quadrature components are transformed between Tx and Rx
(I(t), Q(t) and I’(t), Q’(t), respectively) according to:
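One common convention for such an offset, assumed here rather than taken from the source, is a rotation of the complex baseband sample I(t) + jQ(t) by the angle θ(t) = 2πft + φ, where φ is our label for the phase offset. In real form this reads I’(t) = I(t) cos θ(t) − Q(t) sin θ(t) and Q’(t) = I(t) sin θ(t) + Q(t) cos θ(t). The sketch below applies that rotation; the sign convention, the function name, and the choice of measuring time in symbol periods are assumptions.

```python
import cmath
import math

def apply_carrier_offset(tx, freq_offset, phase_offset, samples_per_symbol):
    """Rotate complex baseband samples by 2*pi*f*t + phi.

    freq_offset is expressed in units of the data rate and t in symbol
    periods (both assumed conventions); phase_offset is in radians.
    """
    rx = []
    for n, sample in enumerate(tx):
        t = n / samples_per_symbol
        angle = 2.0 * math.pi * freq_offset * t + phase_offset
        rx.append(sample * cmath.exp(1j * angle))
    return rx

# A constant carrier with a frequency offset of 25% of the data rate
# completes a quarter turn after one symbol period.
rx = apply_carrier_offset([1 + 0j] * 8, freq_offset=0.25, phase_offset=0.0,
                          samples_per_symbol=4)
```

Because the operation is a pure rotation, it changes the phase of every sample but leaves its magnitude untouched.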
Harsh channel:
The harsh channel is used for the automatic modulation classification and number of symbols
classification tasks. This channel assumes a moderate Rayleigh fading environment, and no
channel correction between the transmitter and the ML model input at the receiver. The channel
parameters for each training and validation example are drawn from uniform distributions
specified in Table 1.
Medium channel:
The medium channel is used for the regression of channel parameters task. This channel assumes
no Rayleigh fading, and limited channel correction between the transmitter and the ML model
input at the receiver which limits the phase offset. The channel parameters for each training and
validation example are drawn from uniform distributions specified in Table 2.
Mild channel:
The mild channel is used for the demodulation task. This channel assumes a moderate Rayleigh
fading environment, and realistic channel correction between the transmitter and the ML model
input at the receiver (specifically, we assume the presence of a Costas loop). The channel
parameters for each training and validation example are drawn from uniform distributions
specified in Table 3.
Model Layer Layouts and Definitions
Transformer: We use the PyTorch Reformer35 architecture with a hidden dimension of 256, a
depth of 2, and 8 heads. The reformer implementation was chosen because it uses a memory-
efficient approximation of the full attention matrix. This approximation results in a model that
ultimately uses much less memory with faster performance for long sequences, without a
significant reduction in model performance.
Fully Connected Classifier: The output of the transformer is flattened and used as input for a
fully connected layer with an output size of 1024, followed by BatchNorm1d, ReLU, and
Dropout layers. The final layer of the classifier is another fully connected layer whose output
size changes depending on the task at hand. The dropout probability used throughout is 0.2.
Consider the model, M, pictured in Fig. 2, which takes as inputs a perturbed signal, Rx, a target
signal, Tx, and some labels, L, and returns a reconstructed signal, Tx’, and some probabilities, L’.
M consists of an encoder network, E, a decoder network, D, and a classifier network, C. The
classification loss, ℒ𝐶 , penalizes C for misclassifying a sample’s labels (L ≠ L’) and the
reconstruction loss, ℒ𝑅 , penalizes D for reconstructing a signal that differs from the target signal
(Tx ≠ Tx’). A trained model minimizes the total loss,
where the 𝜆 are weights for each individual loss term and 𝜃𝑀 are parameters of the model. We use
the same L2 reconstruction loss function for all tasks,
Explicit classifier loss functions for each task are given below. Gradients are calculated via
backpropagation, and the total loss is minimized using the Adam
optimizer with a learning rate of 0.001.
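A total loss consistent with the left-hand side of Eq. (8) is a weighted sum of the form 𝜆𝑅 ℒ𝑅 + ∑ 𝜆𝐶𝑖 ℒ𝐶𝑖. The sketch below shows that bookkeeping in plain Python. It is an assumption-laden illustration: the mean (rather than sum) reduction in the L2 loss and the function names are our choices, and the loss values passed in are made-up numbers.

```python
def l2_loss(target, prediction):
    """Mean squared error between two equal-length real sequences (assumed reduction)."""
    if len(target) != len(prediction):
        raise ValueError("target and prediction must have the same length")
    return sum((t - p) ** 2 for t, p in zip(target, prediction)) / len(target)

def total_loss(recon_loss, classifier_losses, lambda_r, lambda_c):
    """Weighted sum: lambda_R * L_R + sum_i lambda_Ci * L_Ci."""
    if len(classifier_losses) != len(lambda_c):
        raise ValueError("one weight is needed per classifier loss term")
    weighted = sum(w * loss for w, loss in zip(lambda_c, classifier_losses))
    return lambda_r * recon_loss + weighted

tx = [1.0, 0.0, -1.0, 0.0]          # toy target signal
tx_hat = [0.5, 0.0, -1.0, 0.5]      # toy reconstruction
recon = l2_loss(tx, tx_hat)
loss = total_loss(recon, classifier_losses=[2.0], lambda_r=0.001, lambda_c=[1.0])
```

Setting a weight such as `lambda_r` to a small value lets one term dominate early training, which is how the staged schedules described for each task can be expressed.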
Automatic modulation classification: For this task the final output dimension of the classifier is
set to 13, the number of distinct modulation classes. The position of the highest probability in
the output, L’, corresponds to the predicted modulation class. The classification loss has a single
term,
Here, Nc is the number of possible classes (13 in this case) and 𝑣𝑗 (L) is the one-hot encoded vector
of the true class label.
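For reference, a single-term cross-entropy loss with a one-hot target, together with the arg-max prediction rule, can be written as follows. This is a generic sketch of the standard formula; the softmax step and all function names are assumptions, since the source does not state how the output probabilities are produced.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(true_class, probs):
    """-sum_j v_j log(p_j), where v is the one-hot vector of true_class."""
    one_hot = [1.0 if j == true_class else 0.0 for j in range(len(probs))]
    return -sum(v * math.log(p) for v, p in zip(one_hot, probs) if v > 0.0)

def predict(probs):
    """Index of the highest probability, i.e. the predicted class."""
    return max(range(len(probs)), key=probs.__getitem__)

n_classes = 13
uniform = softmax([0.0] * n_classes)          # an uninformed classifier
baseline_loss = cross_entropy(3, uniform)     # equals log(13)
```

The uniform case gives a useful reference point: a loss near log(13) indicates a classifier that has not yet learned anything about the 13 modulation classes.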
The model was trained for a total of 128 epochs. For the first 64 epochs, changes to the
reconstruction loss were penalized with 𝜆𝐶 = 1 and 𝜆𝑅 = 0.001. For the last 64 epochs, this restraint
was removed, and the losses were weighted with 𝜆𝐶 = 1 and 𝜆𝑅 = 1. The data was generated and split
using 2^14 and 2^11 examples from each modulation class for training and validation, respectively.
All data consisted of 512 time-domain samples with oversampling values randomly varying
between 16 and 32 samples/symbol.
Regression of channel parameters: For this task the final output dimension of the classifier is set
to 4. In the classification model, the output (L’) is a set of probabilities, used to infer a discrete
class label. In contrast, the regression model returns an L’ whose elements are used to infer a set
of continuous parameter values. In this case, the classification loss in Eq. (4) has 4 terms, one
for each of the model outputs,

$$\sum_{i} \lambda_{C_i} \mathcal{L}_{C_i}(\theta_M) = \sum_{i=1}^{4} \lambda_i \, \mathrm{MSE}\big(L_i, L_i'(\theta_M)\big) \tag{8}$$

The key to successfully training the regression model is properly constructing L, the targets for the
model outputs. We use the following values for the model output targets:
These choices scale all values of Li to have magnitudes close to unity. Also, the decision to split
the phase offset into two separate trigonometric outputs helps constrain its final value to
the range (−𝜋, 𝜋) and greatly improves the convergence time and performance of the model.
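The benefit of the trigonometric split can be seen in a short sketch: encoding an angle as its cosine and sine removes the discontinuity at the wrap-around point, and the two-argument arctangent recovers a unique angle from the pair. The function names are hypothetical, and the decoding step is our assumption about how the two outputs would be combined.

```python
import math

def encode_phase(phi):
    """Represent an angle (radians) by two bounded targets: (cos, sin)."""
    return math.cos(phi), math.sin(phi)

def decode_phase(cos_out, sin_out):
    """Recover the angle from the two outputs; atan2 returns a value in [-pi, pi]."""
    return math.atan2(sin_out, cos_out)

# Angles just either side of the wrap-around point are far apart as raw
# numbers but close together in the (cos, sin) representation.
a = encode_phase(math.pi - 0.01)
b = encode_phase(-math.pi + 0.01)
distance = math.hypot(a[0] - b[0], a[1] - b[1])
recovered = decode_phase(*encode_phase(3.0))
```

Because `atan2` depends only on the ratio and signs of its arguments, the decoded angle is unaffected if the network's two outputs are jointly off by a positive scale factor.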
The model was trained for a total of 500 epochs. The value of each loss term in Eqs. (4) and (8)
is monitored throughout the training process and the 𝜆𝑖 are updated throughout. A summary of
loss weighting is shown in Table 6. Briefly, for the first 150 epochs changes to the reconstruction
loss are heavily penalized with no penalty to any other terms. For the next 150 epochs, changes to
both reconstruction and SNR are penalized. For the next 100 epochs, the phase and frequency
offset terms are slightly adjusted. For the final 100 epochs, all 𝜆𝑖 are set to 1. The data was
generated and split using 2^17 and 2^13 examples from the QPSK modulation class for training and
validation, respectively. All data consisted of 512 time-domain samples with oversampling values
randomly varied between 8 and 16 samples/symbol.
Epochs     𝜆𝑅      𝜆𝐶1 (cos)   𝜆𝐶2 (sin)   𝜆𝐶3 (100 × f)   𝜆𝐶4 (SNR)
0-150      0.001   1           1           1               1
150-300    0.001   1           1           1               0.01
300-400    0.001   1           1           0.2             0.01
400-500    1       1           1           1               1
Table 6 Summary of loss term weighting during training epochs for regression of channel
parameters task. The cos and sin columns weight the cosine and sine of the phase offset.
Number of message symbols classification: The model was trained for 8 epochs with 𝜆𝐶 = 𝜆𝑅 =
1 for all epochs. The data was generated and split using 2^14 and 2^11 examples from each of the 13
modulation classes for training and validation, respectively. All data consisted of 512 time-domain
samples with the number of symbols per message being drawn from a uniform distribution and
varying between 16 and 32 symbols/message. For this task the oversampling value is derived from the
number of symbols per message instead of being drawn directly from a uniform distribution, as is
the case for all other tasks.
The final output dimension of the classifier is set to 17, the number of possible symbol counts per
message (16 through 32). The position of the highest probability in the output, L’, corresponds to the
predicted number of symbols/message. The classification loss has a single term,
Signal demodulation:
For this task the final output dimension of the classifier is set to 256 × k, where k is the number of
symbols in the modulation constellation. Each data example consists of 256 symbols and each
symbol is treated as an individual k-class classification problem. In this case, the classification loss
in Eq. (4) has 256 terms, one for each symbol of the message,

$$\sum_{i} \lambda_{C_i} \mathcal{L}_{C_i}(\theta_M) = \lambda_C \sum_{i=1}^{256} \mathrm{CE}\big(L_i, L_i'(\theta_M)\big) \tag{13}$$

where CE is the cross-entropy loss function,

$$\mathrm{CE}\big(L, L'(\theta_M)\big) = -\sum_{j=1}^{k} v_j(L) \, \log L_j'(\theta_M) \tag{14}$$

Here, k is the number of possible classes (2 for BPSK, 4 for QPSK, and 16 for 16-QAM) and 𝑣𝑗 (L)
is the one-hot encoded vector of the true message symbols.
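Treating each symbol as its own k-class problem amounts to reshaping the flat classifier output into one row of k scores per symbol and taking the arg-max of each row. The sketch below illustrates this with a three-symbol toy message; the row-major layout (all k scores of symbol 0 first, then symbol 1, and so on) and the function names are assumptions, as the source does not specify the ordering.

```python
def split_per_symbol(flat_output, n_symbols, k):
    """Reshape a flat list of n_symbols * k scores into n_symbols rows of k."""
    if len(flat_output) != n_symbols * k:
        raise ValueError("output length must equal n_symbols * k")
    return [flat_output[i * k:(i + 1) * k] for i in range(n_symbols)]

def demodulate(flat_output, n_symbols, k):
    """Pick the most probable constellation index for every symbol."""
    rows = split_per_symbol(flat_output, n_symbols, k)
    return [max(range(k), key=row.__getitem__) for row in rows]

def symbol_accuracy(predicted, truth):
    """Fraction of symbols identified correctly."""
    if len(predicted) != len(truth):
        raise ValueError("sequences must have the same length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Toy example: 3 symbols of a 4-point constellation (k = 4).
flat = [0.1, 0.7, 0.1, 0.1,
        0.9, 0.0, 0.05, 0.05,
        0.2, 0.2, 0.5, 0.1]
decoded = demodulate(flat, n_symbols=3, k=4)
accuracy = symbol_accuracy(decoded, [1, 0, 3])
```

For the configuration described here the same functions would be called with `n_symbols=256` and k equal to 2, 4, or 16 depending on the modulation type.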
The model was trained for a total of 256 epochs with 𝜆𝐶 = 1 and 𝜆𝑅 = 0.01 for the first 128
epochs and 𝜆𝐶 = 𝜆𝑅 = 1 for the final 128 epochs. The data was generated and split using 2^16 and
2^13 examples for training and validation, respectively, for each modulation type. All data consisted
of 1024 time-domain samples with an oversampling of 4.
Data Availability
Code for dataset generation and transmission channel modeling is available at
https://github.com/pnnl/DieselWolf. Model definitions and analysis code that were used in this
study are available from the corresponding authors upon reasonable request.
References
1. Osseiran, A. et al. Scenarios for 5G mobile and wireless communications: the vision of the
2. He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition.
3. Simonyan, K. & Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image
5. Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated Residual Transformations for
6. Zhai, X., Kolesnikov, A., Houlsby, N. & Beyer, L. Scaling Vision Transformers.
7. Brock, A., De, S., Smith, S. L. & Simonyan, K. High-Performance Large-Scale Image
8. Vaswani, A. et al. Attention is All you Need. Advances in Neural Information Processing
9. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional
10. Yang, Z. et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding.
12. Wang, H., Wang, N. & Yeung, D.-Y. Collaborative Deep Learning for Recommender
13. Cheng, H.-T. et al. Wide & Deep Learning for Recommender Systems. in Proceedings of the
1st Workshop on Deep Learning for Recommender Systems 7–10 (ACM, 2016).
doi:10.1145/2988450.2988454.
14. Fu, M., Qu, H., Yi, Z., Lu, L. & Liu, Y. A Novel Deep Learning-Based Collaborative
Filtering Model for Recommendation System. IEEE Trans. Cybern. 49, 1084–1096 (2019).
15. Zhang, C., Patras, P. & Haddadi, H. Deep Learning in Mobile and Wireless Networking: A
16. Jagannath, J., Polosky, N., Jagannath, A., Restuccia, F. & Melodia, T. Machine learning for
17. O’Shea, T. J., Roy, T. & Clancy, T. C. Over-the-Air Deep Learning Based Radio Signal
Classification. IEEE Journal of Selected Topics in Signal Processing 12, 168–179 (2018).
18. West, N. E. & O’Shea, T. Deep architectures for modulation recognition. in 2017 IEEE
doi:10.1109/DySPAN.2017.7920754.
19. Rajendran, S., Meert, W., Giustiniano, D., Lenders, V. & Pollin, S. Deep Learning Models
for Wireless Signal Classification With Distributed Low-Cost Spectrum Sensors. IEEE
20. Chandhok, S., Joshi, H., Darak, S. J. & Subramanyam, A. V. LSTM Guided Modulation
Sensing. in 2019 11th International Conference on Communication Systems Networks
21. Karim, F., Majumdar, S., Darabi, H. & Harford, S. Multivariate LSTM-FCNs for time series
22. Karim, F., Majumdar, S., Darabi, H. & Chen, S. LSTM Fully Convolutional Networks for
23. Graves, A. et al. A Novel Connectionist System for Unconstrained Handwriting Recognition.
IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 855–868 (2009).
24. Ohnishi, K. & Nakayama, K. A neural demodulator for quadrature amplitude modulation
25. Wang, H. et al. Deep Learning for Signal Demodulation in Physical Layer Wireless
Communications: Prototype Platform, Open Dataset, and Analytics. IEEE Access 7, 30792–
30801 (2019).
26. Amini, M. R. & Balarastaghi, E. Improving ANN BFSK Demodulator Performance with
27. Lerkvaranyu, S., Dejhan, K. & Miyanaga, Y. M-QAM demodulation in an OFDM system
with RBF neural network. in The 2004 47th Midwest Symposium on Circuits and Systems,
28. Önder, M., Akan, A. & Doğan, H. Neural network based receiver design for Software
Defined Radio over unknown channels. in 2013 8th International Conference on Electrical
29. Lyu, W., Zhang, Z., Jiao, C., Qin, K. & Zhang, H. Performance Evaluation of Channel
30. Dörner, S., Cammerer, S., Hoydis, J. & Brink, S. ten. Deep Learning Based Communication
Over the Air. IEEE Journal of Selected Topics in Signal Processing 12, 132–143 (2018).
31. O’Shea, T. & Hoydis, J. An Introduction to Deep Learning for the Physical Layer. IEEE
32. O’Shea, T. J. & West, N. Radio Machine Learning Dataset Generation with GNU Radio.
33. Li, S. et al. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on
Time Series Forecasting. Advances in Neural Information Processing Systems 32, (2019).
34. Wu, N., Green, B., Ben, X. & O’Banion, S. Deep Transformer Models for Time Series
35. Kitaev, N., Kaiser, L. & Levskaya, A. Reformer: The Efficient Transformer. in (2019).
36. Peng, S. et al. Modulation Classification Based on Signal Constellation Diagrams and Deep
Learning. IEEE Transactions on Neural Networks and Learning Systems 30, 718–727 (2019).
Acknowledgements
This research was supported by the Pacific Northwest National Laboratory (PNNL) Laboratory
Directed Research and Development program. PNNL is operated for DOE by Battelle Memorial
Institute under contract DE-AC05-76RL01830.
Author contributions
B.S. and M.G. conceived the project. B.S. created the dataset and transmission channel model.
Y.W. developed the hybrid model. B.S., Y.W., N.M., and M.G. trained and tested models,
performed statistical analysis, and generated figures. B.S. and Y.W. wrote the manuscript. M.G.
supervised the project. All authors reviewed the manuscript.
Competing Interests