1 Introduction

Human activity recognition (HAR) is an emerging topic of research in the larger fields of ambient computing and context-aware computing. Recognizing daily life activities is becoming increasingly important in pervasive computing, with numerous applications such as intelligent surveillance systems [1], healthcare [2], abnormal behavior detection [3], human–computer interaction [4, 5], and aid to elderly people to improve the quality of their lives. HAR frameworks provide a way to sense, recognize, and classify specific movements or activities of a person using the data obtained from various sensors. A typical supervised HAR framework can be divided into basic blocks consisting of sensor data acquisition, dividing the raw data into fixed-size windows, feature extraction, and finally classification. Each activity is represented by one or more fixed-size feature vectors extracted in the feature extraction step. These feature vectors are used for training the classifier.

Based on the sensing modality, HAR can be mainly categorized into vision-based HAR and wearable sensor-based HAR. Vision-based techniques recognize and classify activities by analyzing videos or images [6, 7] captured using a camera. Though vision-based techniques have a mature theoretical basis, they suffer from various limitations such as sensitivity to ambient light, camera position, potential obstacles, and invasion of privacy, which make them difficult to deploy in real-life applications. Wearable sensors and the inertial sensors of smart devices have therefore become more promising means of collecting human activity data, as they are easy to use, small in size, and non-intrusive to subjects. Besides, these sensors have little to no installation cost and low energy consumption. Smartphones and smartwatches have become a convenient option for HAR as they come with various embedded sensors like accelerometer, gyroscope, magnetometer, and compass.

For activity prediction tasks, generalizing a model across different activities and sensors is very challenging. The activity signal pattern may vary significantly from person to person, as different people perform the same activities differently. Even for a single person, the same activity can produce different signal patterns at different times. Conversely, different activities can produce similar signal patterns, which makes the activity classification task even more confusing and challenging.

In the recent past, researchers have introduced several handcrafted feature extraction methods to extract various spatiotemporal features from raw sensor data. Traditional supervised machine learning techniques—Support Vector Machine (SVM) [8,9,10], K-Nearest Neighbors (KNN) [11,12,13], Decision Tree [14], and ensemble approaches [15, 16]—are then used for classification. However, this approach has certain limitations, such as the requirement of domain expertise and rigorous data pre-processing. Moreover, the failure to establish proper spatial and temporal relationships among handcrafted features limits the flexibility of these approaches.

Recently, deep learning techniques have gained popularity among researchers. The ability to detect various features automatically from raw data and to learn deeper, hierarchical levels of features gives deep learning techniques an edge over traditional machine learning techniques. Deep learning models have been successfully applied in different areas like natural language processing [17], image segmentation [18], classification [19], etc. Specifically, convolutional neural networks (CNNs) are well known for producing outstanding results in image recognition [20, 21]. Consequently, reformulating features of time series data as visual clues has attracted much attention among computer scientists [22], the most successful way being to describe features as visual cues [23]. Time series data can be encoded into corresponding activity images using learning techniques from computer vision, enabling deep learning techniques, specifically CNNs, to perform image recognition on them.

It is to be noted that a feature extraction procedure may produce some irrelevant or redundant features which increase the overall feature space. This is also true for the feature vectors produced by a deep learning model. Hence, these irrelevant features must be eliminated in order to ensure good classification accuracy and low computational time. A feature selection (FS) algorithm tries to improve the performance of a learning algorithm and decrease the time and space requirements. FS algorithms can be divided into two categories: wrapper and filter. A wrapper method uses a classifier to calculate the fitness of each candidate solution (i.e., a subset of features) and thereby selects the subset of features that has the best fitness score. On the other hand, filter-based methods rank the features in order of their importance and eliminate the less important ones. Since filter methods do not need a learning algorithm, they tend to perform faster than wrapper methods in general. However, wrapper methods are known to generate better classification accuracy than filter methods [24]. FS is an NP-hard problem, as there are \(2^n\) possible solutions for a feature space containing n features. Determining the best solution by enumerating all possible solutions is not feasible, as the computational time required would be prohibitive. Hence, an alternative and feasible solution is to perform a guided search over the entire feature space using a heuristic strategy. This not only decreases the computational time significantly but also produces a near-optimal solution.

In this paper, we have proposed an architecture that encodes sensor data into corresponding images and a model that enables HAR to be carried out using a spatial attention-aided CNN model for image recognition. However, the feature set produced by this CNN model is quite large. To this end, we have proposed an FS approach for selecting the optimal feature subset by eliminating the irrelevant feature attributes, which also saves computational time and memory. This implies that we have used the said CNN model as the deep feature extractor only. For FS, a modified version of the Genetic Algorithm (GA) [25] is used. Rather than utilizing a time-consuming classifier in each iteration, we have utilized three filter techniques, specifically Mutual Information (MI) (entropy based), ReliefF (distance based), and Minimum Redundancy Maximum Relevance—mRMR (statistics based). These three methods rank the features obtained from the CNN model. We have re-ranked the features using the mean of the ranks given by the three filter methods, and these ranks are used to compute the fitness of the candidate solutions (i.e., chromosomes). We have also proposed a guided mutation strategy which aims to increase the fitness of the individual chromosomes. The reduced feature set is then fed to the KNN classifier for predicting the accuracy of the overall HAR model.

The key contributions of the proposed work are as follows:

1. We have proposed a unique image encoding framework based on Continuous Wavelet Transform (CWT) to represent the sensor data in a corresponding spatiotemporal form.

2. A spatial attention-aided CNN model is used to extract image features from the encoded images.

3. In order to reduce computational overhead, we have introduced a modified GA-based feature selection framework that uses three filter-based methods to determine the fitness of each candidate solution.

4. We have also proposed a guided mutation technique, as an improvement over random mutation, to increase the fitness score of each candidate solution.

The rest of the paper is structured as follows. Section 2 describes some relevant methods proposed by other researchers. Details of the proposed method are given in Sect. 3. In Sect. 4, we report the results of the proposed model when evaluated on five benchmark HAR datasets. In Sect. 5, we further discuss our findings. Finally, we conclude the paper in Sect. 6.

2 Related work

Deep learning-based models have achieved outstanding results in a variety of fields including HAR as mentioned in recent surveys [26, 27]. Many state-of-the-art models have been developed using various deep learning techniques like CNN, Recurrent Neural Network (RNN), etc.

CNN models have shown a lot of promise and achieved higher recognition accuracy than other state-of-the-art methods. Nair et al. [28] used the Temporal CNN architecture, a class of temporal models using a hierarchy of temporal convolutions, which was able to take variable-length sequence data and learn long-term dependencies. Münzner et al. [29] proposed a CNN-based sensor fusion technique to solve the problems of normalization and fusion of multimodal sensors. In [30,31,32,33,34], authors have used various CNN architectures to improve the recognition accuracy of HAR. Ensembles of CNN models are found in [35,36,37], aiming to achieve better performance than the individual models.

RNN, another deep learning technique, has also been used extensively by many researchers for HAR. RNNs have the special ability to learn from sequential data. For example, long short-term memory (LSTM)-based networks can learn long-term dependencies from sequences of data, which makes them highly applicable to wearable/inertial sensor-based HAR. Preeti Agarwal and Mansaf Alam [38] developed a lightweight model using a shallow RNN combined with LSTM for activity recognition. Authors in [39,40,41,42] used LSTM-based architectures to learn spatiotemporal features for the classification of human activities. Researchers have also proposed various hybrid models like combinations of CNN-RNN [43], CNN-LSTM [44,45,46,47,48], LSTM-CNN [49], and CNN-GRU (Gated Recurrent Unit) [50], and achieved significant improvements in recognition accuracy.

Inspired by the recent success of deep learning techniques, especially CNNs, in computer vision, encoding time series data as images has gained acceptance among researchers. This method allows the machine to recognize and classify visually by learning visual patterns and structures. Zhiguang Wang and Tim Oates [22] introduced two frameworks for encoding time series data as images, known as Gramian Angular Field (GAF) and Markov Transition Field (MTF). They used Tiled CNNs to classify the single GAF and MTF images as well as the compound GAF-MTF images. The authors in [51] found that various time series features are not evident in the temporal domain but are present in the frequency domain. As an alternative graphical representation for time series classification, they investigated the use of recurrence plots and proposed a method capable of extracting texture features from that graphical representation, using those features to classify time series data. Garcia-Ceja et al. [52] proposed a similar approach: they modeled physical activity as a set of recurrence plot distance matrices to capture temporal patterns in the signal, after which a CNN was used to classify the distance matrices and obtain the final prediction. In [53], the authors experimentally found that the image representation of time series data introduces feature types that are not available in the 1D sensor data. Hence, they first encoded the sensor signal as a 2D texture image using a recurrence plot to visualize the recurrent nature of a trajectory through phase space, and then used a CNN model to learn different levels of features from the texture images. To address the variability in distinctive region scale and sequence length, Zhang et al. [54] proposed a two-stage approach, where they first encoded the sensor data using Multi-scale Signed Recurrence Plots (MS-RP), an improvement on the recurrence plot, and then applied Fully Convolutional Networks and ResNet to these images. Hur et al. [55] proposed a novel encoding technique for converting an inertial sensor signal into an image with minimal distortion, namely Iss2Image (Inertial sensor signal to Image). Iss2Image divides each real-valued sensor reading into three parts—the integer part, the first two decimal places, and the next two decimal places—which are then encoded as a three-channel image. Finally, a CNN model was used for image-based activity classification. Another similar encoding technique was proposed by Daniel et al. in [56]. Their INIM framework first encoded the sensor's signal into 3D RGB images and then used a residual network trained on the ImageNet dataset [57] for activity recognition. Qin et al. [58] introduced a novel method to encode time series data into two-channel GAF images by unifying global and local time series features. They then presented a fusion ResNet framework, which learned the correspondences between the acceleration and angular velocity features of the generated GAF image pixels. Similar work was done by the authors in [59]. Contrary to the previous work, they used four different types of activity images and made each one multimodal by convolving it with two spatial domain filters: the Prewitt filter and the high-boost filter. ResNet-18 was used to extract the deep features from the multiple modalities, which were fused by canonical correlation-based fusion; finally, a multi-class SVM was used for activity recognition. In [60], the authors implemented the idea of transforming the 1D signal into 2D using the Fast Fourier Transform (FFT). This frequency-domain image, called the spectrogram, represents the composition of a signal from several frequencies over time and acts as the input to a three-layered CNN model for feature extraction and classification. Lawal et al. [61] encoded sensor signals into spectrograms using the Short-Time Fourier Transform (STFT), and proposed a simplified two-stream VGG-Net-like [20] CNN architecture for activity and location recognition.

A few researchers have also tried to select relevant features using various FS-based techniques [62, 63] to improve the overall accuracy in the field of activity recognition. Buenaventura et al. [64] proposed a HAR model based on sensor fusion in smartphones, which used a filter-based method to rank the features. An enhanced HAR method was proposed by Fan et al. [65], where Bee Swarm Optimization (BSO) was used with a deep Q-network. Dewi et al. [66] performed a comparative study on HAR datasets using four classifiers, namely Random Forest (RF), SVM, KNN, and Linear Discriminant Analysis (LDA), from which it was concluded that RF achieved the highest accuracy. Nguyen et al. [67] proposed a position-based FS method for body sensors for daily activity recognition. Filter-based methods were used to reduce the feature set, followed by a correlation-based optimization and a classifier to determine the overall accuracy of the proposed method.

GA is one of the oldest and most widely used meta-heuristic algorithms and has been explored by numerous researchers in various domains such as image contrast enhancement, class imbalance, stock price prediction, image segmentation, medical diagnosis, image steganography, feature selection, etc. Saitoh [68] proposed an image contrast enhancement technique based on GA that assessed an individual's fitness by evaluating the intensity of spatial edges in the image. GA was used to search for a solution in the global space, and the original gray image was converted to a contrast-enhanced image by observing the relationship between the input and output gray levels. In [69], an efficient image contrast enhancement method using GA and a fuzzy intensification operator was proposed, which improved the visibility information of an image by manipulating the image intensity information. A novel oversampling approach was introduced by Arun et al. [70] to address the class imbalance problem using GA. Synthetic samples of the minority class were generated based on a distribution measure which ensured that the samples were efficient and diverse within each class. Experimental results indicated that the GA-based oversampling approach improved fault prediction performance and reduced the false alarm rate. Ha et al. [71] proposed a novel under-sampling method using GA for imbalanced data classification. The performance of the prototype classifier was maximized by minimizing the loss between the distributions of the original and undersampled majority objects. A novel method for stock market forecasting with an Artificial Neural Network (ANN) and GA was proposed by Sharma et al. [72]. The dataset was partitioned into training, testing, and validation sets, and stock data from the COVID-19 period were used for model validation. Furthermore, in [73] a combination of GA and LSTM was proposed for stock prediction. In the initial step, GA was used to obtain ranked important factors, and finally, the optimal factors along with LSTM were used for prediction. Chun et al. [74] proposed a robust image segmentation method using GA with a fuzzy measure. A fuzzy validity function was proposed which measured the degree of separation and compactness within the finely segmented regions. To maximize the quality of regions obtained by split-and-merge processing, a usable region segmentation was searched for using GA. In [75], an image segmentation method with GA was proposed, where GA was used to segment the images into four gray classes. A cardiovascular disease prediction method using GA and a neural network was proposed by Amma [76], where the weights of the neural network were determined using GA, which provided a good set of weights in a few iterations. Initially, the dataset was pre-processed, followed by training the system and storing the final weights, which were finally used for predicting the risk of cardiovascular disease. Uyar et al. [77] proposed a GA-trained recurrent fuzzy neural network (RFNN) method for the diagnosis of heart diseases. Hossain et al. [78] introduced a secured image steganography method based on GA and ballot transform for the integrity of important files over the internet. In addition to achieving good accuracy, various parameters such as precision, F-score, probability of misclassification error, mean square error, etc. were also calculated.

Owing to the success of GA in solving various complex optimization problems, many researchers have used GA for the FS purpose, which is a binary optimization problem. Some areas where GA is used as an FS method are: microstructural image classification [79], cancerous gene identification [80], handwritten Devanagari numeral recognition [81], handwritten Bangla word recognition [82], handwritten Bangla, Devanagari and Roman numeral classification [83], video and sensor-based HAR [62], etc. Rostami et al. [84] developed a novel community-based FS method to group similar features into feature clusters. This method predicted the number of feature clusters automatically, hence eliminating the need to determine it beforehand. GA was then applied to select the optimum subset of features by defining an objective function with an importance value attached to each feature subset. In [85], a novel cancer classification technique was proposed using deep learning and GA. It was applied to determine and classify cancer types from publicly available gene expression data. Tian et al. [86] proposed a deep learning model selection framework based on GA for visual data classification. The framework automated the process of identifying the most relevant and useful features generated by pre-trained models for different tasks. In [87], a deep learning method was developed to classify different brain activities, along with GA to eliminate the redundant features. Various deep learning models, namely the X_axis Classification Model (XCM), Y_axis Classification Model (YCM), and Z_axis Classification Model (ZCM), were used for this purpose. These models were used to classify among the vision, movement, and forward brain activities, followed by an effective combination method based on GA and the Genetic Weighted Summation (GWS) rule. In 2019, Ghosh et al. [88] introduced a combination of GA and Particle Swarm Optimization (PSO) for feature selection, which utilized the exploitation ability of GA with the exploration capacity of PSO. Guha et al. [89] proposed a deluge-based GA to strengthen the exploitation ability, which performed well on the well-known UCI datasets. In 2021, Kilicarslan et al. [90] proposed a hybrid model based on GA and deep learning for nutritional anemia disease classification. GA was used to optimize the hyperparameters of the Stacked Autoencoder (SAE) and CNN models. The proposed method achieved an accuracy of 98.50% when applied to a real anemia dataset. Ince [91] proposed a deep learning and GA-based intelligent and automatic content visualization system. The method segmented the input image into panoptic image instances and used these to generate new images using GA. The results proved that the said method was efficient in creating visually enhanced content for digital use.


Motivations: From the above discussion, it can be concluded that many researchers around the world have tried to classify human activities by analyzing activity images. Recognizing human activities from sensor data has always been an interesting and challenging task. Some activities, such as running and walking, are easy to recognize; however, some complex activities are relatively difficult to classify. Developing an efficient activity recognition model can lead to developments in many potential fields such as health, sports, and understanding the psychological state of a person. For this purpose, machine learning and deep learning-based methods have contributed significantly to the development of competent HAR models. However, many of these methods use heavy networks (mainly deep learning-based methods), and some even produce lower classification accuracy due to the use of irrelevant features. On the contrary, FS-based techniques not only speed up the process (i.e., take less computational time) but can also increase the classification accuracy significantly. However, wrapper-based FS techniques, which use a learning algorithm, are slower than filter-based methods. Keeping the above facts in mind, and to further speed up the process, a modified version of the GA is proposed here, which uses three filter-based methods to calculate the fitness of the chromosomes, effectively acting as the fitness function of the GA. The proposed method has been evaluated on five publicly available datasets. It is observed that this method is much faster than the traditional GA, and the overall framework also outperforms many existing methods in terms of classification accuracy.

3 Proposed method

In this section, we first briefly discuss the proposed activity image encoding technique. Then, we explore the feature extraction process from the encoded images. Finally, we present the proposed novel FS technique used for HAR. Figure 1 shows the working procedure of the proposed framework.

Fig. 1: Overall workflow of the proposed HAR framework

3.1 Continuous wavelet transform

The wavelet transform has been applied to time–frequency analysis and spatial domain signal analysis over the years, and it is one of the most effective mathematical tools used for signal processing. A wavelet transform is a convolution of a signal with a set of functions derived from translations and dilations of a primary function. The primary function is referred to as the mother wavelet, and the translated or dilated functions are referred to as wavelets.

A wavelet is a rapidly decaying wave-like oscillation, defined as a function \(\psi (t) \in L^2(\mathbb{R})\) with zero mean, that exists for a finite duration and is localized in both time and frequency. By scaling and translating this wavelet \(\psi (t)\), we can produce a family of wavelets using Eq. (1) as

$$\begin{array}{*{20}l} \psi _{a,b}\left( {t}\right) = \frac{1}{\sqrt{a}} \psi \left( \frac{t-b}{a} \right) \end{array}$$
(1)

where \(a,b \in \mathbb{R}\) and \(a>0\); a is known as the scaling parameter, and b is the translation parameter. The wavelet transform of a continuous signal with respect to the wavelet function \(\psi \left( t \right)\) is defined in Eq. (2)

$$W_x \left( a,b \right) = \int \limits_{-\infty }^{+\infty } x(t) \psi _{a,b}^{*} \left( t \right) \text {d}t$$
(2)

where x(t) is a time-domain signal and \(\psi _{a,b}^{*} \left( t \right)\) is the complex conjugate of the mother wavelet. From Eqs. (1) and (2), we get Eq. (3), which defines the CWT as

$$\begin{array}{*{20}l} X_w \left( a,b \right) = \frac{1}{\sqrt{a}} {\int _{-\infty }^{+\infty } x(t) \psi ^{*} \left( \frac{t-b}{a} \right) \text {d}t} \end{array}$$
(3)

The CWT is nothing but the inner product of the signal x(t) with a continuous wavelet \(\psi \left( t \right)\) scaled by the parameter a and translated by the value b. The pseudocode for the CWT is shown in Algorithm 1.

Algorithm 1: pseudocode of the CWT

The outputs of the CWT are the CWT coefficients, which reflect the similarity between the analyzed signal and the wavelet. These coefficients can be represented as a 2D image equivalent to the power spectrum, where time and scale/frequency are the two dimensions. However, the CWT coefficients depend on the choice of the mother wavelet.

One of the main advantages of the wavelet transform is the wide variety of wavelets to choose from, so that the one best matching the signal's shape can be selected. In this work, we use the Gaussian Derivative wavelets, specifically the fifth-order derivative of the function given in Eq. (4)

$$\begin{array}{*{20}l} \psi \left( t \right) = C e^{-t^{2}} \end{array}$$
(4)

where C is the order-dependent normalization constant.

The fifth-order Gaussian Derivative wavelet is a real-valued odd function, which is anti-symmetric around zero. The shape of the fifth-order Gaussian Derivative wavelet and various scaled wavelets is shown in Fig. 2.

Fig. 2: Fifth-order Gaussian Derivative wavelet and its scaled versions

As the wavelet is a real-valued function, the imaginary part of the wavelet is zero.
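As a concrete illustration, the scalogram of a single sensor channel can be computed with PyWavelets (the library used in Sect. 4.1), where 'gaus5' is its name for the fifth-order Gaussian Derivative wavelet. The signal, window length, and scale range below are illustrative placeholders, not the exact experimental settings:

```python
import numpy as np
import pywt

# A toy 1D signal standing in for one sensor channel:
# 128 samples, mimicking a 2.56 s window sampled at 50 Hz
t = np.linspace(0, 2.56, 128)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.random.randn(t.size)

# 'gaus5' is PyWavelets' fifth-order Gaussian Derivative wavelet
scales = np.arange(1, 129)  # one scale per timestamp -> square scalogram
coeffs, freqs = pywt.cwt(signal, scales, 'gaus5', sampling_period=1 / 50)

print(coeffs.shape)  # (128, 128): scales x time
```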

3.2 Inertial sensor to image encoding using CWT

In order to encode the raw sensor time series data into image form, we use the 1D CWT, which takes a 1D time series as input and generates a 2D time–frequency scalogram. This scalogram is nothing but the CWT coefficients. Figure 3 depicts the image encoding process.

Fig. 3: Illustration of the image encoding process

Performing CWT on the entire time series dataset at once is practically infeasible. Hence, we instead perform CWT on each sample of size \(t \times c\), where t is the number of timestamps and c is the total number of sensor channels. The pseudocode for the CWT-based image encoding technique is given in Algorithm 2. The values of t and c vary from dataset to dataset. Each of the c channels is a 1D time series and acts as the input to the CWT. We use t as the scale parameter. For each such sensor channel, we get a \(t \times t\) scalogram as the output. Hence, for one sample, we get a c-dimensional \(t \times t\) scalogram where each dimension corresponds to one sensor channel.

Algorithm 2: pseudocode of the CWT-based image encoding

In this way, we encode each activity sample of a dataset as a \(t\times t\times c\)-dimensional image.
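A minimal sketch of this per-sample encoding, consistent with the description above (the exact pseudocode of Algorithm 2 is not reproduced here, and the window size and channel count below are placeholders), is:

```python
import numpy as np
import pywt

def encode_sample(sample: np.ndarray, wavelet: str = 'gaus5') -> np.ndarray:
    """Encode one (t, c) sensor window into a (t, t, c) scalogram image."""
    t, c = sample.shape
    scales = np.arange(1, t + 1)    # t scales, as described in the text
    image = np.empty((t, t, c))
    for ch in range(c):
        coeffs, _ = pywt.cwt(sample[:, ch], scales, wavelet)
        image[:, :, ch] = coeffs    # (t, t) scalogram for this channel
    return image

# Example: a 128-timestamp window with 9 sensor channels (UCI-HAR-like)
window = np.random.randn(128, 9)
print(encode_sample(window).shape)  # (128, 128, 9)
```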

3.3 Features extraction using spatial attention-aided CNN

A CNN is a deep neural network that simulates and understands stimuli in a manner similar to how the visual cortex of the brain processes them. A typical CNN model can be thought of as a combination of two components: the feature extractor part and the classification part. The hidden layers form the CNN's feature extractor, consisting of a series of convolution layers followed by pooling layers that try to detect complex features and patterns belonging to the image of a particular class by convolving with various filters followed by sub-sampling. The classification part then utilizes these features and computes the prediction probabilities as output. Even though CNNs perform very well in image classification tasks, the requirement of huge amounts of data for more accurate prediction sometimes limits their use as classifiers. As a result, in the current work, rather than using the CNN model as a classifier, we only use it as a feature extractor.

Figure 4 shows the architecture of the proposed feature extractor. It mainly consists of a CNN with four convolution layers and spatial attention sub-networks. The spatial attention sub-networks, which are variants of widely used CNNs, use attention modules to fine-tune the feature maps in each convolution layer, thereby enhancing the CNN's learning ability.

Fig. 4: Architecture of the proposed CNN-based feature extractor

Following each convolution layer, we use a max-pooling layer to lessen data variance and a dropout layer to avoid over-fitting. Before the max-pooling layer, the attention feature maps from the spatial attention sub-network are added to re-calibrate the original features. This layering scheme is repeated three times with different numbers of \(3\times 3\) filters. All neurons of these convolution layers use ReLU (Rectified Linear Unit) as the activation function to learn nonlinear representations. The details of the network architecture are given in Table 1.

Table 1 Details of the CNN architecture used for the purpose of feature extraction. SAM here refers to Spatial Attention Module

At last, the output features are first flattened and then passed through a fully connected layer, which generates a 1024-dimensional feature vector from the input image.
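A compact Keras sketch of the extractor described above is given below. The filter counts, dropout rate, and input shape are illustrative placeholders (the actual values are listed in Table 1), and spatial_attention is only a stub here, refined in the Sect. 3.4 sketch:

```python
from tensorflow.keras import layers, Model

def spatial_attention(x, filters):
    # Stub; the full module is sketched in Sect. 3.4.
    return x

def conv_block(x, filters):
    """Convolution -> attention re-calibration -> max-pool -> dropout."""
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = spatial_attention(x, filters // 2)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Dropout(0.25)(x)     # placeholder rate
    return x

def build_extractor(input_shape=(128, 128, 9), n_classes=6):
    inp = layers.Input(shape=input_shape)
    x = inp
    for f in (32, 64, 128, 256):    # placeholder filter counts; see Table 1
        x = conv_block(x, f)
    x = layers.Flatten()(x)
    features = layers.Dense(1024, activation='relu', name='features')(x)
    out = layers.Dense(n_classes, activation='softmax')(features)
    return Model(inp, out)

# After training, the output of the 1024-D 'features' layer is the descriptor
extractor = build_extractor()
```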

3.4 Spatial attention module

Recently, attention mechanisms have attracted more and more interest from researchers and have been widely used with CNN and RNN models in many domains like computer vision and image processing. This mechanism enables the network to pay more focus to some discriminating regions in certain time periods, which improves the learning ability of the network. In this article, we design a class of attention module to focus on where the informative parts are present in the encoded images.

The proposed spatial attention module generates a spatial attention feature map by utilizing the inter-spatial relationship of features. As shown in Fig. 5, a \(1\times 1\) convolution layer is first used to fuse the information along the channels, generating a 2D feature map \(Y \in \mathbb{R}^{H\times W}\). Then, we apply two 2D convolution layers to generate the spatial attention feature map \(Y_{sa} \in \mathbb{R}^{H\times W\times C}\). For these two 2D convolution layers, the number of convolution filters varies from module to module. The details of the spatial attention module architecture are shown in Table 2.

Fig. 5: Illustration of the Spatial Attention Module used in the present work

Table 2 Details of SAM used in the present work. SAM refers to Spatial Attention Module

We use ReLU as the activation function for the convolution layers and padding to keep the spatial size unchanged. Finally, we use \(Y_{sa}\) to re-calibrate \(F^l\) using Eq. (5).

$$\begin{array}{*{20}l} \left( F^l \right) ^{'} = Y_{sa} + F^l \end{array}$$
(5)

where \(F^l\) is the feature map from the previous convolution layer. This \(\left( F^l \right) ^{\prime }\) acts as the input for the next CNN layer in the network.
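A minimal functional-API sketch of the module is shown below; the kernel sizes are assumptions (the actual values are in Table 2), and the final convolution restores the channel count of \(F^l\) so that the addition in Eq. (5) is well defined:

```python
from tensorflow.keras import layers

def spatial_attention(f_l, filters):
    """Sect. 3.4 sketch: a 1x1 conv fuses channel information into a 2D map Y,
    two further convs produce Y_sa with the channel count of F^l, and Eq. (5)
    adds Y_sa back to F^l. Kernel sizes are assumptions (see Table 2)."""
    y = layers.Conv2D(1, 1, padding='same', activation='relu')(f_l)
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(y)
    y_sa = layers.Conv2D(f_l.shape[-1], 3, padding='same', activation='relu')(y)
    return layers.Add()([y_sa, f_l])    # (F^l)' = Y_sa + F^l
```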

3.5 Feature selection

Feature extraction using the CNN produces a high-dimensional feature set, which then needs to be processed by the classifier. Often, only a small subset of these features is important; the remaining features are redundant or insignificant and only tend to increase the computational time and space. Moreover, the presence of these redundant features also decreases the classification accuracy. To address this issue, FS is performed on the set of features obtained from the said CNN model. In the proposed method, we use GA as the unsupervised FS algorithm, and three different filter methods are used to calculate the fitness of each chromosome in the population of GA.

3.5.1 Filter methods

To calculate the fitness of the individual chromosomes, we rely on three filter-based methods, namely MI, ReliefF, and mRMR.

1. Mutual Information: MI [92] measures the nonlinear relations between two random variables. It quantifies the amount of information obtained about one random variable by observing the other, and can be viewed as the reduction in uncertainty of a random variable when the other variable is known. Hence, a high MI value suggests a large reduction in uncertainty, while a low value suggests a small reduction. It can be calculated using Eq. (6):

    $$\begin{array}{*{20}l} I\left( X;Y \right) = \sum _{y \in Y} \sum _{x \in X} P_{X,Y}\left( x,y\right) \cdot \log \left( {\frac{P_{X,Y}\left( x,y\right) }{P_X\left( x\right) \cdot P_Y\left( y\right) }}\right) \end{array}$$
    (6)

    where \(P_{X,Y}\left( x,y\right)\) denotes the joint probability density function of X and Y, and the marginal density functions are denoted by \(P_X\left( x\right)\) and \(P_Y\left( y\right)\). MI measures the similarity of the joint distribution \(P_{X,Y}\left( x,y\right)\) to the product of the marginal distributions. It equals zero if and only if the two random variables are independent, and higher values indicate greater dependency.

2. Relief-F: Relief was proposed by Kira and Rendell [93] for binary class problems using the Euclidean distance measure. It assigns a relative weight/score to each feature and acts as a filter method by eliminating the low-ranked features. The feature score changes according to feature value differences between neighboring instance pairs: if a difference in feature value is discovered between neighbors of the same class (a 'hit'), the feature score falls; if it is observed between neighbors with different class values (a 'miss'), the feature score climbs. However, Relief is limited to two-class problems. Its extension, Relief-F, can solve multi-class problems by searching for the k closest misses in each class and averaging their contributions when updating the weight W, weighted by each class's prior probability; in the contribution of weights to each feature, it takes the average of the k nearest hits and misses. This k can be adjusted and set based on the dataset in question. Furthermore, Relief-F can handle missing data by employing a conditional probability of feature weights. It is defined by the formula given in Eq. (7).

    $$\begin{array}{*{20}l} W\left( X_j\right) = \frac{1}{nK} \sum _{i=1}^{n} \sum _{l=1}^{K} \left( |x_{i,j}-x_{i,j}^{M_l}|-|x_{i,j}-x_{i,j}^{H_l}| \right) \end{array}$$
    (7)

    where \(x_{i,j}\), \(x_{i,j}^{M_l}\) or \(x_{i,j}^{H_l}\) denotes the j-th component of sample \(x_i\), its l-th closest Miss \(x_{i}^{M_l}\), or its l-th closest Hit \(x_{i}^{H_l}\), respectively. n is the total number of samples, and K is the number of Misses or Hits considered for each sample.

3. Minimum redundancy maximum relevance: mRMR [94] is a filter ranking approach that ranks features according to their correlation with the class and with each other. Preferably, features with a high correlation with the class (output) and a low correlation between themselves are chosen. For continuous features, correlation with the class (relevance) can be evaluated by F-statistic values, and the correlation between features (redundancy) can be determined using Pearson Correlation Coefficient (PCC) values. A greedy search is applied to select the features one by one, as the final goal is to maximize the objective function, which is determined by relevance and redundancy. MID (Mutual Information Difference) and MIQ (Mutual Information Quotient) are the two commonly used forms of the objective function, representing the difference and the quotient of relevance and redundancy, respectively. It is calculated using the formula given in Eq. (8); a combined sketch of the three filter methods is given after this list.

    $$\begin{array}{*{20}l} \mathrm{score}_{i}\left( f\right) = \frac{F\left( f,\mathrm{target}\right) }{\sum _{s \in f^{\prime}(i-1)}|\mathrm{corr}\left( f,s\right) |/\left( i-1\right) }\end{array}$$
    (8)

    where i is the i-th iteration, f is the feature being evaluated, F is the F-statistic, \(f'(i-1)\) denotes the features selected up to iteration \(i-1\), and corr is the Pearson correlation.
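One plausible realization of this ranking stage, assuming scikit-learn and the skrebate package (the paper does not name its implementations), is sketched below; the mRMR step follows the MIQ form of Eq. (8), with the F-statistic as relevance and mean absolute Pearson correlation as redundancy:

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif
from skrebate import ReliefF

def filter_ranks(X, y):
    """Mean rank of each feature under MI, ReliefF and an mRMR-style score
    (higher = more important), as used later for the chromosome fitness."""
    mi = mutual_info_classif(X, y)
    relief = ReliefF(n_neighbors=10).fit(X, y).feature_importances_

    # Greedy mRMR in the MIQ form of Eq. (8): F-statistic relevance over
    # mean absolute Pearson correlation with the already-selected features.
    f_stat, _ = f_classif(X, y)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n = X.shape[1]
    selected = [int(np.argmax(f_stat))]
    mrmr = np.zeros(n)
    mrmr[selected[0]] = n
    for rank in range(n - 1, 0, -1):
        rest = [j for j in range(n) if j not in selected]
        scores = [f_stat[j] / (corr[j, selected].mean() + 1e-12) for j in rest]
        best = rest[int(np.argmax(scores))]
        selected.append(best)
        mrmr[best] = rank

    to_rank = lambda s: np.argsort(np.argsort(s))  # 0 = worst ... n-1 = best
    return (to_rank(mi) + to_rank(relief) + to_rank(mrmr)) / 3.0
```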

3.5.2 Genetic Algorithm: an overview

GA is a popular meta-heuristic evolutionary algorithm used for solving complex optimization problems. It is a nature-inspired algorithm with biological operators like selection, crossover, and mutation. GA comprises the following steps: initial population creation, parent selection, crossover, mutation, and generation of child chromosomes. Initially, a random population is generated with a finite number of chromosomes, each a fixed-length vector of random values. Parent chromosomes are selected from this set and used to create child chromosomes after performing crossover and mutation. A fitness function is defined to evaluate the fitness of each chromosome, measuring the quality of the solution represented at each iteration. If the fitness values of the child chromosomes surpass those of some existing chromosomes in the current population, they replace the chromosomes having low fitness values. The new generation then goes through the same selection, crossover, and mutation process, and subsequent generations are produced in the same way; individuals with the least fitness die out as new generations form, making room for new offspring. This leads to a near-optimal solution after a fixed number of iterations. A binary version of GA is used in FS, with each chromosome represented as a vector of '0's and '1's: a '0' indicates that the corresponding feature is not selected, whereas a '1' indicates that it is selected.
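For concreteness, the binary encoding and the roulette-wheel parent selection used in the proposed variant (Sect. 3.5.3) might look as follows in a minimal sketch; the population size and fitness vector are supplied by the caller, and fitness values are assumed nonnegative:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_population(pop_size, n_features):
    """Each chromosome is a binary mask: 1 = feature selected, 0 = not."""
    return rng.integers(0, 2, size=(pop_size, n_features))

def roulette_select(population, fitness):
    """Roulette wheel: draw a parent with probability proportional to fitness."""
    p = fitness / fitness.sum()
    return population[rng.choice(len(population), p=p)]
```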

3.5.3 Proposed GA variant

GA is one of the oldest and most classical evolutionary algorithms, inspired by nature. Over the years, various researchers have utilized this algorithm in the field of FS and optimization, and it has proved to be one of the best-known algorithms for providing a near-optimal subset of features from the whole feature space. Exploration and exploitation are performed by its key operators, i.e., crossover and mutation. Numerous modifications have been suggested by various researchers to improve GA and reach a near-optimal solution. The mutation in GA is decided by a mutation probability, which is quite random in nature. Moreover, the fitness of each candidate solution is usually determined by a learning algorithm (i.e., a classifier), which is often very time-consuming. Keeping the above facts in mind, we propose a modified version of GA which estimates the fitness of the candidate solutions by aggregating three filter-based methods, thereby improving the computational time significantly. Also, instead of random mutation, a different mutation method is proposed which improves the fitness of the individual candidate solution. A multi-point crossover is used, and parent selection is done using the Roulette wheel for better exploitation. The pseudocode of the mutation technique is described in Algorithm 3.

Algorithm 3: pseudocode of the proposed guided mutation
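Since the full pseudocode of Algorithm 3 is not reproduced here, the sketch below is only one plausible reading of the guided mutation consistent with the description above: bits are flipped in the direction that raises the chromosome's mean filter score.

```python
import numpy as np

def guided_mutation(chrom, filter_score, n_flips=2):
    """Plausible guided mutation: switch off the lowest-scoring selected
    features and switch on the highest-scoring unselected ones, so the
    chromosome's mean filter score tends to rise. The published pseudocode
    (Algorithm 3) may differ in detail; n_flips is a placeholder."""
    child = chrom.copy()
    on = np.flatnonzero(child == 1)
    off = np.flatnonzero(child == 0)
    if on.size > n_flips:
        child[on[np.argsort(filter_score[on])[:n_flips]]] = 0
    if off.size > n_flips:
        child[off[np.argsort(filter_score[off])[-n_flips:]]] = 1
    return child
```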

3.5.4 Fitness function

Wrapper-based FS methods generally use a learning algorithm (i.e., a classifier) to evaluate the fitness of the chromosomes. Since GA is commonly used as a wrapper-based method, it follows the same logic; however, this increases the computational time. To overcome this problem, the classifier is replaced by a score computed for each feature vector (i.e., chromosome) with the help of filter methods, which assess the strength of each chromosome in an unsupervised way.

A chromosome is a binary vector, with '0' indicating that the feature is not taken and '1' indicating that it is taken. Using the three filter methods, we get a filter value (i.e., a score) corresponding to each feature; the filter value of each feature is the average of the values given by the three filter methods. The feature with the maximum filter value is the most important, while the feature with the minimum filter value is the least important. Hence, to calculate the score of an individual chromosome, we take the mean of the filter values of all the features which are currently '1.' The pseudocode of the fitness value calculation is described in Algorithm 4.

Algorithm 4: pseudocode of the fitness value calculation

In FS, we intend to increase the classification accuracy of the problem under consideration and decrease the number of features selected simultaneously. In order to do so, we define a single objective function which estimates the overall fitness of each chromosome (feature subset). This objective function is defined in Eq. 9.

$$\begin{array}{*{20}l} {\text {Fitness}}_{\text{overall}} = \alpha \times F + \left( 1 - \alpha \right) \times \frac{ |F|-|f|}{|F|} \end{array}$$
(9)

where F is the fitness of the chromosome, \(\alpha \in [0,1]\) represents the relative weightage between the fitness value and the number of features not selected, |F| is the number of features in the given dataset, and |f| is the number of features in the feature subset.

Since we aim to increase the fitness value and reduce the number of features in the feature subset simultaneously, our objective is to maximize the Fitness_overall value.
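Putting Algorithm 4 and Eq. (9) together, the fitness computation reduces to a few lines; the weight alpha below is a placeholder value, as its experimental setting is not stated in this section:

```python
import numpy as np

def fitness_overall(chrom, filter_score, alpha=0.9):
    """Eq. (9): alpha weighs the chromosome's filter-based fitness
    (Algorithm 4) against the fraction of features left out."""
    selected = np.flatnonzero(chrom == 1)
    if selected.size == 0:
        return 0.0                         # an empty subset carries no information
    F = filter_score[selected].mean()      # mean filter value of selected features
    reduction = (chrom.size - selected.size) / chrom.size
    return alpha * F + (1 - alpha) * reduction
```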

4 Experiments and results

We have performed experiments using five popular and publicly available HAR datasets—UCI-HAR, WISDM, MHEALTH, PAMAP2, and HHAR. This section contains information about the datasets used, the performance metrics, and the results obtained.

4.1 Model implementation

The proposed model is built using the Keras API with the TensorFlow backend. For the CWT part, we have used PyWavelets [95], an open-source Python wavelet transform library. The experiments were performed on a laptop with an AMD Ryzen 7 4800H (2.90 GHz) processor, 16 GB of RAM, and an NVIDIA GeForce GTX 1660 Ti GPU with 4 GB of VRAM. The machine runs a 64-bit Windows 10 operating system.

The feature extractor model is trained under a supervised learning methodology. We randomly initialize all the weights and biases used in the different layers. The Adam optimizer is used, and we minimize the sparse categorical cross-entropy loss. The CNN model is trained for 150 epochs with a batch size of 32. Table 3 summarizes the hyper-parameter values used to tune our model.

Table 3 Details about different hyper-parameters used during experimentation

For the FS technique, we have experimented with different values of the various hyper-parameters. Finally, for our proposed method with FS, we have used a population size of 10, and the crossover probability has been set to 0.6. For the KNN classifier, we have set the k value equal to 5.
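Collecting the stated settings, the training and FS configuration could be reproduced roughly as follows. This is a sketch, not the original script: it assumes build_extractor from the Sect. 3.3 sketch has been defined, the data arrays are placeholders shaped like the UCI-HAR encoding, and the GA driver is the one sketched in Sect. 3.5.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data shaped like the UCI-HAR encoding of Sect. 3.2
X_train = np.random.randn(64, 128, 128, 9).astype('float32')
y_train = np.random.randint(0, 6, size=64)

# CNN feature extractor: Adam, sparse categorical cross-entropy,
# 150 epochs, batch size 32 (Table 3)
model = build_extractor()   # Sect. 3.3 sketch
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=150, batch_size=32)

# GA-based FS over the 1024-D deep features: population size 10,
# crossover probability 0.6 (Sect. 3.5 sketches)

# Final classification on the reduced feature set with KNN, k = 5
knn = KNeighborsClassifier(n_neighbors=5)
```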

4.2 Database description

1. UCI-HAR [96]: This is a publicly available benchmark dataset for HAR, created by recording activities of daily living (ADL) using the embedded inertial sensors of a waist-mounted smartphone. Each participant in a group of 30 volunteers, ranging in age from 19 to 48 years, performed six activities (Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, and Laying) wearing a Samsung Galaxy S II smartphone on their waist. The experiments were video-recorded to label the data manually. Activity details and the corresponding class distribution are given in Table 4. A total of nine signals (body acceleration, total acceleration, and angular velocity along the X, Y, and Z axes) were captured using the embedded accelerometer and gyroscope at a constant sampling rate of 50 Hz. The raw signals were first pre-processed by applying noise filters and then sampled using a fixed-length overlapping sliding window of 2.56 s with 50% overlap (128 readings per window). The dataset was randomly partitioned into two parts, where 70% were used for training and the remaining 30% for testing.

2. WISDM [97]: This dataset contains data collected under controlled laboratory conditions in Fordham University's Wireless Sensor Data Mining laboratory. The samples were captured using a smartphone-embedded accelerometer, and the data collection process was controlled using an application executed on an Android smartphone. The experiment was carried out on 36 people, each performing six activities (Walking, Jogging, Sitting, Standing, Upstairs, and Downstairs) with an Android phone in their front leg pocket. Table 5 displays a thorough description of the activities and the corresponding class distribution. The 3-axial accelerometer signals were collected at a constant sampling rate of 20 Hz, i.e., one reading every 50 ms and a total of 20 readings per second. For our proposed work, we first sampled the raw signals using a fixed-length overlapping window of 4 s with 50% overlap (80 readings per window).

3. MHEALTH [98, 99]: The Mobile Health (MHEALTH) dataset is a multi-modal wearable sensor dataset containing body motion and vital sign recordings for 10 volunteers of diverse profiles. Each volunteer performed 12 different physical activities (Standing Still, Lying Down, Walking, Climbing Stairs, Cycling, Jogging, Running, etc.) wearing three wearable sensor units (accelerometer, gyroscope, and 2-lead electrocardiogram). The activity details and the class distributions are shown in Table 6. The sensors were attached to the chest, right wrist, and left ankle with elastic straps. Using these sensors, various quantities like acceleration, angular velocity, and magnetic field orientation were measured to better capture body dynamics while performing different activities. All the sensor modalities were recorded at a constant sampling rate of 50 Hz. For our proposed work, we consider only the readings of the accelerometers and gyroscopes placed on different body parts. We first sampled the raw signals using a fixed-length overlapping window of 2.56 s with 50% overlap (128 readings per window).

4. PAMAP2 [100]: The hardware configuration for the PAMAP2 dataset includes three Inertial Measurement Units (IMUs) positioned above the wrist of the dominant arm, over the chest, and at the ankle. The dataset was recorded at a frequency of 100 Hz. The entire dataset covers 9 participants with annotated human activities and specific physical descriptions. Most of the participants were men whose dominant hand was the right hand; in fact, PAMAP2 has only one left-handed and one female subject, with ids 102 and 108, respectively. Each individual was required to adhere to a protocol that included 12 separate tasks. A detailed description of all the activities and the class distribution is shown in Table 7. There are almost 10 h of activity data in this collection. After removing the anomalous data, we segmented the sensor data using a fixed-length sliding window with 50% overlap. We then randomly partitioned the dataset into two parts, where 70% are used for training and the remaining 30% for testing.

5. HHAR [101]: The Heterogeneity Dataset for Human Activity Recognition (HHAR) from smartphones and smartwatches is a dataset used for assessing the performance of various HAR algorithms (classification, automatic data segmentation, sensor fusion, feature extraction, etc.) across a variety of sensor types. The collection includes readings from two motion sensors frequently found in smartphones, namely the accelerometer and the gyroscope, captured as users carried smartwatches and smartphones while performing scripted tasks in any sequence. To reflect the sensor heterogeneity that can be anticipated in actual deployments, the dataset was compiled using a variety of device models and use scenarios. It records 6 different activities of 9 individuals using 6 types of mobile devices (4 smartphones and 2 smartwatches). Table 8 shows a detailed description and the class distribution of the HHAR dataset. In our experiment, we have used only the smartphones' accelerometer data. We divided the sensor data into segments using a fixed-length sliding window with 50% overlap. The dataset is then randomly divided into two sections, with 70% used for training and the remaining 30% for testing.

Table 4 Activity details of UCI-HAR dataset
Table 5 Activity details of WISDM dataset
Table 6 Activity details of MHEALTH dataset
Table 7 Activity details of PAMAP2 dataset
Table 8 Activity details of HHAR dataset

Table 9 presents summarized information about the five datasets. The UCI-HAR, WISDM, and HHAR datasets contain 6 activities each, but with different numbers of sensors. The MHEALTH and PAMAP2 datasets both contain 12 activities recorded with additional sensors. HHAR contains the largest number of training and testing samples, whereas PAMAP2 uses more sensors than the rest of the datasets.

Table 9 Details of the datasets used

4.3 Performance metrics

In this paper, we mainly use accuracy, precision, recall, F1-score, and the confusion matrix as performance measures, with micro-averaging used for calculating precision, recall, and F1-score. Accuracy is defined as the proportion of correctly predicted samples to the total number of samples. A True Positive (TP) outcome is one in which the model correctly predicts the positive class. A True Negative (TN), on the other hand, is an outcome in which the model correctly predicts the negative class. Similarly, a False Positive (FP) is an outcome in which the model predicts the positive class incorrectly, and a False Negative (FN) is an outcome in which the model predicts the negative class incorrectly. The accuracy can be calculated in terms of TP, TN, FN, and FP using Eq. (10).

$$\begin{array}{*{20}l} \text {Accuracy} = \frac{\text{TP} + \text{TN}}{\text {TP} + \text {TN} + \text {FP} + \text {FN}} \end{array}$$
(10)
1. Precision: Precision is defined as the percentage of samples identified as positive that are truly positive. Precision can be calculated using Eq. (11).

    $$\begin{array}{*{20}l} \text {Precision} = \frac{\text {TP}}{\text {TP} + \text {FP}} \end{array}$$
    (11)
2. Recall: Recall is the proportion of positive samples that are accurately identified out of all positive samples. We can calculate the recall using Eq. (12).

    $$\begin{array}{*{20}l} \text {Recall} = \frac{\text {TP}}{\text {TP }+ \text {FN}} \end{array}$$
    (12)
3. F1-score: F1-score is a comprehensive measure of the model's accuracy; it is the harmonic mean of precision and recall. It can be calculated using Eq. (13).

    $$\begin{array}{*{20}l} \text {F1-score} = \frac{2\times \text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} \end{array}$$
    (13)
4. Confusion matrix: The confusion matrix is a square matrix that represents the overall performance of a classification model. Its rows represent true class label instances, while its columns represent predicted class label instances. The diagonal elements count the number of samples for which the predicted label equals the true label. The confusion matrix is an important tool for visualizing the model's classification performance.
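For reference, the micro-averaged metrics above can be obtained directly from scikit-learn; the label vectors below are placeholders for the classifier's outputs:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

y_true = [0, 1, 2, 2, 1, 0]   # placeholder activity labels
y_pred = [0, 1, 2, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='micro')
cm = confusion_matrix(y_true, y_pred)  # rows: true labels, cols: predicted
```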

4.4 Results

To thoroughly measure the performance of the proposed models, we first evaluate the method without FS and compare it with the result found using the method with FS. Table 10 summarizes the performance of our proposed model without FS.

Table 10 Performance details of the proposed method without FS

Use of the FS technique helps us reduce the number of features, which also improves the overall accuracy of our model. Table 11 provides the detailed performance metrics obtained by our model using the FS method.

Table 11 Performance details of the proposed method with FS

From Tables 10 and 11, it can be seen that the FS technique reduces the size of the feature set by almost 1/3 of the original feature set in the majority of the cases. This reduced feature set improves the recognition accuracy by 0.71% for UCI-HAR, 1.06% for WISDM, 0.18% for MHEALTH, 0.76% for PAMAP2, and 0.88% for HHAR datasets.

The accuracy and loss plots obtained using the feature extractor model on the UCI-HAR dataset are shown in Fig. 6, while the accuracy and loss plots for WISDM, MHEALTH, PAMAP2, and HHAR datasets are shown in Figs. 7, 8, 9, 10, respectively.

Fig. 6: a Accuracy and b loss plots for training and testing obtained using the feature extractor model on the UCI-HAR dataset

Fig. 7: a Accuracy and b loss plots for training and testing obtained using the feature extractor model on the WISDM dataset

Fig. 8: a Accuracy and b loss plots for training and testing obtained using the feature extractor model on the MHEALTH dataset

Fig. 9: a Accuracy and b loss plots for training and testing obtained using the feature extractor model on the PAMAP2 dataset

Fig. 10: a Accuracy and b loss plots for training and testing obtained using the feature extractor model on the HHAR dataset

4.4.1 Evaluation on UCI-HAR dataset

Figure 11 shows the confusion matrices of the proposed method without FS and with FS side by side.

On the UCI-HAR dataset, before applying FS, a total of 2910 out of 2947 test samples are correctly classified by our model. After applying the FS technique, the total number of correctly classified samples increases to 2931, and the overall accuracy improves from 98.74 to 99.45%. Comparing Fig. 11a, b, we can see that the FS technique improves the discrimination between Standing and Sitting. It also improves the recognition accuracy of the Walking activity class. Even after applying FS, there is still confusion between Sitting and Standing. The main reason could be that the two activities are comparable from the perspective of motion sensors: data from accelerometers and gyroscopes alone are insufficient for mining deeper discriminative information.

Fig. 11: Confusion matrices for UCI-HAR on the model a without FS and b with FS

4.4.2 Evaluation on WISDM dataset

When we test our trained model on the WISDM dataset, the FS technique improves the overall recognition accuracy from 98.34 to 99.38%. Figure 12 presents the confusion matrices of our proposed method without and with FS. Comparing the confusion matrices in Fig. 12, it is clear that the reduced optimal feature set generated by the FS technique helps the classifier recognize each activity more accurately, as the classifier makes fewer confusions. In the case of WISDM, when we test our trained model on 1452 new instances, the FS technique increases the number of correctly classified samples from 1428 to 1443.

Fig. 12: Confusion matrices for WISDM on the model a without FS and b with FS

4.4.3 Evaluation on MHEALTH dataset

In the case of the MHEALTH dataset, we have tested our proposed methods on a total of 1052 new samples. Figure 13 depicts the confusion matrices of our proposed method without FS and with FS. The confusion matrices in Fig. 13 show that though the model without FS performs well, it gets a little confused while recognizing complex activities like Knees Bending and Waist Bends Forward. The FS technique reduces the number of confusions and increases the overall accuracy from 99.72% to 99.90%.

Fig. 13: Confusion matrices for MHEALTH on the model a without FS and b with FS

4.4.4 Evaluation on PAMAP2 dataset

The confusion matrices of the proposed technique without FS and with FS are shown side by side in Fig. 14. Prior to using FS, the model obtains 97.55% classification accuracy, with a total of 8884 correctly classified samples out of 9107 new activity samples. With a total of 8952 correctly identified samples, the model achieves 98.29% classification accuracy after applying FS. Even though the FS approach lessens misclassification, the model still confuses the activity class vacuum_cleaning with other activity classes, as shown in Fig. 14a, b. The complex nature of this activity class is mainly responsible for the confusion.

Fig. 14 Confusion matrices for PAMAP2 on the model a without FS and b with FS

4.4.5 Evaluation on HHAR dataset

On the HHAR dataset, where our proposed model is tested on a total of 52,872 new samples, the FS approach increases the overall recognition accuracy from 96.87 to 97.72%. The confusion matrices of our proposed technique without and with FS are shown in Fig. 15. Comparing Fig. 15a, b, we observe that the FS approach increases the number of correctly identified samples from 51,219 to 51,669. As with the PAMAP2 dataset, the proposed model still conflates some activity classes even though the FS approach helps to reduce misclassification. The primary factor may be that the limited accelerometer data from a smartphone are not sufficient to discern these intricate actions.

Fig. 15 Confusion matrices for HHAR on the model a without FS and b with FS

4.5 Impact of FS hyper-parameters on model performance

The FS hyper-parameters strongly influence the classification model’s performance. This section examines the effect of the key FS hyper-parameters, namely population size, crossover probability, and the number of iterations, on the model’s overall accuracy.
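
To make these hyper-parameters concrete, the following is a minimal, generic sketch of a GA-based wrapper FS loop. It is not the modified GA used in this paper: the KNN-based fitness function, the bit-flip mutation rate, the tournament selection, and the elitism step are all assumptions made for illustration. The arguments `population_size`, `crossover_prob`, and `n_iterations` correspond to the three hyper-parameters examined below.

```python
# Generic GA wrapper for feature selection (illustrative sketch only, not
# the paper's modified GA). Each chromosome is a binary mask over the
# extracted features; fitness is cross-validated accuracy of a simple KNN
# classifier trained on the selected features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)

def fitness(mask, X, y):
    """Validation accuracy of a KNN classifier on the selected subset."""
    if mask.sum() == 0:                       # guard: empty subsets are invalid
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, np.asarray(mask, dtype=bool)], y, cv=3).mean()

def ga_feature_selection(X, y, population_size=10, crossover_prob=0.8,
                         mutation_prob=0.02, n_iterations=30):
    """Evolve binary feature masks; return the best mask found."""
    n_features = X.shape[1]
    pop = rng.integers(0, 2, size=(population_size, n_features))
    for _ in range(n_iterations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        new_pop = [pop[scores.argmax()].copy()]           # elitism: keep the best
        while len(new_pop) < population_size:
            # Binary tournament selection for each parent.
            a, b = rng.integers(0, population_size, size=2)
            p1 = pop[a] if scores[a] >= scores[b] else pop[b]
            a, b = rng.integers(0, population_size, size=2)
            p2 = pop[a] if scores[a] >= scores[b] else pop[b]
            child = p1.copy()
            if rng.random() < crossover_prob:             # one-point crossover
                point = rng.integers(1, n_features)
                child[point:] = p2[point:]
            flips = rng.random(n_features) < mutation_prob  # bit-flip mutation
            child[flips] ^= 1
            new_pop.append(child)
        pop = np.array(new_pop)
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()].astype(bool)

# Toy usage on synthetic data standing in for the extracted deep features.
X, y = make_classification(n_samples=200, n_features=30, n_informative=8,
                           random_state=0)
best_mask = ga_feature_selection(X, y)
print(best_mask.sum(), "of", X.shape[1], "features selected")
```

In a wrapper setting like this, each individual is a candidate feature subset, so the population size directly controls how many subsets are evaluated per iteration and thus trades search coverage against computation.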

4.5.1 Effect of population size

The population size is an important parameter that directly affects the ability to find the best solution in the search space: a larger population increases the likelihood of obtaining an optimal solution, but also raises the computational cost. In this paper, we have experimented with population sizes from 5 to 30 at a fixed interval of 5. The population size versus accuracy graphs for the five datasets are shown in Fig. 16. For the UCI-HAR, WISDM and MHEALTH datasets, the accuracy increases steadily and reaches its maximum when the population size is 10; as the population size increases further, the accuracy follows a zigzag pattern. For WISDM and MHEALTH, the accuracy reaches its minimum when the population size is 25. In contrast, the accuracy does not vary much for the PAMAP2 and HHAR datasets. Hence, for our proposed method, we have used 10 as the default population size.
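
Under the same illustrative assumptions, a sweep like the one plotted in Fig. 16 could be scripted as follows, reusing the hypothetical `ga_feature_selection` and `fitness` functions from the sketch above:

```python
# Sweep the population size as in Fig. 16 (illustration only; reuses the
# hypothetical ga_feature_selection/fitness sketch defined earlier).
for pop_size in range(5, 31, 5):
    mask = ga_feature_selection(X, y, population_size=pop_size)
    print(f"population={pop_size:2d}  selected={mask.sum():2d}  "
          f"accuracy={fitness(mask, X, y):.4f}")
```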

Fig. 16 Population size of GA versus accuracy graphs for all five HAR datasets

4.5.2 Effect of crossover probability

Crossover is a genetic operator that stochastically produces new candidate solutions from an existing population. The crossover probability is the likelihood that a crossover occurs in a specific mating. In this experiment, we have varied the crossover probability from 0.1 to 0.9 in steps of 0.1 and observed how the accuracy changes. Figure 17 depicts the relation between the crossover probability and the accuracy. As the crossover probability increases, the accuracy behaves differently for different datasets. For the UCI-HAR and MHEALTH datasets, the accuracy initially decreases and then starts to increase with the crossover probability, reaching its minimum at 0.3 for UCI-HAR and 0.2 for MHEALTH. For the WISDM dataset, the accuracy first follows a zigzag pattern, then falls sharply and reaches its minimum at a crossover probability of 0.6; a further increase in the crossover probability raises the accuracy again. The accuracy for the PAMAP2 dataset reaches its lowest point at 0.4 before beginning to rise, and then declines as the crossover probability rises further. On the HHAR dataset, however, the accuracy does not change significantly as the crossover probability grows.
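
For concreteness, the sketch below isolates a one-point crossover on binary feature masks, gated by the crossover probability. This generic operator is an assumption made for illustration; the modified GA used in this paper may implement crossover differently.

```python
# One-point crossover on binary feature masks (illustrative sketch).
import numpy as np

rng = np.random.default_rng(7)

def one_point_crossover(p1, p2, crossover_prob=0.8):
    """With probability crossover_prob the child takes p1's head and p2's
    tail; otherwise it is a plain copy of p1."""
    child = p1.copy()
    if rng.random() < crossover_prob:       # the gate this section studies
        point = rng.integers(1, len(p1))    # cut point in [1, len(p1) - 1]
        child[point:] = p2[point:]
    return child

p1 = np.ones(8, dtype=int)                  # two toy feature masks
p2 = np.zeros(8, dtype=int)
print(one_point_crossover(p1, p2))          # e.g. [1 1 1 0 0 0 0 0]
```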

Fig. 17 Crossover probability of GA versus accuracy graphs for all five HAR datasets

4.5.3 Effect of number of iterations

Figure 18 depicts the change in accuracy as the number of GA iterations increases. As with the other hyper-parameters, the effect on accuracy varies with the dataset. As we increase the number of iterations from 5 to 30 at a uniform interval of 5, the accuracy on the UCI-HAR dataset gradually increases and peaks at 30 iterations, whereas for the WISDM and MHEALTH datasets the accuracy initially increases, begins to decrease once the number of iterations exceeds 15, and rises again beyond 25 iterations. The accuracy peaks at 10 iterations for the WISDM dataset and at 15 for the MHEALTH dataset. With more iterations, the accuracy for the PAMAP2 dataset grows in a zigzag pattern. In contrast, the accuracy for the HHAR dataset first rises gradually from 5 to 10 iterations, then starts to drop beyond 10 iterations, reaching its lowest point at 30. In our experiment, we have used 30 as the default number of iterations.

Fig. 18 Number of iterations versus accuracy graphs for all five HAR datasets

4.6 Comparison with state-of-the-art methods

To assess the efficacy and generalizability of our proposed model, we have compared it to a number of state-of-the-art models.

The comparison results for the UCI-HAR, WISDM, MHEALTH, PAMAP2, and HHAR datasets are shown in Tables 12, 13, 14, 15 and 16, respectively. The comparison is made on the basis of classification accuracy. The results show that our proposed model without FS achieves higher recognition accuracy than most of the other HAR models, and the FS technique improves recognition accuracy even further. For all five datasets, our proposed method with FS outperforms the state-of-the-art algorithms considered here for comparison.

Table 12 Performance comparison of the proposed model with past methods for the UCI-HAR dataset
Table 13 Performance comparison of the proposed model with past methods for the WISDM dataset
Table 14 Performance comparison of the proposed model with past methods for the MHEALTH dataset
Table 15 Performance comparison of the proposed model with past methods for the PAMAP2 dataset
Table 16 Performance comparison of the proposed model with past methods for the HHAR dataset

5 Discussion

The overall results presented in the previous section indicate the effectiveness of our proposed model for HAR. The proposed spatial attention module assists in extracting high-quality features by focusing on the specific spatiotemporal properties that the CWT-based encoding expresses particularly well. In this study, we also analyze how well the FS process works and find that, compared with the initially extracted features, only a limited number of important features are needed for recognizing human activities. In addition to speeding up the computation, the reduced feature set improves recognition accuracy by a significant margin (see Tables 10, 11). HAR systems are nowadays used in a variety of fields, such as sports analysis, health monitoring, and fall detection for elderly persons. In sports analysis, team management needs to analyze players’ physical abilities and various motion patterns to improve the quality of games. Similarly, in fall detection, an alarm needs to be generated automatically as soon as a fall is recognized. Hence, the more accurate the system is, the more dependable it becomes.

Although our proposed model performs well in most cases, it should be noted that it sometimes struggles to distinguish similar activity classes, in particular activity groups that produce nearly identical sensor data patterns. For example, the model confuses ‘Walking’ with ‘Upstairs’ and ‘Downstairs.’ Similarly, ‘Standing’ and ‘Sitting’ are the most confusing activity classes, as both are static activities and generate almost identical signal patterns. Considering dataset-specific activities, our model finds it difficult to discriminate ‘Walking’ from ‘Nordic walking’ and ‘Vacuum_cleaning’ from ‘Ironing’ and ‘Upstairs’ in the PAMAP2 dataset. Likewise, on the HHAR dataset, the proposed model frequently conflates the activity classes ‘stairup’ and ‘stairdown,’ as well as ‘stairdown’ and ‘walk.’ Figure 15 shows that, after FS, the model misclassifies more ‘walk’ activities as ‘bike,’ ‘sit,’ and ‘stand,’ demonstrating that the FS method does not necessarily decrease the misclassification rate for the confusing cases.

6 Conclusion

Sensor-based HAR deals with predicting specific movements or activities of a person from sensor data. It has been an interesting research problem, as it can be used to infer a person’s identity, personality, and psychological state, and it can also be applied to recognizing complex sport activities and to medical domains such as health monitoring systems. Given its vast scope of practical applications, it is important to ensure that a model meets the demanding challenges of the task, and the problem has therefore gained popularity in the research community in recent times. In this paper, we have proposed a model for HAR based on sensor data. We have used a Spatial Attention-aided CNN as the feature extractor and a novel FS technique that selects the most prominent features using a modified version of the popular evolutionary algorithm GA. Our proposed method has been evaluated on five public datasets: UCI-HAR, WISDM, MHEALTH, PAMAP2, and HHAR. The results obtained are better than those of the state-of-the-art methods. However, there is still considerable scope for improving the overall performance of the method. In our future endeavors, we intend to improve the classification accuracy with fewer features by exploiting other recent meta-heuristic algorithms. We also plan to work on other kinds of human activity datasets, such as video-based or still-image-based ones, and to use pre-trained CNN models to obtain a good set of initial features.