Recent technological advancement has brought a great shift towards larger-scale data storage, communication and processing at reduced system size and cost. Huge data sets are uploaded to digital media, providing a rich source of information for fields of study spanning all walks of life. These data hold great potential, both in the ways they can be utilized and in their contribution to establishing sound economies. Such large-scale data repositories must be analysed in a scientific and systematic way to efficiently extract the hidden knowledge, trends and useful patterns without losing their originality. This cannot be achieved through conventional Machine Learning (ML) frameworks and technologies alone, which ultimately drives the need for advanced ML techniques such as deep neural networks inspired by the structure of the brain.

The main focus of this special issue was to provide an opportunity for researchers from both academia and industry to contribute a diverse but complementary set of papers presenting new developments and applications of advanced deep neural networks. We thus welcomed both theoretical and applied (application-oriented) work. The review process complied with the standard review process of the Neural Computing and Applications journal. Each paper received at least two reviews from experts in the field and was accepted based on its originality, quality and relevance to the area.

In the first paper, titled “An Assortment of Evolutionary Computation Techniques (AECT) in Gaming”, the authors Maham Khalid, Feras Al-Obeidat, Abdallah Tubaishat, Babar Shah, Saad Razzaq, Fahad Maqbool and Muhammad Ilyas aim to endow Non-Player Characters (NPCs) with intelligent traits. For that purpose, they apply an assortment of Ant Colony Optimization (ACO) with a Genetic Algorithm (GA)-based approach to a First-Person Shooter (FPS) game, Zombies Redemption (ZR). Eminent NPCs carrying the best-fit genes, as yielded by the GA, are selected to spawn new NPCs over generations and game levels. The proposed ZR technique is novel in integrating ACO and GA in FPS games: each NPC uses ACO to exploit and optimize its current strategy, while the GA is used to share and explore strategies among NPCs. Moreover, the work elaborates the mechanism of evolution through parameter utilization and updating over the generations. To accumulate experimental results, ZR was played by 450 players at varying levels, with evolving NPC traits and environmental constraints. The results reveal an improvement in NPC performance as the game proceeds.
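To make the GA side of such a scheme concrete, the following is a minimal sketch (not the authors' implementation; the function names and three-gene representation are illustrative assumptions) of how best-fit genes could be selected and recombined to spawn NPCs over generations:

```python
import random

def select_parents(population, fitness, k=2, rng=None):
    """Fitness-proportionate (roulette-wheel) selection of k parent genomes."""
    rng = rng or random.Random(0)
    total = sum(fitness)
    weights = [f / total for f in fitness]
    return rng.choices(population, weights=weights, k=k)

def crossover(a, b, rng=None):
    """Single-point crossover of two equal-length gene lists for a new NPC."""
    rng = rng or random.Random(0)
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]
```

In a full game loop, the fitness values would come from in-game NPC performance, and the ACO component would then refine each spawned strategy locally.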

In the second paper, “Gene-encoder: A feature selection technique through unsupervised deep learning-based clustering for large gene expression data”, the authors Uzma Khan, Feras Al-Obeidat, Abdallah Tubaishat, Babar Shah and Zahid Halim present gene encoder, an unsupervised two-stage feature selection technique for the classification of cancer samples. The first stage aggregates three filter methods, namely Principal Component Analysis (PCA), correlation and spectral-based feature selection. Next, a Genetic Algorithm (GA) is used, which evaluates each chromosome using autoencoder-based clustering. The resultant feature subset is used for the classification task. Three classifiers, namely Support Vector Machine (SVM), k-Nearest Neighbours (k-NN) and Random Forest (RF), are used to avoid dependency on any one classifier. Six benchmark gene expression datasets are used for the performance evaluation, and a comparison is made with four state-of-the-art related algorithms. Three sets of experiments are carried out to evaluate the proposed method: evaluating the selected features through sample-based clustering, adjusting the optimal parameters and selecting the best classifier. The comparison is based on accuracy, recall, false positive rate, precision, F-measure and entropy. The obtained results suggest better performance on the cancer data.
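As an illustration of the filter stage only (a sketch under our own assumptions, not the paper's code), the correlation-based filter can be reduced to ranking each gene by the absolute correlation of its expression values with the class label and keeping the top-scoring subset:

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def correlation_filter(samples, labels, top_k):
    """Rank features by |correlation| with the label; return top_k indices."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:top_k]]
```

In the paper's pipeline, subsets produced by such filters are aggregated and then refined by the GA with autoencoder-based clustering as the fitness signal.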

The third paper, titled “Improving Aspect-Level Sentiment Analysis with Aspect Extraction” and authored by Navonil Majumder, Rishabh Bharadwaj, Soujanya Poria, Alexander Gelbukh and Amir Hussain, concerns aspect-based sentiment analysis (ABSA), a popular research area in NLP comprising two distinct parts: Aspect Extraction (AE) and labelling the aspects with sentiment polarity (ALSA). The authors hypothesize that transferring knowledge from a pre-trained AE model can benefit the performance of ALSA models. Based on this hypothesis, word embeddings are obtained during AE and subsequently fed to the ALSA model. Empirically, they show that the added information significantly improves the performance of three different baseline ALSA models on two distinct domains.

In the fourth paper, titled “Improving Generative Adversarial Networks with Simple Latent Distributions”, the authors Shufei Zhang, Kaizhu Huang, Zhuang Qian, Rui Zhang and Amir Hussain tackle a well-known problem with Generative Adversarial Networks (GANs): although GANs are powerful generative models, they are hard to train and may consequently produce poor generations in some cases. The paper alleviates this problem from a different perspective: while retaining as much information as possible about the original high-dimensional distributions, the authors learn and leverage an additional latent space where simple distributions are defined in a low-dimensional space; as a result, the departure between two simple distributions can readily be measured with available divergences. Specifically, to retain information, mutual information is maximized between the variables of the high-dimensional complex distribution and the simple low-dimensional distribution. The departure between the resulting simple distributions is then measured in the original GAN manner. In comparison with existing methods that measure the distribution departure directly in the high-dimensional space, the proposed method demonstrates clear superiority. A series of experiments shows the advantages of the proposed SimpleGAN technique.
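For readers unfamiliar with the quantity being maximized, mutual information I(X;Y) measures how much knowing the latent variable tells us about the original one. A minimal plug-in estimator for paired discrete samples (purely illustrative; the paper works with continuous variables and a neural estimator, which this sketch does not reproduce) looks like this:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi
```

Identical sequences give the maximum (here 1 bit for a balanced binary variable), while independent sequences give 0, which is the intuition behind using this objective to keep the latent space informative.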

In the fifth paper titled “Impact of Data Smoothing on Semantic Segmentation”, the authors Nuhman Ul Haq, Zia ur Rehman, Ahmad Khan, Ahmad Din, Sajid Shah, Abrar Ullah and Fawad Qayum have investigated the impact of data smoothing on the training and generalization of deep semantic segmentation models. A mechanism is proposed to select the best level of smoothing to get better generalization of the deep semantic segmentation models. Furthermore, a smoothing layer is included in the deep semantic segmentation models to automatically adjust the level of smoothing.

In the sixth paper, titled “Effect of Number of Neurons and Layers in an Artificial Neural Network for Generalized Concrete Mix Design”, the authors Mohammad Adil, Rahat Ullah, Salma Noor and Neelam Gohar undertake a thorough analysis for deciding the number of neurons for concrete mix design. The results of neural networks with varying numbers of neurons and layers are correlated. Data for a number of concrete mixes are sorted and normalized. Using properties of concrete constituents, such as the specific gravity of cement and of coarse and fine aggregates, along with other inputs (17 in total) and 5 outputs, the authors discover that a simple neural network with one or two hidden layers performs better than one with 3 or more layers for concrete mix design.
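One way to see why such architecture sweeps matter is to count trainable parameters. The helper below (our own illustrative sketch; the hidden-layer widths are assumed, not taken from the paper) computes the weight-plus-bias count for a fully connected network with the paper's 17 inputs and 5 outputs:

```python
def mlp_param_count(layer_sizes):
    """Total weights + biases of a fully connected feed-forward network.

    Each layer of size b fed by a layer of size a contributes (a + 1) * b
    parameters: a*b weights plus b biases.
    """
    return sum((a + 1) * b for a, b in zip(layer_sizes, layer_sizes[1:]))
```

For example, one hidden layer of 10 neurons (`[17, 10, 5]`) needs 235 parameters, while three hidden layers of 10 (`[17, 10, 10, 10, 5]`) needs 455; with small mix-design datasets, the larger count can mean overfitting rather than better accuracy.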

In the seventh paper “A Machine Learning-Based Approach for the Segmentation and Classification of Malignant Cells in Breast Cytology Images using Grey Level Co-occurrence Matrix (GLCM) and Support Vector Machine (SVM)”, the authors Sana Ullah Khan, Naveed Islam, Zahoor Jan, Khalid Haseeb, Syed Inayat Ali Shah and Muhammad Hanif have proposed a machine learning-based approach for malignant cell segmentation and classification in breast cytology images. In their proposed approach, the segmentation of cells is performed by a level set algorithm which is used to extract statistical information related to the malignant and benign cells. Similarly, the Grey Level Co-occurrence Matrix (GLCM) is computed to exploit the texture information and Support Vector Machine (SVM)-based classification is used for the classification of malignant and benign cells.
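To illustrate the texture features involved, a GLCM tallies how often pairs of grey levels co-occur at a fixed pixel offset; statistics such as contrast are then read off the normalised matrix. The following is a minimal sketch for one offset (our own simplified version, not the authors' implementation, which would combine several offsets and feed the statistics to an SVM):

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised Grey Level Co-occurrence Matrix for one pixel offset."""
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """GLCM contrast: sum over i, j of p(i, j) * (i - j)^2."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))
```

Feature vectors built from such statistics (contrast, homogeneity, energy, etc.) for each segmented cell are what the SVM classifies as malignant or benign.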

The eighth paper, titled “Do Semantics Always Aid Syntax? An Empirical Study on Named Entity Recognition and Classification”, is authored by Xiaoshi Zhong, Erik Cambria and Amir Hussain. In this paper, the authors empirically examine whether named entity classification improves the performance of named entity recognition, as an empirical case of examining whether semantics improve the performance of a syntactic task. To this end, they first specify the criteria for determining whether a linguistic task is syntactic or semantic, according to both syntactic theory and semantic theory. They then conduct six extensive experiments using three state-of-the-art models on two benchmarks. The results demonstrate that named entity recognition is a syntactic task and that jointly modelling named entity recognition and classification does not improve the performance of named entity recognition. This suggests examining whether individual tasks can enhance each other before modelling multiple tasks simultaneously.

The ninth paper, authored by Ginni Arora, Ashwani Kumar Dubey, Zainul Abdin Jaffery and Álvaro Rocha, is titled “Bag of Feature and Support Vector Machine Based Early Diagnosis of Skin Cancer”. The authors develop a computer-aided detection and diagnosis system that classifies a lesion as cancerous or non-cancerous by means of a precise feature extraction technique. The paper proposes fusing the Bag of Features method with Speeded-Up Robust Features (SURF) for feature extraction, and a quadratic support vector machine for classification. The proposed method achieves an accuracy of 85.7%, a sensitivity of 100%, a specificity of 60% and a training time of 0.8507 s in classifying the lesion.
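The Bag of Features step can be sketched as follows: local descriptors (here SURF) are assigned to their nearest "visual word" in a learned codebook, and the image is represented by the normalised histogram of assignments, which then feeds the SVM. This is an illustrative reduction under our own assumptions (tiny 2-D descriptors and a hand-written codebook stand in for real SURF vectors and a k-means codebook):

```python
def nearest(descriptor, codebook):
    """Index of the closest visual word by squared Euclidean distance."""
    dists = [sum((d - c) ** 2 for d, c in zip(descriptor, word))
             for word in codebook]
    return dists.index(min(dists))

def bag_of_features(descriptors, codebook):
    """Normalised histogram of visual-word assignments for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist]
```

The fixed-length histogram is what makes images with varying numbers of keypoints comparable inputs for the quadratic SVM.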

The tenth paper, titled “Building a Knowledge Graph by using Cross Lingual Transfer Method and Distributed MinIE Algorithm on Apache Spark” and authored by Phuc Do, Trung Phan, Hung Le and B. B. Gupta, is in the domain of Natural Language Processing. The paper presents a pipeline that extracts core knowledge from large quantities of text using distributed computing. The components of the pipeline are systems known to yield good results, and the outputs of the proposed system are stored in a knowledge graph. The authors then develop a cross-lingual transfer method to build a Vietnamese knowledge graph from the graphs that are in English.

The eleventh paper, “Multiclass Classification of Nutrients Deficiency of Apple using Deep Neural Network”, authored by Yogesh, Ashwani Kumar Dubey, Rajeev Ratan and Alvaro Rocha, addresses the agricultural industry by analysing and recognizing nutrient deficiencies in fruit with a proposed deep neural model. A database covering four major types of nutrient deficiency in apples is created and used for training and validation of the proposed deep convolutional network. The model is tuned with k-fold cross-validation; the hyper-parameters include the number of epochs, set to 100, and the batch size, kept at 5. Finally, the model is tested on the test data and achieves an average accuracy of 98.24% with k-fold cross-validation, k set to 15.

The twelfth paper, “Recommendation of technological profiles to collaborate in software projects using document embeddings”, is authored by Pablo Chamoso, Guillermo Hernandez, Alfonso Gonzalez-Briones and Francisco J. García-Peñalvo. This research work identifies the most apt users (where “user” refers to any technology professional, for example a software developer, registered on a given platform), from a set of different user profiles, for fixing bugs in a software project. The study is carried out by analysing large-scale repositories of open-source projects with a large historical volume of bugs, and the extracted knowledge is successfully applied to new, unrelated projects. Different similarity-based profile ranking procedures are studied, including a neural-network-based incidence representation. The obtained results show that the system can be directly applied to different environments and that the selected user profiles are very close to those selected by human experts, which demonstrates the correct functioning of the proposed system.

The thirteenth paper, “A Novel Emergency Situation Awareness Machine Learning Approach to Assess Flood Disaster Risk based on Chinese Weibo”, is authored by Bai Hua, Yu Hualong, Yu Guang and Huang Xing. The authors aim to improve emergency situation awareness for flooding disasters and propose a new regional emergency situation awareness model to automatically assess flood disaster risk. First, their algorithm assigns the relevant event-meta tags to each situational microblogging. Secondly, a new machine learning method is developed for the dynamic assessment of flood risk from online microbloggings. The proposed model is applied to the case of the Yuyao Flood. The results of the case study show that the online quantitative risk assessment results for the Yuyao Flood produced by their model are consistent with real accumulated precipitation data.

The fourteenth paper, authored by Feras Al-Obeidat, Alvaro Rocha, Maryam Akram, Saad Razzaq and Fahad Maqbool, is titled “CDRGI: Cancer Detection through Relevant Genes Identification”. CDRGI presents a discrete filtering technique based on a Binary Artificial Bee Colony coupled with a Support Vector Machine and a two-stage cascading classifier, to identify relevant genes and detect cancer using RNA-seq data. The proposed approach has been tested on seven different cancers: Breast Cancer (BRCA), Stomach Cancer (STAD), Colon Cancer (COAD), Liver Cancer (LIHC), Lung Cancer (LUSC), Kidney Cancer (KIRC) and Skin Cancer (SKCM).

The last paper, titled “Parallel Tensor Factorization for Relational Learning”, is authored by Feras Al-Obeidat, Alvaro Rocha, Muhammad Shahrose Khan, Fahad Maqbool and Saad Razzaq. The article focuses on devising a parallel version of RESCAL, one of the well-known tensor factorization techniques, which can solve large-scale problems with relatively low time and space complexity. A considerable decrease in execution time is observed with the parallel RESCAL.

After a thorough and rigorous review process, we believe that these articles constitute an interesting set of contributions addressing the theme of advanced ML techniques for large-scale repositories. We would like to thank the authors who submitted their work to this special issue for their valuable insights and contributions. A special thanks goes to the reviewers who, with their commitment and expertise, helped us select the best articles and improve their contents. We hope that readers find this special issue an interesting and valuable source of information for their own work. Finally, we would like to express our gratitude to the Editor-in-Chief of Neural Computing and Applications, Professor John MacIntyre, for giving us the opportunity to organize this special issue and for his continuous support throughout the process.