Papers by Swarup Dewanjee
27th International Conference on Computer and Information Technology (ICCIT), Dec 20, 2024
Parkinson’s Disease (PD) is a progressive neurodegenerative disorder that causes loss of motor functions, emphasizing the need for prompt and accurate detection. This study aimed to design an automated system for early PD detection focusing on vocal features, as verbal disturbances often appear before more noticeable physical symptoms. A multisource dataset comprising speech recordings, biomedical voice metrics, and replicated voice recordings was utilized for this task. Several data augmentation techniques were explored to address data imbalances and size limitations, with Grouped Synthetic Minority Over-Sampling Technique (SMOTE) and Conditional Generative Adversarial Networks (CGAN) with Structured Conditioning identified as the most effective. Among the feature reduction methods validated, Boruta emerged as the optimal approach. This study focused on hybrid models that integrated Deep Learning (DL) and Machine Learning (ML), including attention mechanisms. Performance analysis identified the Attention-based Long Short-Term Memory - Bidirectional Gated Recurrent Unit (LSTM-BiGRU) as the superior model, attaining accuracies of 97.50%, 96.92%, and 96.25% across the datasets, respectively. Validation with K-fold and Leave-One-Participant-Out (LOPO) Cross-Validation (CV) was used to assess the model’s generalizability. Additionally, Explainable Boosting Machines (EBM) were employed to interpret the model’s decision-making process, supporting accurate and interpretable diagnostic tools for early PD detection. Keywords: Parkinson’s Disease, Speech and Vocal Features, LOPO CV, Boruta, Attention LSTM-BiGRU, EBM.
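The Leave-One-Participant-Out scheme named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `lopo_splits` and the participant labels are hypothetical, and the only point shown is that every recording from one speaker is held out together, so no speaker appears in both the training and the test fold.

```python
from collections import defaultdict


def lopo_splits(participant_ids):
    """Yield (held_out, train_idx, test_idx), one fold per participant."""
    by_participant = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_participant[pid].append(idx)
    for held_out in sorted(by_participant):
        # All samples of the held-out participant form the test fold.
        test_idx = by_participant[held_out]
        train_idx = [i for i, pid in enumerate(participant_ids) if pid != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical example: six recordings from three speakers.
participants = ["p1", "p1", "p2", "p3", "p3", "p3"]
folds = list(lopo_splits(participants))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Splitting by participant rather than by recording avoids leaking a speaker's voice characteristics from training into testing, which is the usual motivation for this scheme.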
27th International Conference on Computer and Information Technology (ICCIT), Dec 20, 2024
Polycystic Ovary Syndrome (PCOS) is a chronic hormonal disorder affecting women of reproductive age. Its incurable nature and associated health complications emphasize the need for early detection. This study presents a systematic approach for early PCOS Detection (PCOSD) employing Machine Learning (ML) and Deep Learning (DL) models on a dataset of 541 patients. Following pre-processing, feature reduction was performed, with Mutual Information (MI) emerging as the ideal technique, validated by an Analysis of Variance (ANOVA) test, selecting 15 significant features. Synthetic data augmentation followed, with Adaptive Synthetic Sampling (ADASYN) identified as the most effective technique based on the Kolmogorov–Smirnov (KS) test, increasing the dataset size to 1520. Subsequently, the data was split into training, validation, and test sets, allowing the evaluation of diverse ML classifiers and DL architectures, alongside ensemble and hybrid models. After hyperparameter tuning and Cross-Validation (CV), followed by performance evaluation, a hybrid model was proposed. The framework integrated Bidirectional Long Short-Term Memory (BiLSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN). It outperformed the other models, achieving 96.71% accuracy with a 97% F1 score, indicating its efficacy in PCOSD. Finally, Explainable Artificial Intelligence (XAI) techniques were used to gain deeper insights into the model’s predictions. Keywords: Polycystic Ovary Syndrome, Mutual Information, ADASYN, BiLSTM-GRU-CNN, SHAP, LIME, ELI5.
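To make the Mutual Information ranking step concrete, here is a small sketch under simplifying assumptions: features are treated as discrete, and the feature names and values are hypothetical toy data, not the study's dataset. It computes I(X; Y) in bits from empirical counts and ranks features by that score.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: feature_a mirrors the label, feature_b is unrelated.
labels = [0, 0, 1, 1]
features = {
    "feature_a": [0, 0, 1, 1],
    "feature_b": [0, 1, 0, 1],
}
scores = {name: mutual_information(col, labels) for name, col in features.items()}
ranked = sorted(features, key=lambda name: scores[name], reverse=True)
print(scores, ranked)
```

Selecting the top-k entries of `ranked` corresponds to keeping the k most informative features; continuous features would need binning or a nearest-neighbour estimator, which this sketch does not cover.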
4th International Conference on Innovations in Science, Engineering and Technology 2024 (ICISET 2024), Oct 27, 2024
Pregnancy health risk analysis is a critical concern as it impacts the well-being of both the mother and the fetus. This task demands frequent assessments and prompt interventions to avoid clinical complications. Despite medical advancements, challenges persist in monitoring health risks during pregnancy. This study aimed to develop an automated system to mitigate these risk factors by employing Deep Learning (DL) and Machine Learning (ML) approaches along with Hybridization and Ensemble Learning techniques. By leveraging both Maternal and Fetal health datasets, the system was intended to aid in making informed decisions for improving health outcomes. Data quality was ensured through preprocessing, with class imbalance and data size limitations addressed using the Synthetic Minority Over-sampling Technique (SMOTE) and a Conditional Generative Adversarial Network (cGAN). Thorough hyperparameter tuning, including trials with various optimizers, and rigorous evaluation were conducted. Generalizability was validated through cross-dataset analysis and cross-validation. A hybrid BiGRU-BiLSTM model was then proposed, as it demonstrated superior performance with accuracies of 96.21% and 97.38% on the Maternal and Fetal datasets, respectively. SHapley Additive exPlanations (SHAP) analysis was conducted to interpret the model’s predictions and to identify key features from the existing datasets, which were used to create a merged dataset for adaptability evaluation. The proposed model yielded over 85% accuracy on this data. Subsequently, Local Interpretable Model-Agnostic Explanations (LIME) analysis was used to gain deeper insights into the resulting predictions. These findings highlight the proposed model’s potential to enhance maternal and fetal health risk detection. Keywords: Maternal Fetal Health, Pregnancy Risks, cGAN, Explainable AI, Hybrid BiGRU-BiLSTM, LIME, SHAP.
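The core idea of SMOTE, mentioned above for class imbalance, is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that interpolation step in plain Python; the function name `smote_like`, the parameter defaults, and the sample points are hypothetical, and this is not the authors' pipeline.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical two-feature minority samples.
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.8)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Because each synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies.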
15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Jun 24, 2024
Dyslexia is a neurological condition that involves cognitive challenges in processing written language, impacting reading, writing, and spelling abilities. Given that it affects over 10% of the global population, early detection of dyslexia in children is crucial. This research proposes an automated system leveraging multiple modalities of data, including Electroencephalography (EEG), Eye Tracking (ET), and Handwriting Text Images (HTI), for enhanced generalizability. After rigorous preprocessing, the cleaned data were used for feature extraction, followed by data splitting. For broader understanding, this study employed both Machine Learning (ML) and Deep Learning (DL) architectures and analyzed the outcomes. A hybrid CNN-BiLSTM DL model was proposed for early detection of dyslexia, as it performed strongly across all modalities, attaining accuracies of 96.57%, 97.62%, and 95.81% on the EEG, ET, and HTI datasets, respectively. The proposed model outperformed traditional ML and DL models after thorough hyperparameter tuning, including assessment of various optimizers. Robustness was validated through multi-fold cross-validation, and Receiver Operating Characteristic (ROC) curves, Confusion Matrices, and Accuracy-Loss curves were inspected for detailed evaluation. Furthermore, this research compared the proposed model against various Late Fusion and Ensemble Learning techniques, against which it consistently performed better. Consequently, the CNN-BiLSTM hybrid model was proposed as a tool for early detection of dyslexia.
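The abstract does not say which late fusion techniques were compared. As one common form, the sketch below averages per-modality class probabilities (soft voting) and picks the class with the highest fused score. The function name `late_fusion`, the optional weights, and the probability values are hypothetical illustrations, not results from the paper.

```python
def late_fusion(prob_by_modality, weights=None):
    """Fuse per-modality class probabilities by (weighted) averaging."""
    names = list(prob_by_modality)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_classes = len(prob_by_modality[names[0]])
    fused = [
        sum(weights[name] * prob_by_modality[name][c] for name in names) / total
        for c in range(n_classes)
    ]
    # Predicted class is the index with the highest fused probability.
    label = max(range(n_classes), key=fused.__getitem__)
    return fused, label


# Hypothetical outputs of three single-modality classifiers for one child.
probs = {
    "EEG": [0.25, 0.75],
    "ET": [0.5, 0.5],
    "HTI": [0.25, 0.75],
}
fused, label = late_fusion(probs)
print(fused, label)
```

Late fusion combines decisions after each modality has been modelled separately, in contrast to a single model trained on each modality's features directly.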
International Journal of Advanced Computer Science and Applications, Feb 28, 2024
In the current digital landscape, social media's extensive user-generated content presents a unique opportunity for identifying emotional distress signals. With suicide rates on the rise, this study uses Natural Language Processing (NLP) and Sentiment Analysis to detect suicide risk. Centering primarily on deep learning (DL) architectures, including Convolutional Neural Network (CNN), Bidirectional Gated Recurrent Unit (BiGRU), and their combined hybrid BiGRU-CNN model, the research incorporates machine learning (ML) for comparative analysis on multisource datasets from Reddit and Twitter. The methodology began with data preprocessing, followed by an exploration of word embedding techniques. This included an analysis of both Word2Vec variants as well as pretrained GloVe embeddings, where Skip-Gram paired with the Adam optimizer showed superior results. For thorough evaluation, Receiver Operating Characteristic (ROC) curves, Confusion Matrices, and Accuracy-Loss graphs were utilized. Furthermore, the generalizability of the employed models was tested through manual input tests, cross-dataset tests, and k-fold cross-validation. In these evaluations, the proposed BiGRU-CNN model outperformed the traditional DL and ML models with consistent and reliable performance. The proposed model achieved accuracies of 93.07% and 92.47% on the respective datasets, which supports its potential as a tool for the early detection of suicidal thoughts.
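Skip-Gram, the Word2Vec variant reported as performing best here, learns embeddings by predicting the words surrounding each center word. The sketch below shows only how (center, context) training pairs are formed from a token sequence; the function name `skipgram_pairs`, the window size, and the tokens are hypothetical, and the embedding training itself is omitted.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical token sequence with a window of one word on each side.
pairs = skipgram_pairs(["a", "b", "c", "d"], window=1)
print(pairs)
```

The other Word2Vec variant, CBOW, reverses the direction and predicts the center word from its context.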