Skip to main content

Durga Toshniwal

Followers

27

Following

1

Public Views

Ratul Kishore Saha

NORTH EASTERN HILL UNIVERSITY,SHILLONG

Shahid Chamran University - Ahwaz

Masoud Nikravesh

University of California, Berkeley

Fargana Abdullayeva

Mohammadjafar mohammadzadeh

Assiut University

Gholamreza kamali

Ursula Iturraran-Viveros

UNAM Universidad Nacional Autónoma de México

Interests

Uploads

Papers by Durga Toshniwal

Features denoising-based learning for porosity classification

Neural Computing and Applications

Reservoir characterization is one of the most challenging tasks that help in modeling different l... more Reservoir characterization is one of the most challenging tasks that help in modeling different lithological properties like porosity, permeability and fluid saturation using seismic readings like velocity profile, impedance, etc. Such a model is required for field development, placing new wells and prediction management. Seismic attributes are being progressively utilized for the tasks of model building, exploration and properties estimation from the data. However, these tasks become very complex due to the nonlinear and heterogeneous nature of subsurface properties. In this context, present work proposes a recurrent neural network-based learning framework to classify porosity using seismic attributes as predictor variables. The approach begins by calculating different seismic attributes from the data. From the initially calculated attribute set, features that are to be used for classification are selected by using two different strategies. Firstly, the seismic attributes having good correlation strength with reservoir porosity are extracted. Subsequently, generative topographic map is utilized to select the significant features. The final reduced features set obtained by the integrated result of above two strategies is then fed as an input to the empirical mode decomposition (EMD) algorithm. The denoised features resulting from the EMD algorithm are used to train the classification models. Further, a comparison is carried out between the proposed classification framework ðEMD þ RNNÞ and other supervised classifiers to show the performance of the proposed framework. Keywords Reservoir characterization Á Recurrent neural network Á Classification Á Petrophysical properties estimation List of symbols s t State at timestamp t W; U; V; W f ; W b ;W o Weight parameters b f ; b i ; b i Bias parameters x t Input at timestamp t o t Output at timestamp t c t Cell state at timestamp t h tÀ1 Output at previous timestamp r; softmax; ReLU Activation functions S(t) Input signal r Gradient m t Average of past gradients v t Average of past squared gradients b 1 ; b 2 ; h Parameters m t ; v t Biased corrected parameters g Learning rate q Correlation coefficient

CitEnergy : A BERT based model to analyse Citizens’ Energy-Tweets

Sustainable Cities and Society, 2022

Data Mining Techniques for Smart Mobility—A Survey

Advances in Intelligent Systems and Computing, 2018

With the rapid increase in urbanization, there is essence for making urban areas smart. There are... more With the rapid increase in urbanization, there is essence for making urban areas smart. There are assorted fields of computer science liable for development of a city to make it smart and intelligent. Some of these fields include data mining, sensor networks, Internet of things, web of things, cloud computing techniques and machine learning. Smart city is an umbrella term that encompasses smart mobility, smart governance, smart planning, smart environment and others. Smart mobility is one of the crucial aspects of smart city addressing efficient movement of people and goods from one place to another. This work is an extensive survey of research works related to application of data mining techniques for smart mobility. A comparative study of major works done in the aforementioned field is outlined in this paper.

PP-NFR: an improved hybrid learning approach for porosity prediction from seismic attributes using non-linear feature reduction

Journal of Applied Geophysics, 2019

Seismic attributes are being increasingly utilized in both reservoir characterization and explora... more Seismic attributes are being increasingly utilized in both reservoir characterization and exploration. In the Reservoir characterization, one might be interested in modeling different reservoir properties like water saturation, permeability, and porosity. The direct estimation of these reservoir properties from the seismic data is a challenging and time-consuming task. Therefore, it is desirable to build a model that uses seismic attributes to measure the d ifferent lithological properties of need. In the recent decades, various advancements have been done to estimate porosity from seismic attributes and well-logs effectively. However, the modeling of nonlinear porosity patterns is still underdeveloped for robust solutions as the existing methods implement linear or functional network based reduction techniques to estimate petrophysical properties. Therefore, in this study, we propose a nonlinear hybrid approach that integrates Generative Topographic Map with Machine Learning prediction models (ML) to estimate the property porosity. The approach initiates by extracting relevant attributes based on the two different feature selection strategies. Subsequently, the Radial Basis Function (RBF) and Random Forest regressors are used to build the training model for the target well property porosity using extracted attributes. The different combinations of wells are formed to show the variations in model training and testing error. To demonstrate the effectiveness of proposed PP-NFR approach, the prediction results of the proposed hybrid approach are compared with the linear prediction technique. Furthermore, the performance of PP-NFR approach is also evaluated by comparing the prediction results with the state-of-art prediction models (ANN and SVM). The prediction results

GenIt-e-MoTo: Generation of intelligence to envisage modus operandi of terror outfits

2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), 2016

Terrorism is on the rise in the Indian Sub-Continent and application of data mining techniques to... more Terrorism is on the rise in the Indian Sub-Continent and application of data mining techniques to analyse terrorist attacks is the need of the hour. The Security Forces (SF) in this region still rely on traditional and manual analysis methods. Data mining in this field is in its budding stage and if utilised efficiently will greatly facilitate the SF in preventing any terrorist attacks. SF are constantly searching for latest data mining techniques to augment terror analytics and improve protection of the local civilians and self thereby reducing collateral damage. Predicting terror attacks can push the potential of SF to the beat of terrorist activities. It is significant to recognise the spatial and temporal patterns for a better learning of terror incidents and to conceive their correlation. Clustering and Association rule mining (ARM) thus become strong contenders for efficient terror strikes forecasting. The above techniques can be used for a systematic profiling of outfits thus leading to the discovery of a unique pattern of operations i.e. Modus Operandi (MO) of a particular terror outfit. After gaining knowledge from data mining it is essential to convert it into actionable intelligence in order to be used by foot soldiers. Therefore, the paper provides concrete intelligence about various terror outfits operating in the most active Jammu and Kashmir (J&K) region of the Indian subcontinent.

Genome data classification based on fuzzy matching

CSI Transactions on ICT, 2012

Genomic data mining and knowledge extraction is an important problem in bioinformatics. Some rese... more Genomic data mining and knowledge extraction is an important problem in bioinformatics. Some research work has been done on unknown genome identification and is based on exact pattern matching of n-grams. In most of the real world biological problems exact matching may not give desired results and the problem in using n-grams is exponential explosion. In this paper we propose a method for genome data classification based on approximate matching. The algorithm works by selecting random samples from the genome database. Tolerance is allowed by generating candidates of varied length to query from these sample sequences. The Levenshtein distance is then checked for each candidate and whether they are k-fuzzily equal. The total number of fuzzy matches for each sequence is then calculated. This is then classified using the data mining techniques namely, naive Bayes, support vector machine, back propagation and also by nearest neighbor. Experiment results are provided for different tolerance levels and they show that accuracy increases as tolerance does. We also show the effect of sampling size on the classification accuracy and it was observed that classification accuracy increases with sampling size. Genome data of two species namely Yeast and E. coli are used to verify proposed method.

Automatic Recognition of Road Cracks Using Sobel Components in Digital Images

Lecture notes in civil engineering, Sep 19, 2022

A Novel Rough Set Based Clustering Approach for Streaming Data

Advances in Intelligent Systems and Computing, 2014

Clustering is a very important data mining task. Clustering of streaming data is very challenging... more Clustering is a very important data mining task. Clustering of streaming data is very challenging because streaming data cannot be scanned multiple times and also new concepts may keep evolving in data over time. Inherent uncertainty involved in real world data stream further magnifies the challenge of working with streaming data. Rough set is a soft computing technique which can be used to deal with uncertainty involved in cluster analysis. In this paper, we propose a novel rough set based clustering method for streaming data. It describes a cluster as a pair of lower approximation and an upper approximation. Lower approximation comprises of the data objects that can be assigned with certainty to the respective cluster, whereas upper approximation contains those data objects whose belongingness to the various clusters in not crisp along with the elements of lower approximation. Uncertainty in assigning a data object to a cluster is captured by allowing overlapping in upper approximation. Proposed method generates soft-cluster. Keeping in view the challenges of streaming data, the proposed method is incremental and adaptive to evolving concept. Experimental results on synthetic and real world data sets show that our proposed approach outperforms Leader clustering algorithm in terms of classification accuracy. Proposed method generates more natural clusters as compare to k-means clustering and it is robust to outliers. Performance of proposed method is also analyzed in terms of correctness and accuracy of rough clustering.

Do Public and Government Think Similar About Indian Cleanliness Campaign?

Information Management and Big Data, 2019

With the growth of internet, social networks has become primary source for people to present thei... more With the growth of internet, social networks has become primary source for people to present their views on different topics. The data collected from social media are considered enough as well as reliable to be processed and gather insights on the perceptions of people towards any topic. In this research work, an empirical study of the Twitter data (i.e. around 400,000 tweets) collected for the period of December 1, 2017 to March 31, 2018, pertaining to Indian Cleanliness Campaign called Swachh Bharat Abhiyan (SBA), which focuses on improving the cleanliness situation in the country, has been done. Here, a demographic distribution of the Twitter data has been generated by augmenting partial keyword matching along with Named Entity Recognition for geoparsing the tweets. This will help to study the involvement of the people in different areas of the country. Furthermore, Sentiment Analysis of the tweets has been performed to gather the perception of people towards the campaign. Also, to assure the integrity of the campaign, the tweets have been segregated into public and government generated tweets and the respective sentiments have been compared to determine the difference in perception of public and government in different areas of the country. This work can be considered of interest because there has not been any research work which focuses on analyzing the awareness and perception of people on SBA in detail.

Attribute Selection Based on Correlation Analysis

Advances in Intelligent Systems and Computing, 2018

Feature selection is one of the significant areas of research in the field of data mining, patter... more Feature selection is one of the significant areas of research in the field of data mining, pattern recognition, and machine learning. One of the effective methods of feature selection is to determine the distinctive capability of the individual feature. More the distinctive capability the more interesting the feature is. But in addition to this, another important thing to be considered is the dependency between the different features. Highly dependent features lead to the inaccurate analysis or results. To solve this problem, we present an approach for feature selection based on the correlation analysis (ASCA) between the features. The algorithm works by iteratively removing the features that is highly dependent on each other. Firstly, we define the concept of multi-collinearity and its application to feature selection. Then, we present a new method for selection of attributes based on the correlation analysis. Finally, the proposed approach is tested on the benchmark datasets and the experimental results show that this approach works better than other existing feature selection algorithms both in terms of accuracy and computational overheads.

Spatio-Temporal Frequent Itemset Mining on Web Data

2018 IEEE International Conference on Data Mining Workshops (ICDMW), 2018

Web generates enormous volumes of spatiotemporal data every second. Such data includes transactio... more Web generates enormous volumes of spatiotemporal data every second. Such data includes transactional data on which association rule mining can be performed. Applications includes fraud detection, consumer purchase pattern identification, recommendation systems etc. Essence of spatiotemporal information alongwith the transactional data comes from the fact that the association rules or frequent patterns in the transactions are highly determined by the location and time of the occurrence of that transaction. For example, customer purchase of product depends upon the season and location of buying that product. To extract frequent patterns from such large databases, most existing algorithms demands enormous amounts of resources. The present work proposes a spatiotemporal association rule mining algorithm using hashing, to facilitate reduced memory access time and storage space. Hash based search technique is used to fasten the memory access by directly accessing the required spatio-temporal information from the schema. There are a numerous hash based search techniques that can be used. But to reduce collision, direct address hashing is focused upon primarily in this work. However, in future we plan to extend our results over different search techniques. Our results are compared with exiting Spatio-Temporal Apriori algorithm, which is one of the established association rule mining algorithm. Furthermore, experiments are demonstrated on several synthetically generated and web based datasets. Subsequently, a comparison over different datasets is given. Our algorithm shows improved results when evaluated over several metrics such as support of frequent itemsets and percentage gain in reduced memory access time. In future we plan to extend this work to various benchmark datasets.

Entropy Based Adaptive Outlier Detection Technique for Data Streams

Outlier detection in data streams is an immensely enthralling problem in many application areas s... more Outlier detection in data streams is an immensely enthralling problem in many application areas such as network intrusion detection, faulty sensor detection, fraud detection in online financial transactions etc. Majority of existing outlier detection techniques have been mainly designed for static datasets and require a global view and multiple scans of data which is not feasible in case of streaming data. In this paper, we propose an entropy based outlier detection technique for streaming data exploiting the fact that presence of an anomalous data object highly increases the entropy of normal data clustering. It maintains clusters of streaming data and finds change in its entropy on incoming data object. If increment in entropy is very large then the data object is marked as candidate outlier and its anomalous behaviour confirmed over multiple sliding windows to minimize the false alarms. The proposed method is incremental and dynamically updates clustering structure and entropy st...

Adaptive Outlier Detection in Streaming Time Series

Most existing outlier detection techniques work on static time sequences and process outliers by ... more Most existing outlier detection techniques work on static time sequences and process outliers by working on the entire time sequences to detect global outliers. This would be computationally infeasible in case of data streams as they are ordered sequences of data that arrives continuously. In this work, we aim to develop an adaptive algorithm that detects outliers from streaming time series data. The emphasis is on detecting the local outliers in addition to the global outliers. The HOT SAX algorithm has been extended to detect the local outliers. The algorithm is adaptive as it detects outliers on the basis of a reference set of outliers which keeps on getting updated dynamically depending upon the current data streams instances. The introduction of “outlierness” reflects the extent of abnormal behavior and the “type” refers to the type of deviation from normal behavior i.e. above normal or below normal. The proposed work has been evaluated on real life case datadaily vehicular tra...

SFA-GTM: Seismic Facies Analysis Based on Generative Topographic Map and RBF

Seismic facies identification plays a significant role in reservoir characterization. It helps in... more Seismic facies identification plays a significant role in reservoir characterization. It helps in identifying the various lithological and stratigraphical changes in reservoir properties. With the increase in the size of seismic data or number of attributes to be analyzed, the manual process for facies identification becomes complicated and time-consuming. Even though seismic attributes add multiple dimensions to the data, their role in reservoir characterization is very crucial. There exist different linear transformation methods that use seismic attributes for identification, characterization, and visualization of seismic facies. These linear transformation methods have been widely used for facies characterization. However, there are some limitations associated with these methods such as deciding the width parameters, number of clusters, convergence rate etc. Therefore, the present research work uses non-linear transformation approach that overcomes some of the major limitations o...

A System for Outlier Detection of High Dimensional Data

In high dimensional data large no of outliers are embedded in low dimensional subspaces known as ... more In high dimensional data large no of outliers are embedded in low dimensional subspaces known as projected outliers, but most of existing outlier detection techniques are unable to find these projected outliers, because these methods perform detection of abnormal patterns in full data space. So, outlier detection in high dimensional data becomes an important research problem. In this paper we are proposing an approach for outlier detection of high dimensional data. Here we are modifying the existing SPOT approach by adding three new concepts namely Adaption of Sparse Sub-Space Template (SST), Different combination of PCS parameters and set of non outlying cells for testing data set.

Real-time estimation of COVID-19 cases using machine learning and mathematical models - The case of India

2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), 2020

COVID-19 pandemic has stressed out the economy and resources of major countries across the world ... more COVID-19 pandemic has stressed out the economy and resources of major countries across the world due to its high infection and transmission rate. The count of COVID-19 cases skyrocketed in the past few days, which creates immense pressure on health officials and governments. Therefore, prediction models to determine the number of new infections are urgently required in such grave times. In the present study, a machine learning technique, namely artificial neural network (ANN) is proposed to forecast the COVID-19 outbreak in India, for the first time. Moreover, in our study, we have additionally attempted to use a mathematical curve fitting model to ascertain the performance of the proposed ANN-based machine learning model. In addition, the impact of preventive measures such as lockdown and social distancing on the spread of COVID-19 is also analyzed by estimating the growth of the epidemic under different transmission rates. Moreover, a comparison between the proposed and existing COVID-19 prediction models is also demonstrated. Intriguingly, the proposed model is found to be highly accurate in estimating the growth of COVID-19 related parameters with the lowest MAPE values (cumulative confirmed cases (3.981), daily confirmed cases (4.173) and cumulative deceased cases (4.413)). Hence, the present study can assist the health officers and administration in getting prepared with the beforehand arrangement of the required resources and medical facilities.

Shunt connection: An intelligent skipping of contiguous blocks for optimizing MobileNet-V2

Neural Networks, 2019

Enabling deep neural networks for tight resource constraint environments like mobile phones and c... more Enabling deep neural networks for tight resource constraint environments like mobile phones and cameras is the current need. The existing availability in the form of optimized architectures like Squeeze Net, MobileNet etc., are devised to serve the purpose by utilizing the parameter friendly operations and architectures, such as point-wise convolution, bottleneck layer etc. This work focuses on optimizing the number of floating point operations involved in inference through an already compressed deep learning architecture. The optimization is performed by utilizing the advantage of residual connections in a macroscopic way. This paper proposes novel connection on top of the deep learning architecture whose idea is to locate the blocks of a pretrained network which have relatively lesser knowledge quotient and then bypassing those blocks by an intelligent skip connection, named here as Shunt connection. The proposed method helps in replacing the high computational blocks by computation friendly shunt connection. In a given architecture, up to two vulnerable locations are selected where 6 contiguous blocks are selected and skipped at the first location and 2 contiguous blocks are selected and skipped at the second location, leveraging 2 shunt connections. The proposed connection is used over state-of-the-art MobileNet-V2 architecture and manifests two cases, which lead from 33.5% reduction in flops (one connection) up to 43.6% reduction in flops (two connections) with minimal impact on accuracy.

A novel framework to analyze road accident time series data

Journal of Big Data, 2016

Background Road traffic accident (RTA) is one of the important concerns of research as it involve... more Background Road traffic accident (RTA) is one of the important concerns of research as it involves fatality, personal injuries that can lead to full or partial disability and property damage. A report [1] by World Health Organization (WHO) reveals that there are 1.2 million fatal and about four times injured road accidents every year across the world. Road and traffic safety is a term associated with road accidents. The primary focus of road safety is to provide some preventive measures that can be helpful in reducing RTAs. Road accident data analysis is an important factor that has been succeeds in identifying different factors associated with road accidents [2-5]. Once the associated road accident factors are identified the corresponding actions can be taken to overcome the accident rate and to apply some preventive measures. Road accident data analysis is mainly based on two categories: statistical techniques and data mining techniques. Various studies on

A data mining framework to analyze road accident data

Journal of Big Data, 2015

Background Road and traffic accidents are uncertain and unpredictable incidents and their analysi... more Background Road and traffic accidents are uncertain and unpredictable incidents and their analysis requires the knowledge of the factors affecting them. Road and traffic accidents are defined by a set of variables which are mostly of discrete nature. The major problem in the analysis of accident data is its heterogeneous nature [1]. Thus heterogeneity must be considered during analysis of the data otherwise, some relationship between the data may remain hidden. Although, researchers used segmentation of the data to reduce this heterogeneity using some measures such as expert knowledge, but there is no guarantee that this will lead to an optimal segmentation which consists of homogeneous groups of road accidents [2]. Therefore, cluster analysis can assist the segmentation of road accidents. Cluster analysis which is an important data mining technique can be used as a preliminary task to achieve various goals. Karlaftis and Tarko [3] used cluster analysis to categorize the accident data into different categories and further analyzed cluster results using Negative Binomial (NB) to identify the impact of driver age on road accidents.

Distributed Certification Authority for Mobile Ad Hoc Networks - A Dynamic Approach

Journal of Convergence Information Technology, 2007

A Mobile Ad Hoc Network (MANET) is an infrastructureless network of wireless mobile nodes that co... more A Mobile Ad Hoc Network (MANET) is an infrastructureless network of wireless mobile nodes that cooperate among each other to maintain connectivity of the network. In comparison to wired networks, securing a MANET is more difficult and challenging. One of the effective ways of providing security in MANETs is by using Public Key Cryptography and Certificates. Certificates are issued by

Features denoising-based learning for porosity classification

Neural Computing and Applications

Reservoir characterization is one of the most challenging tasks that help in modeling different l... more Reservoir characterization is one of the most challenging tasks that help in modeling different lithological properties like porosity, permeability and fluid saturation using seismic readings like velocity profile, impedance, etc. Such a model is required for field development, placing new wells and prediction management. Seismic attributes are being progressively utilized for the tasks of model building, exploration and properties estimation from the data. However, these tasks become very complex due to the nonlinear and heterogeneous nature of subsurface properties. In this context, present work proposes a recurrent neural network-based learning framework to classify porosity using seismic attributes as predictor variables. The approach begins by calculating different seismic attributes from the data. From the initially calculated attribute set, features that are to be used for classification are selected by using two different strategies. Firstly, the seismic attributes having good correlation strength with reservoir porosity are extracted. Subsequently, generative topographic map is utilized to select the significant features. The final reduced features set obtained by the integrated result of above two strategies is then fed as an input to the empirical mode decomposition (EMD) algorithm. The denoised features resulting from the EMD algorithm are used to train the classification models. Further, a comparison is carried out between the proposed classification framework ðEMD þ RNNÞ and other supervised classifiers to show the performance of the proposed framework. Keywords Reservoir characterization Á Recurrent neural network Á Classification Á Petrophysical properties estimation List of symbols s t State at timestamp t W; U; V; W f ; W b ;W o Weight parameters b f ; b i ; b i Bias parameters x t Input at timestamp t o t Output at timestamp t c t Cell state at timestamp t h tÀ1 Output at previous timestamp r; softmax; ReLU Activation functions S(t) Input signal r Gradient m t Average of past gradients v t Average of past squared gradients b 1 ; b 2 ; h Parameters m t ; v t Biased corrected parameters g Learning rate q Correlation coefficient

CitEnergy : A BERT based model to analyse Citizens’ Energy-Tweets

Sustainable Cities and Society, 2022

Data Mining Techniques for Smart Mobility—A Survey

Advances in Intelligent Systems and Computing, 2018

With the rapid increase in urbanization, there is essence for making urban areas smart. There are... more With the rapid increase in urbanization, there is essence for making urban areas smart. There are assorted fields of computer science liable for development of a city to make it smart and intelligent. Some of these fields include data mining, sensor networks, Internet of things, web of things, cloud computing techniques and machine learning. Smart city is an umbrella term that encompasses smart mobility, smart governance, smart planning, smart environment and others. Smart mobility is one of the crucial aspects of smart city addressing efficient movement of people and goods from one place to another. This work is an extensive survey of research works related to application of data mining techniques for smart mobility. A comparative study of major works done in the aforementioned field is outlined in this paper.

PP-NFR: an improved hybrid learning approach for porosity prediction from seismic attributes using non-linear feature reduction

Journal of Applied Geophysics, 2019

Seismic attributes are being increasingly utilized in both reservoir characterization and explora... more Seismic attributes are being increasingly utilized in both reservoir characterization and exploration. In the Reservoir characterization, one might be interested in modeling different reservoir properties like water saturation, permeability, and porosity. The direct estimation of these reservoir properties from the seismic data is a challenging and time-consuming task. Therefore, it is desirable to build a model that uses seismic attributes to measure the d ifferent lithological properties of need. In the recent decades, various advancements have been done to estimate porosity from seismic attributes and well-logs effectively. However, the modeling of nonlinear porosity patterns is still underdeveloped for robust solutions as the existing methods implement linear or functional network based reduction techniques to estimate petrophysical properties. Therefore, in this study, we propose a nonlinear hybrid approach that integrates Generative Topographic Map with Machine Learning prediction models (ML) to estimate the property porosity. The approach initiates by extracting relevant attributes based on the two different feature selection strategies. Subsequently, the Radial Basis Function (RBF) and Random Forest regressors are used to build the training model for the target well property porosity using extracted attributes. The different combinations of wells are formed to show the variations in model training and testing error. To demonstrate the effectiveness of proposed PP-NFR approach, the prediction results of the proposed hybrid approach are compared with the linear prediction technique. Furthermore, the performance of PP-NFR approach is also evaluated by comparing the prediction results with the state-of-art prediction models (ANN and SVM). The prediction results

GenIt-e-MoTo: Generation of intelligence to envisage modus operandi of terror outfits

2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), 2016

Terrorism is on the rise in the Indian Sub-Continent and application of data mining techniques to... more Terrorism is on the rise in the Indian Sub-Continent and application of data mining techniques to analyse terrorist attacks is the need of the hour. The Security Forces (SF) in this region still rely on traditional and manual analysis methods. Data mining in this field is in its budding stage and if utilised efficiently will greatly facilitate the SF in preventing any terrorist attacks. SF are constantly searching for latest data mining techniques to augment terror analytics and improve protection of the local civilians and self thereby reducing collateral damage. Predicting terror attacks can push the potential of SF to the beat of terrorist activities. It is significant to recognise the spatial and temporal patterns for a better learning of terror incidents and to conceive their correlation. Clustering and Association rule mining (ARM) thus become strong contenders for efficient terror strikes forecasting. The above techniques can be used for a systematic profiling of outfits thus leading to the discovery of a unique pattern of operations i.e. Modus Operandi (MO) of a particular terror outfit. After gaining knowledge from data mining it is essential to convert it into actionable intelligence in order to be used by foot soldiers. Therefore, the paper provides concrete intelligence about various terror outfits operating in the most active Jammu and Kashmir (J&K) region of the Indian subcontinent.

Genome data classification based on fuzzy matching

CSI Transactions on ICT, 2012

Genomic data mining and knowledge extraction is an important problem in bioinformatics. Some rese... more Genomic data mining and knowledge extraction is an important problem in bioinformatics. Some research work has been done on unknown genome identification and is based on exact pattern matching of n-grams. In most of the real world biological problems exact matching may not give desired results and the problem in using n-grams is exponential explosion. In this paper we propose a method for genome data classification based on approximate matching. The algorithm works by selecting random samples from the genome database. Tolerance is allowed by generating candidates of varied length to query from these sample sequences. The Levenshtein distance is then checked for each candidate and whether they are k-fuzzily equal. The total number of fuzzy matches for each sequence is then calculated. This is then classified using the data mining techniques namely, naive Bayes, support vector machine, back propagation and also by nearest neighbor. Experiment results are provided for different tolerance levels and they show that accuracy increases as tolerance does. We also show the effect of sampling size on the classification accuracy and it was observed that classification accuracy increases with sampling size. Genome data of two species namely Yeast and E. coli are used to verify proposed method.

Automatic Recognition of Road Cracks Using Sobel Components in Digital Images

Lecture notes in civil engineering, Sep 19, 2022

A Novel Rough Set Based Clustering Approach for Streaming Data

Advances in Intelligent Systems and Computing, 2014

Clustering is a very important data mining task. Clustering of streaming data is very challenging... more Clustering is a very important data mining task. Clustering of streaming data is very challenging because streaming data cannot be scanned multiple times and also new concepts may keep evolving in data over time. Inherent uncertainty involved in real world data stream further magnifies the challenge of working with streaming data. Rough set is a soft computing technique which can be used to deal with uncertainty involved in cluster analysis. In this paper, we propose a novel rough set based clustering method for streaming data. It describes a cluster as a pair of lower approximation and an upper approximation. Lower approximation comprises of the data objects that can be assigned with certainty to the respective cluster, whereas upper approximation contains those data objects whose belongingness to the various clusters in not crisp along with the elements of lower approximation. Uncertainty in assigning a data object to a cluster is captured by allowing overlapping in upper approximation. Proposed method generates soft-cluster. Keeping in view the challenges of streaming data, the proposed method is incremental and adaptive to evolving concept. Experimental results on synthetic and real world data sets show that our proposed approach outperforms Leader clustering algorithm in terms of classification accuracy. Proposed method generates more natural clusters as compare to k-means clustering and it is robust to outliers. Performance of proposed method is also analyzed in terms of correctness and accuracy of rough clustering.

Do Public and Government Think Similar About Indian Cleanliness Campaign?

Information Management and Big Data, 2019

With the growth of internet, social networks has become primary source for people to present thei... more With the growth of internet, social networks has become primary source for people to present their views on different topics. The data collected from social media are considered enough as well as reliable to be processed and gather insights on the perceptions of people towards any topic. In this research work, an empirical study of the Twitter data (i.e. around 400,000 tweets) collected for the period of December 1, 2017 to March 31, 2018, pertaining to Indian Cleanliness Campaign called Swachh Bharat Abhiyan (SBA), which focuses on improving the cleanliness situation in the country, has been done. Here, a demographic distribution of the Twitter data has been generated by augmenting partial keyword matching along with Named Entity Recognition for geoparsing the tweets. This will help to study the involvement of the people in different areas of the country. Furthermore, Sentiment Analysis of the tweets has been performed to gather the perception of people towards the campaign. Also, to assure the integrity of the campaign, the tweets have been segregated into public and government generated tweets and the respective sentiments have been compared to determine the difference in perception of public and government in different areas of the country. This work can be considered of interest because there has not been any research work which focuses on analyzing the awareness and perception of people on SBA in detail.

Attribute Selection Based on Correlation Analysis

Advances in Intelligent Systems and Computing, 2018

Feature selection is one of the significant areas of research in the field of data mining, patter... more Feature selection is one of the significant areas of research in the field of data mining, pattern recognition, and machine learning. One of the effective methods of feature selection is to determine the distinctive capability of the individual feature. More the distinctive capability the more interesting the feature is. But in addition to this, another important thing to be considered is the dependency between the different features. Highly dependent features lead to the inaccurate analysis or results. To solve this problem, we present an approach for feature selection based on the correlation analysis (ASCA) between the features. The algorithm works by iteratively removing the features that is highly dependent on each other. Firstly, we define the concept of multi-collinearity and its application to feature selection. Then, we present a new method for selection of attributes based on the correlation analysis. Finally, the proposed approach is tested on the benchmark datasets and the experimental results show that this approach works better than other existing feature selection algorithms both in terms of accuracy and computational overheads.

Spatio-Temporal Frequent Itemset Mining on Web Data

2018 IEEE International Conference on Data Mining Workshops (ICDMW), 2018

Web generates enormous volumes of spatiotemporal data every second. Such data includes transactio... more Web generates enormous volumes of spatiotemporal data every second. Such data includes transactional data on which association rule mining can be performed. Applications includes fraud detection, consumer purchase pattern identification, recommendation systems etc. Essence of spatiotemporal information alongwith the transactional data comes from the fact that the association rules or frequent patterns in the transactions are highly determined by the location and time of the occurrence of that transaction. For example, customer purchase of product depends upon the season and location of buying that product. To extract frequent patterns from such large databases, most existing algorithms demands enormous amounts of resources. The present work proposes a spatiotemporal association rule mining algorithm using hashing, to facilitate reduced memory access time and storage space. Hash based search technique is used to fasten the memory access by directly accessing the required spatio-temporal information from the schema. There are a numerous hash based search techniques that can be used. But to reduce collision, direct address hashing is focused upon primarily in this work. However, in future we plan to extend our results over different search techniques. Our results are compared with exiting Spatio-Temporal Apriori algorithm, which is one of the established association rule mining algorithm. Furthermore, experiments are demonstrated on several synthetically generated and web based datasets. Subsequently, a comparison over different datasets is given. Our algorithm shows improved results when evaluated over several metrics such as support of frequent itemsets and percentage gain in reduced memory access time. In future we plan to extend this work to various benchmark datasets.

Entropy Based Adaptive Outlier Detection Technique for Data Streams

Outlier detection in data streams is an immensely enthralling problem in many application areas s... more Outlier detection in data streams is an immensely enthralling problem in many application areas such as network intrusion detection, faulty sensor detection, fraud detection in online financial transactions etc. Majority of existing outlier detection techniques have been mainly designed for static datasets and require a global view and multiple scans of data which is not feasible in case of streaming data. In this paper, we propose an entropy based outlier detection technique for streaming data exploiting the fact that presence of an anomalous data object highly increases the entropy of normal data clustering. It maintains clusters of streaming data and finds change in its entropy on incoming data object. If increment in entropy is very large then the data object is marked as candidate outlier and its anomalous behaviour confirmed over multiple sliding windows to minimize the false alarms. The proposed method is incremental and dynamically updates clustering structure and entropy st...

Adaptive Outlier Detection in Streaming Time Series

Most existing outlier detection techniques work on static time sequences and process outliers by ... more Most existing outlier detection techniques work on static time sequences and process outliers by working on the entire time sequences to detect global outliers. This would be computationally infeasible in case of data streams as they are ordered sequences of data that arrives continuously. In this work, we aim to develop an adaptive algorithm that detects outliers from streaming time series data. The emphasis is on detecting the local outliers in addition to the global outliers. The HOT SAX algorithm has been extended to detect the local outliers. The algorithm is adaptive as it detects outliers on the basis of a reference set of outliers which keeps on getting updated dynamically depending upon the current data streams instances. The introduction of “outlierness” reflects the extent of abnormal behavior and the “type” refers to the type of deviation from normal behavior i.e. above normal or below normal. The proposed work has been evaluated on real life case datadaily vehicular tra...

SFA-GTM: Seismic Facies Analysis Based on Generative Topographic Map and RBF

Seismic facies identification plays a significant role in reservoir characterization. It helps in... more Seismic facies identification plays a significant role in reservoir characterization. It helps in identifying the various lithological and stratigraphical changes in reservoir properties. With the increase in the size of seismic data or number of attributes to be analyzed, the manual process for facies identification becomes complicated and time-consuming. Even though seismic attributes add multiple dimensions to the data, their role in reservoir characterization is very crucial. There exist different linear transformation methods that use seismic attributes for identification, characterization, and visualization of seismic facies. These linear transformation methods have been widely used for facies characterization. However, there are some limitations associated with these methods such as deciding the width parameters, number of clusters, convergence rate etc. Therefore, the present research work uses non-linear transformation approach that overcomes some of the major limitations o...

A System for Outlier Detection of High Dimensional Data

In high dimensional data large no of outliers are embedded in low dimensional subspaces known as ... more In high dimensional data large no of outliers are embedded in low dimensional subspaces known as projected outliers, but most of existing outlier detection techniques are unable to find these projected outliers, because these methods perform detection of abnormal patterns in full data space. So, outlier detection in high dimensional data becomes an important research problem. In this paper we are proposing an approach for outlier detection of high dimensional data. Here we are modifying the existing SPOT approach by adding three new concepts namely Adaption of Sparse Sub-Space Template (SST), Different combination of PCS parameters and set of non outlying cells for testing data set.

Real-time estimation of COVID-19 cases using machine learning and mathematical models - The case of India

2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), 2020

COVID-19 pandemic has stressed out the economy and resources of major countries across the world ... more COVID-19 pandemic has stressed out the economy and resources of major countries across the world due to its high infection and transmission rate. The count of COVID-19 cases skyrocketed in the past few days, which creates immense pressure on health officials and governments. Therefore, prediction models to determine the number of new infections are urgently required in such grave times. In the present study, a machine learning technique, namely artificial neural network (ANN) is proposed to forecast the COVID-19 outbreak in India, for the first time. Moreover, in our study, we have additionally attempted to use a mathematical curve fitting model to ascertain the performance of the proposed ANN-based machine learning model. In addition, the impact of preventive measures such as lockdown and social distancing on the spread of COVID-19 is also analyzed by estimating the growth of the epidemic under different transmission rates. Moreover, a comparison between the proposed and existing COVID-19 prediction models is also demonstrated. Intriguingly, the proposed model is found to be highly accurate in estimating the growth of COVID-19 related parameters with the lowest MAPE values (cumulative confirmed cases (3.981), daily confirmed cases (4.173) and cumulative deceased cases (4.413)). Hence, the present study can assist the health officers and administration in getting prepared with the beforehand arrangement of the required resources and medical facilities.

Shunt connection: An intelligent skipping of contiguous blocks for optimizing MobileNet-V2

Neural Networks, 2019

Enabling deep neural networks for tight resource constraint environments like mobile phones and c... more Enabling deep neural networks for tight resource constraint environments like mobile phones and cameras is the current need. The existing availability in the form of optimized architectures like Squeeze Net, MobileNet etc., are devised to serve the purpose by utilizing the parameter friendly operations and architectures, such as point-wise convolution, bottleneck layer etc. This work focuses on optimizing the number of floating point operations involved in inference through an already compressed deep learning architecture. The optimization is performed by utilizing the advantage of residual connections in a macroscopic way. This paper proposes novel connection on top of the deep learning architecture whose idea is to locate the blocks of a pretrained network which have relatively lesser knowledge quotient and then bypassing those blocks by an intelligent skip connection, named here as Shunt connection. The proposed method helps in replacing the high computational blocks by computation friendly shunt connection. In a given architecture, up to two vulnerable locations are selected where 6 contiguous blocks are selected and skipped at the first location and 2 contiguous blocks are selected and skipped at the second location, leveraging 2 shunt connections. The proposed connection is used over state-of-the-art MobileNet-V2 architecture and manifests two cases, which lead from 33.5% reduction in flops (one connection) up to 43.6% reduction in flops (two connections) with minimal impact on accuracy.

A novel framework to analyze road accident time series data

Journal of Big Data, 2016

Background Road traffic accident (RTA) is one of the important concerns of research as it involve... more Background Road traffic accident (RTA) is one of the important concerns of research as it involves fatality, personal injuries that can lead to full or partial disability and property damage. A report [1] by World Health Organization (WHO) reveals that there are 1.2 million fatal and about four times injured road accidents every year across the world. Road and traffic safety is a term associated with road accidents. The primary focus of road safety is to provide some preventive measures that can be helpful in reducing RTAs. Road accident data analysis is an important factor that has been succeeds in identifying different factors associated with road accidents [2-5]. Once the associated road accident factors are identified the corresponding actions can be taken to overcome the accident rate and to apply some preventive measures. Road accident data analysis is mainly based on two categories: statistical techniques and data mining techniques. Various studies on

A data mining framework to analyze road accident data

Journal of Big Data, 2015

Background Road and traffic accidents are uncertain and unpredictable incidents and their analysi... more Background Road and traffic accidents are uncertain and unpredictable incidents and their analysis requires the knowledge of the factors affecting them. Road and traffic accidents are defined by a set of variables which are mostly of discrete nature. The major problem in the analysis of accident data is its heterogeneous nature [1]. Thus heterogeneity must be considered during analysis of the data otherwise, some relationship between the data may remain hidden. Although, researchers used segmentation of the data to reduce this heterogeneity using some measures such as expert knowledge, but there is no guarantee that this will lead to an optimal segmentation which consists of homogeneous groups of road accidents [2]. Therefore, cluster analysis can assist the segmentation of road accidents. Cluster analysis which is an important data mining technique can be used as a preliminary task to achieve various goals. Karlaftis and Tarko [3] used cluster analysis to categorize the accident data into different categories and further analyzed cluster results using Negative Binomial (NB) to identify the impact of driver age on road accidents.

Distributed Certification Authority for Mobile Ad Hoc Networks - A Dynamic Approach

Journal of Convergence Information Technology, 2007

A Mobile Ad Hoc Network (MANET) is an infrastructureless network of wireless mobile nodes that co... more A Mobile Ad Hoc Network (MANET) is an infrastructureless network of wireless mobile nodes that cooperate among each other to maintain connectivity of the network. In comparison to wired networks, securing a MANET is more difficult and challenging. One of the effective ways of providing security in MANETs is by using Public Key Cryptography and Certificates. Certificates are issued by