Data Handling
Recent papers in Data Handling
The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation...
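The chained-equations idea behind mice can be sketched outside R as well. The following Python sketch is illustrative only, not the mice algorithm itself (which adds predictor selection, passive imputation, and draws from proper conditional models): it cycles through incomplete columns, regressing each on the others and filling in predictions.

```python
import numpy as np

def chained_equation_impute(X, n_iter=10):
    """Impute NaNs by cycling through columns, regressing each
    incomplete column on the others (a linear fill-in sketch)."""
    X = X.astype(float).copy()
    miss = np.isnan(X)
    # initialise missing entries with column means
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            # regress column j on all other (currently imputed) columns
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])
            obs = ~miss[:, j]
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            X[miss[:, j], j] = A[miss[:, j]] @ beta
    return X
```

Cycling several times lets imputations in one column inform those in another, which is the essence of the chained-equations scheme.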
Real-life datasets often suffer from the problem of class imbalance, which thwarts the supervised learning process. In such datasets, examples of the positive (minority) class are significantly fewer than those of the negative (majority) class, leading...
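A common first remedy for such imbalance is resampling. The sketch below is a generic baseline (not the method of any paper listed here): it randomly duplicates minority-class examples until all classes reach the majority-class size.

```python
import random

def random_oversample(examples, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    matches the majority-class count (plain oversampling baseline)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        picked = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(picked)
        out_y.extend([y] * target)
    return out_x, out_y
```

Duplication can encourage overfitting to the repeated minority examples; interpolation-based schemes such as SMOTE are the usual upgrade.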
The ALTEA program is an international and multidisciplinary collaboration aimed at studying particle radiation in the space environment and its effects on astronauts, in particular the anomalous perception of Light Flashes. This paper...
Augmented Reality (AR) in the real world is still an unsolved task. Beyond toy systems there are no scalable solutions for outdoor systems that include augmented real-time graphics. The problems are multifold: rendering capacities,...
We present the Convergence Processor, an innovative component that integrates a high-performance 32-bit RISC core, a custom IP core optimised for header processing, and other blocks for specific communication interfaces required for the...
In high-energy physics, with the search for ever smaller signals in ever larger data sets, it has become essential to extract a maximum of the available information from the data. Multivariate classification methods based on machine...
The aim is to consider the political and ethical challenges involved in conducting ethnographic managerial/organisational behaviour research within the highly regulated health and social care context, in light of the emergence of more...
Purpose – This paper aims to propose a solution for recommending digital library services based on data mining techniques (clustering and predictive classification). Design/methodology/approach – Data mining techniques are used to recommend...
Complex event processing has become increasingly important in modern applications, ranging from supply chain management for RFID tracking to real-time intrusion detection. The goal is to extract patterns from such event streams in order...
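At its simplest, pattern extraction over an event stream reduces to checking whether a given sequence of event types occurs in order. The toy sketch below illustrates that core idea only; production CEP engines add windows, predicates, and negation.

```python
def match_sequence(events, pattern):
    """Return True if the event types in `pattern` occur in order
    (not necessarily contiguously) within the event stream."""
    it = iter(events)  # shared iterator: each match consumes the stream
    return all(any(e == p for e in it) for p in pattern)
```

Because the iterator is shared across pattern elements, each element must be found strictly after the previous match, which is exactly the "sequence" semantics of event-pattern queries.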
Data logging and data distribution will be two main tasks connected with data handling in ITER. Data logging refers to the recovery and ultimate storage of all data, independent of the data source. Data distribution is related, on the one...
Forged documents, specifically passports, driving licences and visa stickers, are used for fraudulent purposes including robbery, theft and many more. Detecting forged characters in documents is therefore a significantly important and challenging task...
This paper introduces a hybrid approach, namely a Hybrid Artificial Neural Network-Naive Bayes classifier, for classifying two-class imbalanced datasets.
Keywords: predictive microbiology; model building; strategic research; enabling technology; value analysis; modelling food and other ecosystems; microbial persistence and recovery. This paper considers the future of predictive microbiology by exploring...
This paper outlines the process of adopting electronic data interchange into a business cycle and the issues encountered in the process. These issues are categorized into technical, financial, scheduling, and ethical areas. Real-life case...
In this paper, we develop a stochastic programming model for an integrated forward/reverse logistics network design under uncertainty. First, an efficient deterministic mixed integer linear programming model is developed for integrated...
Since the explicit collaboration of biological and physical scientists with archaeologists started in the late 1930s, the discourse on the nature of this collaboration has been intense. The question of the relative roles of the specialist...
This article urges counseling psychology researchers to recognize and report how missing data are handled, because consumers of research cannot accurately interpret findings without knowing the amount and pattern of missing data or the...
Many quality improvement (QI) programs including six sigma, design for six sigma, and kaizen require collection and analysis of data to solve quality problems. Due to advances in data collection systems and analysis tools, data mining...
In 1997, we proposed the fuzzy-possibilistic c-means (FPCM) model and algorithm that generated both membership and typicality values when clustering unlabeled data. FPCM constrains the typicality values so that the sum over all data...
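The two FPCM constraints can be written down directly: memberships sum to one over clusters for each point, while typicalities sum to one over data points for each cluster. The NumPy sketch below computes both sets of weights from a distance matrix; the FCM-style update form used here is an illustrative assumption, not a transcription of the FPCM paper.

```python
import numpy as np

def fpcm_weights(dist, m=2.0, eta=2.0):
    """Given a (clusters x points) distance matrix, return memberships u
    (summing to 1 over clusters per point) and typicalities t
    (summing to 1 over points per cluster), FPCM-style."""
    d = np.maximum(dist, 1e-12)            # avoid division by zero
    inv_m = d ** (-2.0 / (m - 1.0))
    u = inv_m / inv_m.sum(axis=0, keepdims=True)   # normalise over clusters
    inv_e = d ** (-2.0 / (eta - 1.0))
    t = inv_e / inv_e.sum(axis=1, keepdims=True)   # normalise over points
    return u, t
```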
The simulation of vehicle dynamics has a wide array of applications in the development of vehicle technologies. This study deals with the methodological aspect of the problem of assessing the validity of a simulation using double lane...
Big data refers to data volumes in the range of exabytes (10^18 bytes) and beyond. Such volumes exceed the capacity of current online storage systems and processing systems. Data, information, and knowledge are being created and collected at...
Gene expression profiling plays an important role in a broad range of areas in biology. The raw gene expression data may contain missing values. It is an important preprocessing step to accurately estimate missing values in microarray...
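A widely used estimator for such missing entries is k-nearest-neighbour imputation: each missing value is replaced by the average of that column over the k most similar rows. The sketch below is illustrative (not necessarily the estimator the abstract above proposes), comparing rows only on their shared observed columns.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill each missing entry with the mean of that column across the
    k rows closest in Euclidean distance over shared observed columns."""
    X = X.astype(float)
    out = X.copy()
    for i, row in enumerate(X):
        for j in np.where(np.isnan(row))[0]:
            candidates = []
            for i2, other in enumerate(X):
                if i2 == i or np.isnan(other[j]):
                    continue  # neighbour must have column j observed
                shared = ~np.isnan(row) & ~np.isnan(other)
                if shared.any():
                    d = np.linalg.norm(row[shared] - other[shared])
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda p: p[0])
            if candidates:
                out[i, j] = np.mean([v for _, v in candidates[:k]])
    return out
```

The quadratic row-pair loop is fine for microarray-sized matrices; larger data would call for vectorised distances or an index structure.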
This work investigates the possibility of extending the linguistic notion «Explorative Data Analysis», aiming to investigate linguistic data experimentally in order to obtain useful information more exploratively, through a spatial...
Purpose – The purpose of this paper is to propose a systematic and rigorous process of data collection and fieldwork in qualitative research, using four empirical studies of customer interactions in new product development (NPD) as...
Problems that involve interacting with humans, such as natural language understanding, have not proven to be solvable by concise, neat formulas like F = ma. Instead, the best approach appears to be to embrace the complexity of the domain...
A new software package, PetroGraph, has been developed to visualize, elaborate, and model geochemical data for igneous petrology purposes. The software is able to plot data on several different diagrams, including a large number of...
The structural elucidation of small molecules using mass spectrometry plays an important role in modern life sciences and bioanalytical approaches. This review covers different soft and hard ionization techniques and figures of merit for...
Lumped-conceptual rainfall-runoff models are a 'stock in trade' tool for those working in the water industry and allied fields. However, there are many alternative models available, and sometimes multiple implementations of the same...
Clustering, in data mining, is useful for discovering distribution patterns in the underlying data. Clustering algorithms usually employ a distance-metric-based (e.g., Euclidean) similarity measure in order to partition the database such that...
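The canonical example of such metric-based partitioning is Lloyd's k-means, which alternates between assigning each point to its nearest centre under the Euclidean metric and recomputing the centres. A minimal NumPy sketch:

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Minimal Lloyd's k-means: Euclidean assignment, mean update."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # distance from every point to every centre
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = points[labels == c].mean(axis=0)
    return labels, centers
```

Swapping the distance computation for another metric changes the shape of the discovered partitions, which is precisely the design choice the abstract alludes to.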
Due to future market shifts as well as current market dynamics, industries such as automotive OEMs have to be able to handle both exploration and exploitation simultaneously, a capability which is called 'organizational...
Keywords: pin-point planetary lander; field-programmable gate array.
The energy density of the universe is dominated by dark energy and dark matter, two mysterious components which pose some of the most important questions in fundamental science today. Euclid is a high-precision survey mission designed to...
The performance of industrial power system studies can be significantly improved in both speed and reliability by the application of a similar format for all standard studies. The major calculations and drafting work are performed using...
SQL is the (more or less) standardised language that is used by the majority of commercial database management systems. However, it is seriously flawed, as has been documented in detail by Date, Darwen, Pascal, and others. One of the most...
A foundational problem in kernel-based semi-supervised learning is the design of suitable kernels which can properly reflect the underlying data manifold. One of the most well-known semi-supervised kernel learning approaches is the...
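One common manifold-aware construction (a generic sketch; the specific approach the abstract names is cut off) builds a similarity graph over the data, forms its graph Laplacian L = D - W, and takes the pseudoinverse of L as the kernel matrix.

```python
import numpy as np

def laplacian_kernel(X, gamma=1.0):
    """Form a Gaussian similarity graph W over the rows of X, build the
    unnormalized graph Laplacian L = D - W, and return pinv(L) as a
    manifold-aware kernel (one standard construction among several)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-gamma * sq)
    np.fill_diagonal(W, 0.0)            # no self-similarity
    L = np.diag(W.sum(axis=1)) - W      # degree matrix minus adjacency
    return np.linalg.pinv(L)
```

Because L is symmetric positive semi-definite, its pseudoinverse is a valid kernel, and it penalises functions that vary quickly across densely connected regions of the graph.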
With the recent surge of enthusiasm in Artificial Intelligence (AI) and Financial Technology (FinTech), applications such as credit scoring have gained substantial academic interest. However, despite the ever-growing achievements, the...
We review second- and third-order multivariate calibration, based on the growing literature in this field, the variety of data being produced by modern instruments, and the proliferation of algorithms capable of dealing with higher-order...