arm-preprocessing

💡 Why arm-preprocessing? • ✨ Key features • 📦 Installation • 🚀 Usage • 🔗 Related frameworks • 📚 References • 🔑 License

arm-preprocessing is a lightweight Python library that supports key steps in data preparation, manipulation, and discretisation for Association Rule Mining (ARM). 🧠 Embrace its minimalistic design that prioritises simplicity. 💡 The framework is intended to be fully extensible and offers seamless integration with related ARM libraries (e.g., NiaARM). 🔗

Free software: MIT license
Documentation: http://arm-preprocessing.readthedocs.io
Python: 3.9.x, 3.10.x, 3.11.x, 3.12.x
Tested OS: Windows, Ubuntu, Fedora, Alpine, Arch, macOS. It may also work on other operating systems, but these have not been tested

💡 Why arm-preprocessing?

While numerous libraries facilitate data mining preprocessing tasks, this library is designed to integrate seamlessly with association rule mining. It harmonises well with the NiaARM library, a robust numerical association rule mining framework. The primary aim is to bridge the gap between preprocessing and rule mining, simplifying the workflow/pipeline. Additionally, its design allows for the effortless incorporation of new preprocessing methods and fast benchmarking.

✨ Key features

Loading various formats of datasets (CSV, JSON, TXT, TCX) 📊
Converting datasets to different formats 🔄
Loading different types of datasets (numerical dataset, discrete dataset, time-series data, text, etc.) 📉
Dataset identification (which type of dataset) 🔍
Dataset statistics 📈
Discretisation methods 📏
Data squashing methods 🤏
Feature scaling methods ⚖️
Feature selection methods 🎯

📦 Installation

pip

To install arm-preprocessing with pip, use:

$ pip install arm-preprocessing

To install arm-preprocessing on Alpine Linux, please use:

$ apk add py3-arm-preprocessing

To install arm-preprocessing on Arch Linux, please use an AUR helper:

$ yay -Syyu python-arm-preprocessing

🚀 Usage

Data loading

The following example demonstrates how to load a dataset from a file (csv, json, txt). More examples can be found in the examples/data_loading directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename (without file extension) and format (csv, json, txt)
dataset = Dataset('path/to/datasets', format='csv')

# Load dataset
dataset.load()
df = dataset.data

Missing values

The following example demonstrates how to handle missing values in a dataset using imputation. More examples can be found in the examples/missing_values directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('examples/missing_values/data', format='csv')
dataset.load()

# Impute missing data
dataset.missing_values(method='impute')
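
To make the idea of imputation concrete, here is a minimal sketch in plain Python that fills each missing numeric value with the mean of the observed values in the same column. It is an illustration only: the helper name impute_mean and the sample rows are hypothetical, and the strategy that missing_values(method='impute') actually applies may differ, so consult the library documentation for the exact behaviour.

```python
from statistics import mean


def impute_mean(rows, column):
    """Return a copy of rows with None in `column` replaced by the column mean.

    Hypothetical helper for illustration; not part of arm-preprocessing.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    fill = mean(observed)
    # Build new dicts so the input rows are left untouched.
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical sample data with one missing calories value
rows = [
    {"duration": 30, "calories": 200},
    {"duration": 45, "calories": None},
    {"duration": 60, "calories": 400},
]
imputed = impute_mean(rows, "calories")
print(imputed[1]["calories"])
```

The sketch returns new rows instead of modifying the input, which makes it easy to compare the data before and after imputation.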

Data discretisation

The following example demonstrates how to discretise a dataset using the equal width method. More examples can be found in the examples/discretisation directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename (without file extension) and format (csv, json, txt)
dataset = Dataset('datasets/sportydatagen', format='csv')
dataset.load()

# Discretise dataset using equal width discretisation
dataset.discretise(method='equal_width', num_bins=5, columns=['calories'])
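
For readers unfamiliar with the technique, the sketch below shows what equal width discretisation does in principle: the range between the minimum and maximum value is split into num_bins intervals of the same width, and every value is replaced by the index of its interval. The function equal_width_bins and the sample values are hypothetical, and the boundary convention (left-closed intervals, maximum clamped into the last bin) is an assumption rather than a description of how the library labels its bins.

```python
def equal_width_bins(values, num_bins):
    """Assign each value a bin index in [0, num_bins - 1] using equal-width intervals.

    Hypothetical helper for illustration; not part of arm-preprocessing.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so everything lands in a single bin.
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


calories = [100, 180, 250, 320, 400, 600]  # hypothetical values
bins = equal_width_bins(calories, num_bins=5)
print(bins)  # [0, 0, 1, 2, 3, 4]
```

Here the range 100 to 600 is cut into five intervals of width 100, so 100 and 180 share the first bin while 600 is placed in the last one.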

Data squashing

The following example demonstrates how to squash a dataset using Euclidean similarity. More examples can be found in the examples/squashing directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/breast', format='csv')
dataset.load()

# Squash dataset
dataset.squash(threshold=0.75, similarity='euclidean')
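
As a rough intuition for what squashing does, the sketch below greedily groups rows whose similarity to a group's first member reaches the threshold, then replaces each group with the mean of its members. Everything here is an assumption made for illustration: the function name squash_rows, the similarity definition (1 minus the Euclidean distance divided by the largest possible distance for features scaled to [0, 1]), and the greedy grouping order. The algorithm implemented in the library may differ; see reference [1] for the published method.

```python
from math import dist, sqrt


def squash_rows(rows, threshold):
    """Greedily merge rows that are similar to a group's first member.

    Rows are numeric tuples assumed to be scaled to [0, 1]. Similarity is
    defined here as 1 - (Euclidean distance / sqrt(number of features)).
    Hypothetical helper for illustration; not part of arm-preprocessing.
    """
    if not rows:
        return []
    max_dist = sqrt(len(rows[0]))  # largest distance inside the unit hypercube
    groups = []  # each group is a list of rows; group[0] is its seed
    for row in rows:
        for group in groups:
            if 1 - dist(row, group[0]) / max_dist >= threshold:
                group.append(row)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([row])
    # Replace every group with the column-wise mean of its members.
    return [
        tuple(sum(col) / len(group) for col in zip(*group))
        for group in groups
    ]


rows = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]  # hypothetical scaled data
squashed = squash_rows(rows, threshold=0.75)
print(len(squashed))  # 2
```

With a threshold of 0.75 the four sample rows collapse into two representatives, one near each corner of the unit square.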

Feature scaling

The following example demonstrates how to scale the dataset's features. More examples can be found in the examples/scaling directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/Abalone', format='csv')
dataset.load()

# Scale dataset using normalisation
dataset.scale(method='normalisation')
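
The sketch below illustrates one common meaning of normalisation, namely min-max scaling, which maps the smallest value of a feature to 0 and the largest to 1. Treating method='normalisation' as min-max scaling is an assumption, and the function min_max_scale and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly so that min -> 0.0 and max -> 1.0.

    Hypothetical helper for illustration; not part of arm-preprocessing.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant feature carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rings = [2, 4, 6, 10]  # hypothetical feature values
scaled = min_max_scale(rings)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Scaling features to a shared range matters for distance-based steps such as squashing, where a feature with a large numeric range would otherwise dominate the distance.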

Feature selection

The following example demonstrates how to select features from a dataset using the Kendall tau correlation coefficient. More examples can be found in the examples/feature_selection directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/sportydatagen', format='csv')
dataset.load()

# Feature selection
dataset.feature_selection(
    method='kendall', threshold=0.15, class_column='calories')
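
To show the reasoning behind correlation-based selection, the following sketch computes a simple Kendall tau (the tau-a variant, in which tied pairs count as neither concordant nor discordant) between each feature and the class column, and keeps the features whose absolute coefficient reaches the threshold. The function names, the sample columns, and the keep-if-at-or-above-threshold rule are assumptions for illustration; the library may use a different tau variant or selection rule.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Hypothetical helper for illustration; not part of arm-preprocessing.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie.
        score += (product > 0) - (product < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)


def select_features(columns, class_column, threshold):
    """Keep features whose |tau| against the class column is >= threshold."""
    target = columns[class_column]
    return [
        name
        for name, values in columns.items()
        if name != class_column and abs(kendall_tau(values, target)) >= threshold
    ]


# Hypothetical columns: duration rises with calories, weekday does not.
columns = {
    "duration": [30, 45, 60, 90],
    "weekday": [3, 1, 4, 2],
    "calories": [200, 310, 400, 650],
}
selected = select_features(columns, "calories", threshold=0.15)
print(selected)  # ['duration']
```

In this sample, duration is perfectly concordant with calories (tau of 1.0), while weekday has as many concordant as discordant pairs (tau of 0.0) and is therefore dropped.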

🔗 Related frameworks

[1] NiaARM: A minimalistic framework for Numerical Association Rule Mining

[2] uARMSolver: universal Association Rule Mining Solver

📚 References

[1] I. Fister, I. Fister Jr., D. Novak and D. Verber, Data squashing as preprocessing in association rule mining, 2022 IEEE Symposium Series on Computational Intelligence (SSCI), Singapore, Singapore, 2022, pp. 1720-1725, doi: 10.1109/SSCI51031.2022.10022240.

[2] I. Fister Jr. and I. Fister, A brief overview of swarm intelligence-based algorithms for numerical association rule mining, arXiv preprint arXiv:2010.15524, 2020.

🔑 License

This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.

Disclaimer

This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!
