Welcome to the home of RobustiPy, a library for the creation of a more robust and stable model space. RobustiPy does a large number of things, including but not limited to: high-dimensional visualisation, Bayesian Model Averaging, bootstrapped resampling, (in- and) out-of-sample model evaluation, model selection via information criteria, explainable AI (via SHAP), and joint inference tests (as per Simonsohn et al. 2019). Kindly note: this project is in the early stages of development. Its functionality and API might change without notice! A working paper and a release are coming soon.
RobustiPy performs Multiversal/Specification Curve Analysis, which attempts to compute most or all reasonable specifications of a statistical model. A specification is understood as a single attempt to estimate an estimand of interest, whether through a particular choice of covariates, hyperparameters, data cleaning decisions, and so forth.
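To make the idea of a specification concrete, here is a minimal sketch that enumerates one specification per subset of the control variables. It is an illustration only, not RobustiPy's internal code: the helper names (`powerset`, `enumerate_specifications`) and the variable names are hypothetical.

```python
from itertools import chain, combinations


def powerset(items):
    """Return every subset of items, from the empty tuple to the full set."""
    items = list(items)
    return list(
        chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    )


def enumerate_specifications(outcome, predictor, controls):
    """Build one formula-style string per subset of the controls."""
    specs = []
    for subset in powerset(controls):
        rhs = " + ".join((predictor,) + subset)
        specs.append(f"{outcome} ~ {rhs}")
    return specs


# Hypothetical variable names, used only for illustration.
specs = enumerate_specifications("log_wage", "union", ["age", "tenure", "hours"])
print(len(specs))  # 2**3 = 8 specifications
for spec in specs:
    print(spec)
```

The number of specifications doubles with every additional control, which is why the full model space grows so quickly.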
More formally, let's assume we have a general model of the form:
We are essentially attempting to model a dependent variable
RobustiPy will then:
In words, it creates a set containing the arithmetic mean of the elements of the powerset
RobustiPy then takes these specifications, fits them against observable (tabular) data, and produces coefficients and relevant metrics for each version of the predictor
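The "arithmetic mean of the elements of the powerset" can be sketched as follows. This assumes the powerset is taken over a handful of candidate outcome columns and that the empty subset is skipped; both are assumptions made for illustration, and `composite_outcomes` and the column names are hypothetical rather than part of RobustiPy.

```python
from itertools import combinations
from statistics import fmean


def composite_outcomes(columns):
    """Map each non-empty subset of column names to its row-wise mean."""
    names = list(columns)
    n_rows = len(next(iter(columns.values())))
    composites = {}
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            composites[subset] = [
                fmean(columns[name][i] for name in subset) for i in range(n_rows)
            ]
    return composites


# Three hypothetical measurements of the same underlying outcome.
items = {
    "y1": [1.0, 2.0, 3.0],
    "y2": [3.0, 2.0, 1.0],
    "y3": [2.0, 4.0, 6.0],
}
composites = composite_outcomes(items)
print(len(composites))  # 2**3 - 1 = 7 candidate outcomes
print(composites[("y1", "y2")])  # [2.0, 2.0, 2.0]
```

Each composite could then serve as one operationalisation of the dependent variable in the model space.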
To install directly (in Python) from GitHub, run:
```bash
git clone https://github.com/RobustiPy/robustipy.git
cd robustipy
pip install .
```
In a Python script (or Jupyter Notebook), import the `OLSRobust` class by running:
```python
from robustipy.models import OLSRobust

# y, x, c, and data are defined by the user (see the description that follows).
model_robust = OLSRobust(y=y, x=x, data=data)
model_robust.fit(
    controls=c,   # a list of control variables
    draws=1000,   # number of bootstrap resamples
    kfold=10,     # number of folds for out-of-sample evaluation
    seed=192735,  # an optional, arbitrarily chosen seed for reproducibility
)
model_results = model_robust.get_results()
```
Where `y` is a list of (string) variable names used to create your dependent variable, `x` is the (string) name of your independent variable of interest (which can also be a list of length greater than one), and `c` is a list of (string) control variable names.
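To give some intuition for what the `draws` and `seed` arguments control, here is a minimal, standard-library sketch of bootstrapping a single regression slope on simulated data. It is a generic illustration of bootstrap resampling under stated assumptions, not RobustiPy's implementation; `ols_slope`, `bootstrap_slopes`, and the simulated data are all hypothetical.

```python
import random
from statistics import fmean


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx  # assumes the sample has at least two distinct x values


def bootstrap_slopes(xs, ys, draws, seed):
    """Re-estimate the slope on `draws` resamples drawn with replacement."""
    rng = random.Random(seed)  # a fixed seed makes the resamples reproducible
    n = len(xs)
    slopes = []
    for _ in range(draws):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


# Simulated data with a true slope of 2 (for illustration only).
data_rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + data_rng.gauss(0, 1) for x in xs]

slopes = sorted(bootstrap_slopes(xs, ys, draws=200, seed=192735))
lower = slopes[int(0.025 * len(slopes))]
upper = slopes[int(0.975 * len(slopes)) - 1]
print(f"95% percentile interval for the slope: [{lower:.2f}, {upper:.2f}]")
```

More draws give a smoother picture of the sampling variability at the cost of more computation, and reusing the same seed reproduces the same resamples.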
There are five empirical example notebooks here and five simulated example scripts here. Below is the output of a `results.plot()` function call made on the canonical union dataset. Note: `results.summary()` also prints out a large number of helpful statistics about your models!
We have a website made with jekyll-theme-minimal that you can find here. It also contains details of a Hackathon we ran in 2024!
Please see our guide for contributors as well as our code of conduct. If you would like to become a formal project maintainer, please contact the team to discuss!
This work is free. You can redistribute it and/or modify it under the terms of the GNU GPL 3.0 license. The two datasets bundled with the library come with their own licensing conditions and should be treated accordingly.
We are grateful for the extensive comments made by various academic communities over the course of our thinking about this work, not least the members of the ESRC Centre for Care and the Leverhulme Centre for Demographic Science.