This repository contains the official implementation of CoSTI. CoSTI adapts Consistency Models (CMs) to Multivariate Time Series Imputation (MTSI), reducing inference time by up to 98% while maintaining competitive imputation accuracy.
Multivariate Time Series Imputation (MTSI) is a critical task in various domains like healthcare and traffic management, where incomplete data can compromise decision-making. Existing state-of-the-art methods, such as Denoising Diffusion Probabilistic Models (DDPMs), offer high imputation accuracy but suffer from high computational costs. CoSTI leverages Consistency Training to:
- Achieve comparable imputation quality to DDPMs.
- Drastically reduce inference time (by up to 98%).
- Enable scalability for real-time applications.
For further details, please refer to our paper.
CoSTI combines concepts from Consistency Models and Multivariate Time Series Imputation to construct a framework optimized for speed and accuracy. The method includes:
- Spatio-Temporal Feature Extraction Modules (STFEMs): Extract spatio-temporal dependencies using transformers and Mamba blocks.
- Noise Estimation Modules (NEMs): Adapted for consistency models to predict Gaussian noise efficiently.
- Deterministic Imputation: Ensures robust and reproducible results by aggregating multiple imputations.
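The deterministic imputation step listed above can be illustrated with a minimal sketch. It assumes that several stochastic imputations are combined with an element-wise median and that observed values are left untouched; the aggregation rule, the function name `aggregate_imputations`, and the toy data are illustrative assumptions, not taken from the CoSTI code.

```python
import numpy as np


def aggregate_imputations(samples, observed, mask):
    """Combine several stochastic imputations into one deterministic estimate.

    samples:  (n_samples, time, features) candidate imputations
    observed: (time, features) original series (any value where missing)
    mask:     (time, features) boolean, True where the value was observed
    """
    samples = np.asarray(samples, dtype=float)
    # Assumption: element-wise median; the mean is an equally plausible choice.
    estimate = np.median(samples, axis=0)
    # Observed entries are kept as they are; only the gaps are filled.
    return np.where(mask, observed, estimate)


rng = np.random.default_rng(0)
truth = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[True, False], [False, True]])
observed = np.where(mask, truth, 0.0)

# Hypothetical stand-in for a generative imputer: noisy guesses around the truth.
samples = truth + 0.1 * rng.standard_normal((50, 2, 2))
imputed = aggregate_imputations(samples, observed, mask)
```

Aggregating over samples removes most of the sample-to-sample variation, which is why the result is stable across repeated calls with enough samples.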
The following datasets are used for experiments:
- AQI-36: Air quality dataset with 36 sensors.
- METR-LA and PEMS-BAY: Traffic datasets covering Los Angeles and the San Francisco Bay Area, respectively.
- PhysioNet Challenge 2019: Clinical dataset for ICU patient monitoring.
The datasets employed in this study are publicly available and free to use. Specifically:
- The Torch SpatioTemporal library (Cini et al., 2022) provides tools to download and preprocess the AQI-36, METR-LA, and PEMS-BAY datasets.
- The PhysioNet Challenge 2019 dataset is accessible at https://physionet.org/content/challenge-2019/1.0.0/. Alternatively, a script for obtaining this dataset is described later in this README.
You can set up the required environment in two ways:
- Using the requirements.txt file: Install the required packages directly:
  pip install -r requirements.txt
- Using setup.sh: Build a Docker image and container for the project:
  sudo chmod +x setup.sh
  ./setup.sh
This method creates a reproducible environment using Docker, simplifying dependency management.
If you want to download the PhysioNet Challenge 2019 dataset or obtain the pre-trained weights, you can run the following script:
chmod +x download_data.sh
./download_data.sh
Each experiment can be replicated using the provided configuration files. To execute a specific experiment, use the following command:
python ./scripts/<experiment_script>.py --config-name <experiment_file>
To replicate the training of CoSTI across five runs and obtain the average results, the following scripts can be executed for each dataset configuration:
python ./scripts/run_average_experiment.py --config-name aqi36
python ./scripts/run_average_experiment.py --config-name metr-la_point
python ./scripts/run_average_experiment.py --config-name metr-la_block
python ./scripts/run_average_experiment.py --config-name pems-bay_point
python ./scripts/run_average_experiment.py --config-name pems-bay_block
python ./scripts/run_average_experiment.py --config-name mimic-challenge
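If you prefer to launch all six configurations in one go, a small wrapper along the following lines may help. This is only a sketch: the helper names `build_commands` and `run_all` are hypothetical and not part of the repository, and the script path and configuration names are copied from the commands above.

```python
import shlex
import subprocess
import sys

# Configuration names taken from the commands listed above.
CONFIGS = [
    "aqi36",
    "metr-la_point",
    "metr-la_block",
    "pems-bay_point",
    "pems-bay_block",
    "mimic-challenge",
]


def build_commands(script="./scripts/run_average_experiment.py", configs=CONFIGS):
    """Return one argument list per dataset configuration."""
    return [[sys.executable, script, "--config-name", name] for name in configs]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it sequentially."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing experiment.
            subprocess.run(cmd, check=True)


commands = build_commands()
run_all(commands, dry_run=True)  # set dry_run=False to actually train
```

Running the experiments sequentially keeps GPU usage predictable; the dry run only prints the commands so they can be checked first.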
We provide pre-trained weights for each dataset to enable testing and result replication using 2-step sampling. For example, for the AQI-36 dataset, you can run:
python ./scripts/run_k_test_experiment.py --config-name aqi36 test_sigmas=[80]
python ./scripts/run_k_test_experiment.py --config-name aqi36
You can modify the noise levels in the configuration file (./config/k_test/aqi36.yaml) or with the test_sigmas parameter on the command line.
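To give an intuition for what the noise levels control, here is a minimal sketch of generic multistep consistency sampling, assuming that test_sigmas lists the noise levels visited in decreasing order (one entry for 1-step sampling, two for 2-step sampling). The toy consistency function, the constant `EPS`, and the second noise level `0.5` are illustrative assumptions; the actual CoSTI sampler additionally conditions on the observed values and may differ in detail.

```python
import numpy as np

EPS = 0.002  # assumed smallest noise level


def toy_consistency_fn(x, sigma, mu=3.0):
    """Exact consistency function for the toy case where all data equal `mu`.

    It maps a noisy point at level `sigma` back to level EPS and satisfies
    the boundary condition f(x, EPS) = x.
    """
    return mu + (EPS / sigma) * (x - mu)


def multistep_sample(f, shape, sigmas, rng):
    """Generic multistep consistency sampling over decreasing noise levels."""
    sigmas = list(sigmas)
    # Step 1: start from pure noise at the largest level and denoise once.
    x = f(sigmas[0] * rng.standard_normal(shape), sigmas[0])
    # Further steps: re-noise to a smaller level, then denoise again.
    for sigma in sigmas[1:]:
        z = rng.standard_normal(shape)
        x_noisy = x + np.sqrt(sigma**2 - EPS**2) * z
        x = f(x_noisy, sigma)
    return x


rng = np.random.default_rng(0)
one_step = multistep_sample(toy_consistency_fn, (4,), [80.0], rng)
two_step = multistep_sample(toy_consistency_fn, (4,), [80.0, 0.5], rng)
```

Each additional noise level costs one more network evaluation, so the number of entries trades inference time against sample quality.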
To replicate the sensitivity analysis, execute the following:
python ./scripts/run_sensitivity_experiment.py --config-name metr-la_point
To perform imputation using the provided weights:
python ./scripts/impute_data.py --config-name aqi36
python ./scripts/impute_data.py --config-name metr-la_point
python ./scripts/impute_data.py --config-name metr-la_block
python ./scripts/impute_data.py --config-name pems-bay_point
python ./scripts/impute_data.py --config-name pems-bay_block
python ./scripts/impute_data.py --config-name mimic-challenge
@article{solis2025costi,
title={CoSTI: Consistency Models for (a faster) Spatio-Temporal Imputation},
author={Sol{\'\i}s-Garc{\'\i}a, Javier and Vega-M{\'a}rquez, Bel{\'e}n and Nepomuceno, Juan A and Nepomuceno-Chamorro, Isabel A},
journal={arXiv preprint arXiv:2501.19364},
year={2025}
}