Computer Science > Databases
[Submitted on 7 Aug 2023]
Title:A General Framework for Progressive Data Compression and Retrieval
View PDFAbstract:In scientific simulations, observations, and experiments, the cost of transferring data to and from disk and across networks has become a significant bottleneck that particularly impacts subsequent data analysis and visualization. To address this challenge, compression techniques have been widely adopted. However, traditional lossy compression approaches often require setting error tolerances conservatively to respect the numerical sensitivities of a wide variety of post hoc data analyses, some of which may not even be known a priori. Progressive data compression and retrieval has emerged as a solution, allowing for the adaptive handling of compressed data according to the needs of a given post-processing task. However, few analysis algorithms natively support progressive data processing, and adapting compression techniques, file formats, client/server frameworks, and APIs to support progressivity can be challenging. This work presents a general framework that supports progressive-precision data queries independently of the underlying data compressor or number representation. Our approach is based on a multiple-component representation that successively, with each new component, reduces the error between the original and compressed field, allowing each field in the progressive sequence to be expressed as a partial sum of components. We have implemented our approach on top of four popular scientific data compressors and have evaluated its behavior on several real-world data sets from the SDRBench collection. Numerical results indicate that our framework is effective in terms of accuracy compared to each of the standalone compressors it builds upon. In addition, (de)compression time is proportional to the number and granularity of components. Finally, our framework allows for fully lossless compression using lossy compressors when a sufficient number of components are employed.
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.