
DVC Cheatsheet

DVC is a tool for data version control and reproducible machine learning workflows. It lets users initialize a DVC environment, put files under DVC control, run commands that generate outputs, reproduce or modify the resulting pipeline, and move data between a local cache and remote storage by pushing and pulling. Common commands include dvc init, dvc add, dvc run, dvc repro, dvc push, dvc pull and dvc status.
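The commands in the summary above are usually chained in a fixed order. The script below is a minimal sketch of that order, not an official recipe. It assumes it runs inside an existing Git repository. REMOTE_PATH and the helper function names (setup_project, track_and_share, retrieve_data) are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Minimal sketch of the basic DVC workflow: initialize, configure a remote,
# track a file, share it, and retrieve it elsewhere.
# Assumptions: runs inside an existing Git repository; REMOTE_PATH and the
# helper function names are hypothetical placeholders.
set -eu

REMOTE_PATH="${REMOTE_PATH:-/tmp/dvc-storage}"   # hypothetical local remote

setup_project() {
  dvc init                                    # create the .dvc/ directory
  dvc remote add -d myremote "$REMOTE_PATH"   # -d: make it the default remote
  git add .dvc                                # version the DVC config in Git
  git commit -m "Initialize DVC"
}

track_and_share() {
  local file="$1"
  dvc add "$file"                             # cache the file, write "$file.dvc"
  # Version the small .dvc pointer and the .gitignore that DVC updates
  # in the file's directory.
  git add "$file.dvc" "$(dirname "$file")/.gitignore"
  git commit -m "Track $file with DVC"
  dvc push                                    # upload cached data to the default remote
}

retrieve_data() {
  # Same effect as "dvc pull": download into the cache, then check out.
  dvc fetch
  dvc checkout
}
```

A typical session runs setup_project once, then track_and_share data.csv for each new data file. A collaborator who clones the repository runs retrieve_data, or simply dvc pull.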

Uploaded by Etienne Koa
Copyright © All Rights Reserved

https://github.com/iterative/dvc
https://dvc.org/chat

Basics

Initializing
Initialize a DVC environment
$ dvc init

Remote
Set up a remote to keep and share data files (-d makes it the default remote)
$ dvc remote add -d myremote /path
*Possible remotes include local, s3, gs, azure, ssh, hdfs and http.
Show all available remotes
$ dvc remote list
Modify remote settings
$ dvc remote modify myremote
*Use if the remote requires extra configuration.

Adding Files
Add files under DVC control
$ dvc add filename
*Use --no-commit to skip saving the file to the cache.

Share Data
Push all data files to the remote storage
$ dvc push
Push the outputs of a specific .dvc file
$ dvc push filename.dvc

Retrieve Data
Download files from the remote storage
$ dvc pull
Download the files of a specific .dvc file
$ dvc pull filename.dvc
Check out files from the cache into the workspace
$ dvc checkout

The Pipeline
Add transformations and generate a stage file from a given command
$ dvc run -d dependencyfile \
    -o outputfile python command.py
*Use --file to specify the name of the generated .dvc file.
*Use --metrics to declare an output file that contains metrics.

Metrics
Collect and display project metrics
$ dvc metrics show
*Use --all to show the metrics in all branches.

Visualizing
Show the stages in a pipeline
$ dvc pipeline show --ascii file.dvc
*Add --commands or --outs to show more detail.
Show connected pipelines of DVC stages
$ dvc pipeline list

Reproducing
Reproduce the outputs defined in a .dvc file
$ dvc repro filename.dvc
*Name a .dvc file “Dvcfile” to have dvc repro use it by default.
Show changed stages in the pipeline
$ dvc status

Other Commands
Set/unset the cache directory location
$ dvc cache dir /path
Commit outputs to the cache
$ dvc commit
*Use if you specified --no-commit in dvc add/run/repro.
Configure repository or global options
$ dvc config
*Set the default remote with: dvc config core.remote myremote
*Configures core (loglevel, remote), cache and state settings.
Fetch files from the remote to the local cache
$ dvc fetch file.dvc
Remove unused objects from the cache
$ dvc gc
Import a file from a URL to a local directory
$ dvc import url /path
*Supported schemes include local, s3, gs, azure, ssh, hdfs and http.
Remove data files tracked by DVC
$ dvc remove filename.dvc
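The pipeline commands in this cheat sheet (dvc run, dvc status and dvc repro) are usually combined as sketched below. This follows the older dvc run syntax the cheat sheet documents. Newer DVC releases replace dvc run with dvc stage add followed by dvc repro, so check the current documentation before relying on it. The file names data.csv, train.py, model.pkl and train.dvc, and the helper function names, are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of a one-stage pipeline using the dvc run syntax shown in
# this cheat sheet (older DVC releases). data.csv, train.py, model.pkl,
# train.dvc and the function names are hypothetical.
set -eu

create_stage() {
  # -d declares dependencies, -o declares an output,
  # --file names the generated stage file.
  dvc run --file train.dvc \
    -d data.csv -d train.py \
    -o model.pkl \
    python train.py
}

update_pipeline() {
  dvc status              # report stages whose dependencies or outputs changed
  dvc repro train.dvc     # re-run only the stages that are out of date
}
```

After data.csv or train.py is edited, update_pipeline reports the changed stage and regenerates model.pkl. Stages whose dependencies did not change are skipped.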

Made by Carl Handlin based on the documentation for DVC at https://dvc.org/doc
