Skip to content

The pipeline

The pipeline, implemented with DVC and stored in the dvc.yaml file, defines the processes that produce the project's important artifacts.

The command, or cmd key, for each stage should typically include the calkit xenv command, such that every process is executed in a defined environment. This way, Calkit will ensure the environment matches its specification before execution.

If a stage has a single output, it is also possible to define what type of Calkit object it produces. For example, the stage below collect-data produces a dataset. When executing calkit run this will automatically be added to the datasets list in the calkit.yaml file, allowing other users to search for and use this dataset in their own work.

stages:
  collect-data:
    cmd: calkit xenv -n main -- python scripts/collect-data.py
    deps:
      - scripts/collect-data.py
    outs:
      - data/raw.csv
    meta:
      calkit:
        type: dataset
        title: The raw data
        description: Raw voltage, collected from the sensor.

To learn more, see DVC's pipeline documentation.