Quick Start

Step-by-step instructions for operators to deploy the controller, enable agents, configure storage backends with credentials, and work with Datasets and DatasetClaims. Complete examples using S3-compatible object storage are also included.

1. Install ZCDP

  1. Download the public resources

    Installation scripts and all CRDs are available in the deploy directory of the https://github.com/dataplatformsolutions/zero-copy-data-plane repository.

    Either clone the repository or download one of the release bundles at https://github.com/dataplatformsolutions/zero-copy-data-plane/releases

  2. Ensure kubectl is installed and configured

    Make sure your KUBECONFIG points at the target cluster where you want to install ZCDP.

  3. Run the installer

    Run the installer script to install the controller and webhook. It also installs cert-manager and the CRDs.

    deploy/install-all.sh
  4. Verify the installation by running the health check script:

    deploy/scripts/health-check.sh
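
If the health check reports a failure, listing the pods in the controller namespace (zcdp-system, the same namespace used by the monitoring commands later in this guide) is a good first step:

    kubectl -n zcdp-system get pods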

Note: the Docker image versions can be configured at the top of the install-all.sh script:

ZCDP_CONTROLLER_IMAGE="ghcr.io/dataplatformsolutions/zcdp-controller:0.1.0"
ZCDP_AGENT_IMAGE="ghcr.io/dataplatformsolutions/zcdp-agent:0.1.0"
The namespaces in which ZCDP operates can be configured by editing the webhook.yaml manifest, specifically:

    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: In
          values:
            - test-workload
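
For example, to have ZCDP act on several namespaces, add them to the values list (the two extra namespace names below are hypothetical):

    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: In
          values:
            - test-workload
            - team-a-training
            - team-b-training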
          

2. Enable Agents

Tag nodes where you want the ZCDP agent to run with the label zcdp.io/enable-agent=true. The agent is responsible for syncing datasets to local cache and managing dataset lifecycle on each node.

This can be done using the following command:

kubectl label nodes <node-name> zcdp.io/enable-agent=true
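
To confirm the label took effect and that an agent pod was scheduled, cross-check nodes and pods (the zcdp-agent DaemonSet runs in the zcdp-system namespace):

kubectl get nodes -l zcdp.io/enable-agent=true
kubectl -n zcdp-system get pods -o wide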

Once this is done, you are all set to use ZCDP. Either follow the next steps to configure a storage backend, Datasets, and DatasetClaims, or skip ahead and run the complete example that ships in the public repo.

3. Complete S3-based walkthrough

This walkthrough demonstrates a minimal but functional deployment that syncs a public dataset from Amazon S3 and mounts it into a training job.

  1. Define object storage credentials.
    kubectl -n zcdp-system create secret generic s3-creds \
      --from-literal=AWS_ACCESS_KEY_ID=AKIA... \
      --from-literal=AWS_SECRET_ACCESS_KEY=... \
      --from-literal=AWS_REGION=us-west-2
  2. Define an S3 StorageBackend.
    cat <<'EOF' | kubectl apply -f -
    apiVersion: zcdp.io/v1alpha1
    kind: StorageBackend
    metadata:
      name: public-s3-data
    spec:
      type: s3
      auth:
        mode: accessKey
        accessKeyIdSecretRef:
          name: s3-creds
          key: AWS_ACCESS_KEY_ID
        secretAccessKeySecretRef:
          name: s3-creds
          key: AWS_SECRET_ACCESS_KEY
      s3:
        region: us-west-2
    EOF
  3. Define the Dataset to be used.
    cat <<'EOF' | kubectl apply -f -
    apiVersion: zcdp.io/v1alpha1
    kind: Dataset
    metadata:
      name: dataset-products-2017
    spec:
      source:
        storageBackendRef: public-s3-data
        path: datasets/products/2017
      cache:
        maxSize: 80Gi
    EOF
  4. Run a sample training workload using the ZCDP lightweight annotation.
    cat <<'EOF' | kubectl apply -f -
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: products-ls
      namespace: trainers
    spec:
      template:
        metadata:
          annotations:
            zcdp.io/datasets: "dataset-products-2017:/mnt/datasets/products"
        spec:
          restartPolicy: Never
          containers:
            - name: ls
              image: busybox:1.36
              command:
                - sh
                - -c
                - |
                  echo "Listing /mnt/datasets/products"
                  ls -lah /mnt/datasets/products
    EOF
  5. Monitor progress.
    kubectl get nodedatasets -n zcdp-system
    kubectl logs -n zcdp-system daemonset/zcdp-agent -f
    kubectl logs -n trainers job/products-ls
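
The annotation value in step 4 follows a <dataset-name>:<mount-path> pattern. When generating such manifests from shell scripts, the two halves can be separated with POSIX parameter expansion; a small self-contained sketch (no cluster access required, the entry string is copied from the example above):

```shell
#!/bin/sh
# Split a zcdp.io/datasets annotation value of the form
# "<dataset-name>:<mount-path>" into its two parts.
entry="dataset-products-2017:/mnt/datasets/products"
name="${entry%%:*}"     # everything before the first ':'
mount="${entry#*:}"     # everything after the first ':'
echo "dataset: $name"
echo "mount: $mount"
```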

Once the NodeDataset transitions to Ready, subsequent pods on the same node start instantly because the data is already on the NVMe cache.
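
In automation, you can block on that transition instead of watching logs; a sketch using kubectl wait, assuming NodeDataset publishes a standard Ready condition (check your ZCDP version):

kubectl -n zcdp-system wait nodedatasets --all --for=condition=Ready --timeout=10m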

4. Complete Environment Example

The example/ folder in the dataplatformsolutions/zero-copy-data-plane repository spins up a local k3s cluster, MinIO, and a sample workload that reads a dataset through ZCDP.

Prerequisites

The scripts assume KUBECONFIG points at INSTALL_PATH/example/kubeconfig/kubeconfig.yaml after the cluster boots.

Ensure the scripts are executable:

chmod +x INSTALL_PATH/example/scripts/*

Run the scripts

# 0. Set the working directory to the scripts folder
cd example/scripts

# 1. Start the infrastructure (k3s + MinIO)
./01_start_infra.sh

# 2. Install ZCDP using the deploy/install-all.sh script
./02_install_zcdp.sh

# 3. Tag two nodes with zcdp.io/enable-agent=true so that they can execute workloads
./03_enable_agents.sh

# 4. Upload the sample datasets to MinIO
./04_seed_minio.sh

# 5. Apply storage backend and dataset manifests
./05_apply_dataset.sh

# 6. Build the job container, push it into k3s, and apply the workload manifest
./06_build_and_deploy_job.sh

# 7. Check the output from the sample job
./07_verify_workload.sh
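
The seven scripts can also be chained unattended; a minimal sketch that stops at the first failure (assumes it is run from example/scripts):

#!/bin/sh
set -e
for s in 01_start_infra.sh 02_install_zcdp.sh 03_enable_agents.sh \
         04_seed_minio.sh 05_apply_dataset.sh 06_build_and_deploy_job.sh \
         07_verify_workload.sh; do
  echo ">> $s"
  "./$s"
done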

A good way to learn how to deploy your own workloads is to study 05_apply_dataset.sh and the manifests it applies. Use these as a starting point, and feed them to an AI assistant to help generate the variations you need for your jobs.

Observability

Port-forward the status page while the example is running:

kubectl -n zcdp-system port-forward svc/zcdp-status 8080:8080

Clean up

docker compose down -v