Step-by-step instructions for operators to deploy the controller, enable agents, configure storage backends with credentials, and work with Datasets and DatasetClaims. Complete examples using S3-compatible object storage are also included.
Installation scripts and all CRDs are available in the deploy directory of the https://github.com/dataplatformsolutions/zero-copy-data-plane repository.
Either clone the repository or download one of the release bundles from https://github.com/dataplatformsolutions/zero-copy-data-plane/releases.
Make sure your KUBECONFIG points at the target cluster where you want to install ZCDP.
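Before running the installer, you can confirm which cluster kubectl is targeting with the standard context commands:

```shell
# Show the context kubectl will use for the install
kubectl config current-context

# Confirm the cluster is actually reachable
kubectl cluster-info
```

If the context is wrong, switch it with `kubectl config use-context <name>` before proceeding.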
Run the installer script to install the controller and webhook; it also installs cert-manager and the CRDs. Then run the health-check script to verify that everything came up:
deploy/install-all.sh
deploy/scripts/health-check.sh
Note: the Docker image versions can be configured at the top of the install-all.sh script:
ZCDP_CONTROLLER_IMAGE="ghcr.io/dataplatformsolutions/zcdp-controller:0.1.0"
ZCDP_AGENT_IMAGE="ghcr.io/dataplatformsolutions/zcdp-agent:0.1.0"
The namespaces in which ZCDP operates can be configured by editing the webhook.yaml manifest, specifically the namespaceSelector:
namespaceSelector:
  matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: In
      values:
        - test-workload
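For example, to let ZCDP handle pods in an additional namespace, extend the values list (ml-jobs below is a placeholder for your own workload namespace, not a name the project defines):

```yaml
namespaceSelector:
  matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: In
      values:
        - test-workload
        - ml-jobs   # placeholder: add your own workload namespaces here
```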
Tag nodes where you want the ZCDP agent to run with the label
zcdp.io/enable-agent=true. The agent is responsible for syncing datasets to the local cache
and managing the dataset lifecycle on each node.
This can be done using the following command:
kubectl label nodes <node-name> zcdp.io/enable-agent=true
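To confirm the label was applied and the agent is scheduling, you can list the matching nodes and the pods in the system namespace:

```shell
# Nodes the agent DaemonSet will schedule onto
kubectl get nodes -l zcdp.io/enable-agent=true

# Agent pods should appear on each labeled node shortly afterwards
kubectl -n zcdp-system get pods -o wide
```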
Once this is done, you are all set to use ZCDP. Either follow the next steps to configure a storage backend, Datasets, and DatasetClaims, or skip ahead and run the complete example that ships in the public repo.
This walkthrough demonstrates a minimal but functional deployment that syncs a public dataset from Amazon S3 and mounts it into a training job.
kubectl -n zcdp-system create secret generic s3-creds \
--from-literal=AWS_ACCESS_KEY_ID=AKIA... \
--from-literal=AWS_SECRET_ACCESS_KEY=... \
--from-literal=AWS_REGION=us-west-2
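Before applying the StorageBackend, it can be worth confirming the secret exists and carries the expected keys (describe shows key names and sizes without printing the values):

```shell
kubectl -n zcdp-system describe secret s3-creds
```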
cat <<'EOF' | kubectl apply -f -
apiVersion: zcdp.io/v1alpha1
kind: StorageBackend
metadata:
  name: public-s3-data
spec:
  type: s3
  auth:
    mode: accessKey
    accessKeyIdSecretRef:
      name: s3-creds
      key: AWS_ACCESS_KEY_ID
    secretAccessKeySecretRef:
      name: s3-creds
      key: AWS_SECRET_ACCESS_KEY
  s3:
    region: us-west-2
EOF
cat <<'EOF' | kubectl apply -f -
apiVersion: zcdp.io/v1alpha1
kind: Dataset
metadata:
  name: dataset-products-2017
spec:
  source:
    storageBackendRef: public-s3-data
    path: datasets/products/2017
  cache:
    maxSize: 80Gi
EOF
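After applying the manifests, you can watch the Dataset object while the agents sync it. The exact columns shown depend on the CRD's printer columns, so the precise output is an assumption; the full status is always available via the YAML dump:

```shell
# Watch the Dataset until the agents have synced it
kubectl get dataset dataset-products-2017 -w

# Inspect the full status for troubleshooting
kubectl get dataset dataset-products-2017 -o yaml
```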
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: products-ls
  namespace: trainers
spec:
  template:
    metadata:
      annotations:
        zcdp.io/datasets: "dataset-products-2017:/mnt/datasets/products"
    spec:
      restartPolicy: Never
      containers:
        - name: ls
          image: busybox:1.36
          command:
            - sh
            - -c
            - |
              echo "Listing /mnt/datasets/products"
              ls -lah /mnt/datasets/products
EOF
Monitor the dataset sync and check the job output:
kubectl get nodedatasets -n zcdp-system
kubectl logs -n zcdp-system daemonset/zcdp-agent -f
kubectl logs -n trainers job/products-ls
Once the NodeDataset transitions to Ready, subsequent pods on the same node start instantly because the data
is already on the NVMe cache.
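If you want to block until the cache is warm (for example in CI), something like the following should work, assuming the NodeDataset objects expose a Ready condition as described above; the object names are generated by the controller, so adjust the selector to your environment:

```shell
# Wait up to 10 minutes for all NodeDatasets to report Ready
kubectl -n zcdp-system wait nodedatasets --all \
  --for=condition=Ready --timeout=10m
```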
The example/ folder in the dataplatformsolutions/zero-copy-data-plane repository spins up a local k3s
cluster, MinIO, and a sample workload that reads a dataset through ZCDP.
You will need kubectl installed locally to follow along.
The scripts assume KUBECONFIG points at
INSTALL_PATH/example/kubeconfig/kubeconfig.yaml after the cluster boots.
Ensure the scripts are executable:
chmod +x INSTALL_PATH/example/scripts/*
# 0. Change into the scripts directory
cd example/scripts
# 1. Start the infrastructure (k3s + MinIO)
./01_start_infra.sh
# 2. Install ZCDP using the /deploy/install-all.sh script.
./02_install_zcdp.sh
# 3. Tag two nodes with zcdp.io/enable-agent=true so that they can execute workloads
./03_enable_agents.sh
# 4. Upload the sample datasets to MinIO
./04_seed_minio.sh
# 5. Apply storage backend and dataset manifests
./05_apply_dataset.sh
# 6. Build the job container, push it into k3s, and apply the workload manifest
./06_build_and_deploy_job.sh
# 7. Check the output from the sample job
./07_verify_workload.sh
A good way to learn how to deploy your own workloads is to look at the 05_apply_dataset.sh script and the manifests it applies. Use these as a starting point and adapt them, manually or with an AI assistant, to generate the variations you need for your jobs.
Port-forward the status page while the example is running:
kubectl -n zcdp-system port-forward svc/zcdp-status 8080:8080
When you are done, tear down the example environment:
docker compose down -v