Zero Copy Data Plane (ZCDP)

A Kubernetes-native dataset runtime for AI / ML / Transform workloads.
ZCDP mounts datasets locally, enabling instant startup and higher GPU and CPU utilization with zero changes to your applications.


Why Zero Copy Data Plane?

AI workloads waste time and CPU/GPU cycles repeatedly downloading the same data from S3, GCS, or Azure Blob storage. Zero Copy Data Plane solves this by caching datasets directly on node-local storage, then exposing them to pods as simple read-only bind mounts: no FUSE, no custom filesystems, no application changes. Subsequent jobs or runs that need the same dataset are placed on nodes that already hold it, so they start without a download step and read data at full local disk speed.

🚀 Instant Startup

Jobs start faster by avoiding repeated remote downloads.

🎯 Higher CPU/GPU Utilization

Compute stays fed with local data instead of idling on remote downloads.

🔌 Zero Application Changes

No SDKs or new APIs — data appears as a normal directory.

🧩 Kubernetes Native

Built on Custom Resource Definitions (Storage Backend, Dataset, DatasetClaim, NodeDataset) plus controller and node agent services.

🔒 Immutable Snapshots

Follow best practice with immutable datasets backed by the ZCDP content cache.

⚙️ Simple to Operate

No distributed filesystem, no metadata cluster, no heavy caching layer.

How It Works

ZCDP uses a simple but powerful architecture consisting of a controller, node agents, and a few Kubernetes Custom Resource Definitions.

[Figure: ZCDP architecture. Object storage (S3 / GCS / Azure Blob / MinIO) holds the datasets, referenced by URIs in the Dataset spec. The ZCDP Controller (a Deployment in the zcdp-system namespace) watches Dataset, DatasetClaim, and NodeDataset CRDs, schedules syncs, tracks usage, exposes metrics, and sends sync and NodeDataset instructions to the nodes. On each Kubernetes worker node, a ZCDP Agent (DaemonSet) syncs snapshots to local storage, which is bind-mounted into workload pods at /data.]


Use Cases

The Problem ZCDP Solves

Modern AI and data-intensive workloads on Kubernetes spend much of their time waiting for data rather than computing. In the typical pattern, each job re-downloads the same large datasets and model files from S3, GCS, or Azure Blob storage. Pods start slowly, GPUs sit idle, and object storage becomes a bottleneck.

Existing options like distributed filesystems and complex caching layers are often heavyweight, expensive, and require application changes or sidecars. Many teams roll their own partial solution and end up maintaining brittle, bespoke data loaders.

Zero Copy Data Plane exists to remove that bottleneck entirely.

Why Teams Choose ZCDP

When to use ZCDP

ZCDP is designed for workloads that reuse datasets across runs or nodes. For small datasets used by a single run, the benefit may be limited.
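A back-of-envelope model makes the reuse argument concrete. This is only a sketch under simplifying assumptions that are not from ZCDP itself: without a cache every run downloads the full dataset once, with a node-local cache each node downloads it at most once, and local read time is ignored. The function names and the numbers in the usage lines are illustrative, not measurements.

```python
def transfer_seconds(dataset_gb, gbit_per_s, downloads):
    """Seconds spent moving `downloads` full copies of the dataset over the network."""
    return downloads * dataset_gb * 8 / gbit_per_s


def compare(dataset_gb, gbit_per_s, runs, nodes):
    """Return (seconds without cache, seconds with a node-local cache).

    Assumption: one download per run without a cache, and at most one
    download per node with a cache.
    """
    without_cache = transfer_seconds(dataset_gb, gbit_per_s, runs)
    with_cache = transfer_seconds(dataset_gb, gbit_per_s, min(runs, nodes))
    return without_cache, with_cache


# Illustrative inputs: a 100 GB dataset over a 10 Gbit/s link.
many_runs = compare(dataset_gb=100, gbit_per_s=10, runs=50, nodes=4)
single_run = compare(dataset_gb=100, gbit_per_s=10, runs=1, nodes=4)
```

Under these assumptions the many-run case drops from 50 downloads to 4, while the single-run case is unchanged, which is the situation the paragraph above warns about.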

Architecture Overview

CRDs

  • Dataset — defines the source URI (e.g., S3 prefix) and snapshot version.
  • DatasetClaim — declares which pods should get a dataset mounted at what path.
  • NodeDataset — represents per-node state: whether a given snapshot is present, its size, and which pods are using it.
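To make the relationship between the three resources easier to picture, here is a minimal sketch that models them as plain Python dataclasses. Every field name (`source_uri`, `pod_selector`, `mount_path`, `size_bytes`, and so on) is a hypothetical stand-in chosen to mirror the descriptions above; the real CRD schemas may differ.

```python
from dataclasses import dataclass, field


# All field names below are hypothetical, not the actual ZCDP CRD schema.
@dataclass
class Dataset:
    name: str
    source_uri: str   # e.g. an S3 prefix
    snapshot: str     # snapshot version label


@dataclass
class DatasetClaim:
    dataset: str        # name of the Dataset to mount
    pod_selector: dict  # labels a pod must carry to receive the mount
    mount_path: str     # where the dataset appears inside the pod


@dataclass
class NodeDataset:
    node: str
    dataset: str
    snapshot: str
    present: bool = False                     # is the snapshot on this node?
    size_bytes: int = 0
    pods: list = field(default_factory=list)  # pods currently using it


def claim_matches(claim, pod_labels):
    """True if the pod carries every label the claim selects on."""
    return all(pod_labels.get(k) == v for k, v in claim.pod_selector.items())


# Usage with illustrative values.
dataset = Dataset("imagenet", "s3://my-bucket/imagenet/v3/", "v3")
claim = DatasetClaim("imagenet", {"app": "trainer"}, "/data")
state = NodeDataset("node-1", "imagenet", "v3")
```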

Controller

The controller watches these CRDs and:

  • Plans which nodes should host which snapshots
  • Consumes NodeDataset status reported by agents
  • Tracks pod usage via annotations and maintains per-node reference counts
  • Coordinates sync and eviction operations through the node agents
  • Exposes status and metrics for observability
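The reference-counting and eviction-coordination steps above can be sketched roughly as follows. This is not the controller's actual code: the annotation key, the dictionary shapes, and the least-recently-used ordering are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical annotation key; the real key used by ZCDP is not stated here.
ANNOTATION = "zcdp.example/snapshot"


def reference_counts(pods):
    """Count, per (node, snapshot), how many pods reference that snapshot."""
    counts = Counter()
    for pod in pods:
        snapshot = pod.get("annotations", {}).get(ANNOTATION)
        if snapshot and pod.get("node"):
            counts[(pod["node"], snapshot)] += 1
    return counts


def eviction_candidates(node_datasets, counts):
    """Present snapshots with zero references, least recently accessed first.

    The LRU ordering is an assumption, not documented behaviour.
    """
    unused = [
        nd for nd in node_datasets
        if nd["present"] and counts[(nd["node"], nd["snapshot"])] == 0
    ]
    return sorted(unused, key=lambda nd: nd["last_access"])


# Usage with illustrative pods and per-node state.
pods = [
    {"node": "node-1", "annotations": {ANNOTATION: "imagenet/v3"}},
    {"node": "node-1", "annotations": {ANNOTATION: "imagenet/v3"}},
    {"node": "node-2", "annotations": {}},
]
node_datasets = [
    {"node": "node-1", "snapshot": "imagenet/v3", "present": True, "last_access": 50},
    {"node": "node-1", "snapshot": "imagenet/v2", "present": True, "last_access": 10},
    {"node": "node-1", "snapshot": "imagenet/v1", "present": True, "last_access": 5},
]
counts = reference_counts(pods)
candidates = eviction_candidates(node_datasets, counts)
```

A snapshot with a non-zero count is never offered for eviction, which is the property that keeps a mounted directory from disappearing under a running pod.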

Node Agent

A lightweight agent runs as a DaemonSet on each node, responsible for:

  • Syncing data from object storage (e.g., S3) into node-local NVMe
  • Validating snapshots and promoting them atomically into a ready state
  • Maintaining a local metadata index (size, last-access, status)
  • Recording NodeDataset status after each ensure (sync) and eviction operation
  • Exposing stable, read-only directories to be bind-mounted into pods
  • Evicting unused snapshots when space is needed
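One common way to get the "validate, then promote atomically" behaviour described above is to sync into a staging directory and publish it with a single rename. The sketch below shows that technique; it is an assumption about how such an agent could work, not a description of the ZCDP agent's internals, and the size-only validation and directory names are hypothetical.

```python
import os
import tempfile


def promote_snapshot(staging_dir, ready_dir, expected_sizes):
    """Validate a staged snapshot, then publish it with a single rename.

    expected_sizes maps relative file paths to byte sizes (a hypothetical,
    deliberately simple validation rule).
    """
    for rel_path, size in expected_sizes.items():
        path = os.path.join(staging_dir, rel_path)
        if not os.path.isfile(path) or os.path.getsize(path) != size:
            raise ValueError(f"validation failed for {rel_path}")
    # On POSIX, rename is atomic when both paths are on the same filesystem,
    # so readers see either no snapshot or the complete one.
    os.rename(staging_dir, ready_dir)


# Usage: stage one file in a temporary directory, then promote it.
root = tempfile.mkdtemp()
staging = os.path.join(root, "v3.staging")
ready = os.path.join(root, "v3")
os.makedirs(staging)
with open(os.path.join(staging, "labels.json"), "wb") as f:
    f.write(b"{}")
promote_snapshot(staging, ready, {"labels.json": 2})
```

If validation fails, the ready path is never created, so a pod cannot be handed a half-synced directory.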

Snapshots — Simple but Powerful

A snapshot in ZCDP is simply all the objects under a given prefix in your object store, for example: s3://my-bucket/imagenet/v3/

You do not need a special manifest or format. ZCDP treats the entire prefix as a single immutable snapshot. To create a new snapshot, upload a new set of files under a new prefix (e.g., v4/). ZCDP has built-in content-addressable storage (CAS) that deduplicates files shared between snapshots, minimizing egress costs and local storage usage.
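The "a snapshot is everything under a prefix" rule can be illustrated with a few lines of Python. This sketch only does string handling on a list of object keys that is assumed to come from an object-store listing; the function names are hypothetical and no storage client is involved.

```python
from urllib.parse import urlparse


def split_snapshot_uri(uri):
    """Split e.g. s3://bucket/path/ into (bucket, prefix ending in '/')."""
    parsed = urlparse(uri)
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # so "imagenet/v3" does not also match "imagenet/v30/"
    return parsed.netloc, prefix


def snapshot_files(uri, object_keys):
    """Relative paths of every object that belongs to the snapshot."""
    _, prefix = split_snapshot_uri(uri)
    return sorted(
        key[len(prefix):]
        for key in object_keys
        if key.startswith(prefix) and key != prefix
    )


# Usage with illustrative keys; v4 objects are not part of the v3 snapshot.
keys = [
    "imagenet/v3/train/0.tar",
    "imagenet/v3/labels.json",
    "imagenet/v4/labels.json",
    "imagenet/v3/",
]
files = snapshot_files("s3://my-bucket/imagenet/v3/", keys)
```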

Content-Addressable Storage (CAS)

The node agent stores dataset files in a content-addressable cache keyed by checksums. This allows ZCDP to deduplicate shared assets across snapshots, resume interrupted syncs without re-downloading, and verify integrity before promoting a snapshot. The result is faster warm-up for common model weights, lower egress costs, and fewer storage writes on local NVMe.
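The deduplication idea can be sketched as a tiny content-addressable store: blobs are stored once under their checksum, and each snapshot directory gets a hard link to the shared blob. The choice of SHA-256, the flat directory layout, and the use of hard links are assumptions for illustration. A real agent would also need to know the checksum before downloading in order to save egress, and how ZCDP obtains it is not covered here.

```python
import hashlib
import os
import tempfile


def cas_put(cas_dir, data):
    """Store bytes under their SHA-256 digest; return (digest, newly_written)."""
    digest = hashlib.sha256(data).hexdigest()
    blob = os.path.join(cas_dir, digest)
    if os.path.exists(blob):
        return digest, False  # already cached, nothing written
    os.makedirs(cas_dir, exist_ok=True)
    tmp = blob + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.rename(tmp, blob)  # publish only once fully written
    return digest, True


def link_into_snapshot(cas_dir, digest, snapshot_dir, rel_path):
    """Expose a cached blob inside a snapshot tree without copying it."""
    target = os.path.join(snapshot_dir, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    os.link(os.path.join(cas_dir, digest), target)
    return target


# Usage: two snapshots share one stored copy of the same file.
root = tempfile.mkdtemp()
cas = os.path.join(root, "cas")
digest, first_write = cas_put(cas, b"model weights")
_, second_write = cas_put(cas, b"model weights")
v3_file = link_into_snapshot(cas, digest, os.path.join(root, "v3"), "model/weights.bin")
v4_file = link_into_snapshot(cas, digest, os.path.join(root, "v4"), "model/weights.bin")
```

Hard links require the cache and the snapshot directories to live on the same filesystem, which is a natural fit for a single node-local disk.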

Zero Copy Explained

"Zero copy" here means:

The agent writes data once to node-local NVMe, and ZCDP mounts that directory directly into pods using standard kernel bind mounts, with no additional per-pod copy. Your application just sees a normal directory tree.
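In Kubernetes terms, a read-only bind mount of a node directory is what a hostPath volume with a read-only volumeMount produces. The sketch below builds that pod-spec fragment as Python dictionaries. Whether ZCDP injects exactly this shape is an assumption, and the host directory and volume name are hypothetical; only the field names (`hostPath`, `mountPath`, `readOnly`) come from the standard Kubernetes pod spec.

```python
def bind_mount_fragment(host_dir, mount_path, volume_name="zcdp-dataset"):
    """Build the volume and volumeMount entries for a read-only node directory."""
    volume = {
        "name": volume_name,
        "hostPath": {"path": host_dir, "type": "Directory"},
    }
    volume_mount = {
        "name": volume_name,
        "mountPath": mount_path,
        "readOnly": True,  # pods can read the snapshot but never modify it
    }
    return volume, volume_mount


# Usage: the host path is a hypothetical location for a ready snapshot.
volume, volume_mount = bind_mount_fragment(
    "/var/lib/zcdp/snapshots/imagenet-v3", "/data"
)
```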

Get Started!

Like what you see? Get started with ZCDP today!

Check out the Quick Start Guide to deploy ZCDP in your Kubernetes cluster and run your first workload.