A Kubernetes-native dataset runtime for AI / ML / Transform workloads.
ZCDP mounts datasets locally, enabling instant startup and higher GPU and CPU utilization with
zero changes to your applications.
AI workloads waste time and CPU/GPU cycles repeatedly downloading data from S3/GCS/Azure Blob storage. Zero Copy Data Plane (ZCDP) solves this by caching datasets directly on node-local storage, then exposing them to pods as simple read-only bind mounts: no FUSE, no custom filesystems, no application changes. Subsequent jobs or runs needing the same dataset are placed on nodes that already hold the data. Once a dataset is cached, runs start without waiting on a remote download, and data is served at local disk speed.
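The placement idea above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `pick_node` function, the shape of the `cached` and `free_gpus` maps, and the fallback to a cold node are all hypothetical and are not taken from ZCDP itself.

```python
from typing import Dict, Optional, Set


def pick_node(
    dataset: str,
    cached: Dict[str, Set[str]],
    free_gpus: Dict[str, int],
) -> Optional[str]:
    """Prefer a node that already caches `dataset` (hypothetical policy).

    cached:    node name -> names of datasets cached on that node
    free_gpus: node name -> number of unallocated GPUs
    """
    # Only nodes with spare capacity are candidates at all.
    candidates = [node for node, gpus in free_gpus.items() if gpus > 0]
    if not candidates:
        return None
    # "Warm" nodes already hold the dataset, so no download is needed.
    warm = [node for node in candidates if dataset in cached.get(node, set())]
    # Assumed fallback: if no warm node has capacity, use any free node.
    pool = warm or candidates
    # Among the pool, pick the node with the most free GPUs (name breaks ties).
    return max(pool, key=lambda node: (free_gpus[node], node))
```

With `cached = {"node-a": {"imagenet-v3"}}` and `free_gpus = {"node-a": 1, "node-b": 4}`, a job needing `imagenet-v3` goes to `node-a` even though `node-b` has more free GPUs.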
Jobs start faster by avoiding repeated remote downloads.
Higher utilization by keeping compute fed with local data.
No SDKs or new APIs — data appears as a normal directory.
Storage Backend, Dataset, DatasetClaim, and NodeDataset Custom Resource Definitions, plus Controller and Node Agent services.
Follow best practice with immutable datasets and the ZCDP content cache.
No distributed filesystem, no metadata cluster, no heavy caching layer.
ZCDP uses a simple but powerful architecture consisting of a controller, node agents, and a few Kubernetes Custom Resource Definitions.
ZCDP can then:
/data/dataset
Modern AI and data-intensive workloads on Kubernetes spend enormous time waiting for data, not computing. Typical patterns involve each job re-downloading the same large datasets and model files from S3/GCS/Azure Blob. Pods start slowly, GPUs sit idle, and object storage becomes a bottleneck.
Existing options like distributed filesystems and complex caching layers are often heavyweight, expensive, and require application changes or sidecars. Many teams roll their own partial solution and end up maintaining brittle, bespoke data loaders.
Zero Copy Data Plane exists to remove that bottleneck entirely.
ZCDP is designed for workloads that reuse datasets across runs or nodes. For single-run, small datasets, benefits may be limited.
The controller watches these CRDs and:
NodeDataset status reported by agents
A lightweight agent runs as a DaemonSet on each node, responsible for:
NodeDataset status after ensures and evictions
A snapshot in ZCDP is simply all the objects under a given prefix in your object store, for example: s3://my-bucket/imagenet/v3/
You do not need a special manifest or format. ZCDP treats the entire prefix as a single
immutable snapshot. To create a new snapshot, simply upload a new set of files under a new prefix (e.g.,
v4/). ZCDP has built-in CAS (content-addressable storage) capabilities to deduplicate shared files
between snapshots, minimizing egress costs and local storage usage.
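To make the "prefix as snapshot" idea concrete, here is a minimal Python sketch. It assumes the object listing has already been fetched as a plain list of keys; the helper names `split_snapshot_uri` and `snapshot_members` are hypothetical and not part of ZCDP.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def split_snapshot_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix/' into (bucket, 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"unsupported snapshot URI: {uri!r}")
    prefix = parsed.path.lstrip("/")
    # Force a trailing slash so 'imagenet/v3/' never matches 'imagenet/v30/...'.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix


def snapshot_members(keys: List[str], prefix: str) -> List[str]:
    """Return the keys that belong to the snapshot, relative to its prefix."""
    return sorted(
        key[len(prefix):]
        for key in keys
        if key.startswith(prefix) and key != prefix
    )
```

Under this model, publishing `v4/` next to `v3/` leaves every key in the `v3/` snapshot untouched, which is what keeps a snapshot immutable.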
The node agent stores dataset files in a content-addressable cache keyed by checksums. This allows ZCDP to deduplicate shared assets across snapshots, resume interrupted syncs without re-downloading, and verify integrity before promoting a snapshot. The result is faster warm-up for common model weights, lower egress costs, and fewer storage writes on local NVMe.
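A content-addressable cache of this kind can be sketched as follows. This is an assumption-laden illustration, not the node agent's actual code: the on-disk layout, the choice of SHA-256, the use of hard links, and the `ingest` function are all hypothetical. It also hashes bytes that are already local, whereas a real agent would need the checksum before downloading in order to save egress.

```python
import hashlib
import os
import shutil
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(src: Path, rel_path: str, cache_root: Path, snapshot_dir: Path) -> bool:
    """Store `src` in the cache and link it into a snapshot tree.

    Returns True if a new blob was written, False if it was deduplicated.
    """
    checksum = file_sha256(src)
    blob = cache_root / checksum[:2] / checksum  # assumed layout: cas/ab/abcdef...
    wrote = False
    if not blob.exists():
        blob.parent.mkdir(parents=True, exist_ok=True)
        tmp = blob.with_name(blob.name + ".tmp")
        shutil.copyfile(src, tmp)
        os.replace(tmp, blob)  # atomic rename: readers never see a partial blob
        wrote = True
    target = snapshot_dir / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        os.link(blob, target)  # hard link: same bytes on disk, no second copy
    return wrote
```

Because the blob path is derived from the content, a file shared by two snapshots is written once, and an interrupted sync can skip every blob that already exists. Hard links require the cache and the snapshot tree to live on the same filesystem.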
"Zero copy" here means:
The agent writes data once to node-local NVMe, and ZCDP mounts that directory directly into pods using standard kernel bind mounts. Your application just sees a normal directory tree.
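From the application's side, reading a mounted dataset needs nothing beyond ordinary file I/O. The sketch below assumes the dataset is mounted at `/data/dataset` and that a `DATASET_DIR` environment variable may override it; both the variable and the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import List

# Assumed mount point; override with the (hypothetical) DATASET_DIR variable.
DATASET_DIR = os.environ.get("DATASET_DIR", "/data/dataset")


def list_dataset_files(root: str) -> List[Path]:
    """Return every regular file under `root`, relative to it, sorted."""
    base = Path(root)
    return sorted(p.relative_to(base) for p in base.rglob("*") if p.is_file())


def dataset_size_bytes(root: str) -> int:
    """Sum file sizes using plain stat calls; no SDK or client library."""
    base = Path(root)
    return sum(p.stat().st_size for p in base.rglob("*") if p.is_file())


if __name__ == "__main__":
    files = list_dataset_files(DATASET_DIR)
    print(f"{len(files)} files, {dataset_size_bytes(DATASET_DIR)} bytes")
```

The same code runs unchanged against a local test directory, which is the practical meaning of "no application changes".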
Like what you see? Get started with ZCDP today!
Check out the Quick Start Guide to deploy ZCDP in your Kubernetes cluster and run your first workload.