SOCI, buildkit and containerd

This article is part of the Building streamable containers in EKS series.

Background

Managing container images and CI/CD pipelines is par for the course when you’re working with Kubernetes infrastructure. But when machine learning enters the mix, you have to deal not only with long build times for massive container images, but also with long pull times when scaling out.
In this post, I’ll share some strategies for tackling both build and pull times using Docker Build and Seekable OCI.

Here’s a quick overview of the key technologies we’ll use:

Docker build

Docker Build uses a client-server architecture: Buildx, a Docker CLI plugin, is the client and user interface, and BuildKit is the server (or builder) that actually executes the build.

Both Buildx and BuildKit are installed with Docker Desktop and Docker Engine out-of-the-box.

BuildKit workers

BuildKit daemons (workers) come in two flavors: containerd and oci.

  • containerd workers rely on the containerd runtime to manage containers and images. containerd needs to be up and running on the host.
  • oci workers manage containers and images themselves, so containerd is not needed.
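
To make the difference between the two worker types concrete, here is a minimal sketch that renders a buildkitd.toml with the containerd worker enabled and the oci worker disabled. The socket path and the "buildkit" namespace are assumptions based on common defaults, not values taken from this series, so adjust them to your setup.

```python
def render_buildkitd_toml(
    containerd_socket: str = "/run/containerd/containerd.sock",  # assumed default socket path
    namespace: str = "buildkit",  # assumed containerd namespace for build results
) -> str:
    """Return buildkitd.toml text that selects the containerd worker."""
    lines = [
        "[worker.oci]",
        "  enabled = false",  # BuildKit should not manage images on its own
        "",
        "[worker.containerd]",
        "  enabled = true",  # delegate containers and images to containerd
        f'  address = "{containerd_socket}"',
        f'  namespace = "{namespace}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_buildkitd_toml(), end="")
```

With this kind of configuration the daemon only starts if it can reach containerd on the given socket, which is why containerd has to run next to BuildKit.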

Buildx drivers

Buildx additionally implements several “build drivers” - configurations for how and where the BuildKit backend runs. We will be using the remote build driver, which allows Buildx to connect to a manually managed BuildKit daemon.
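
As a rough illustration of the remote driver, the sketch below composes the two Buildx invocations involved: one that registers an already running BuildKit daemon as a builder, and one that builds and pushes through it. The builder name, the TCP endpoint, and the image tag are hypothetical placeholders; the functions only assemble the command lines and do not run them.

```python
import shlex
from typing import List


def create_remote_builder_cmd(name: str, endpoint: str) -> List[str]:
    """Command that registers an existing BuildKit daemon as a Buildx builder."""
    return ["docker", "buildx", "create", "--name", name, "--driver", "remote", endpoint]


def build_and_push_cmd(name: str, tag: str, context: str = ".") -> List[str]:
    """Command that builds `context` on the named builder and pushes the result."""
    return ["docker", "buildx", "build", "--builder", name, "--tag", tag, "--push", context]


if __name__ == "__main__":
    # Hypothetical values: replace with your own builder name, endpoint and tag.
    builder = "remote-buildkit"
    endpoint = "tcp://buildkit-0.buildkit:1234"
    tag = "registry.example.com/team/app:latest"
    for cmd in (create_remote_builder_cmd(builder, endpoint), build_and_push_cmd(builder, tag)):
        print(shlex.join(cmd))
```

Because the daemon is managed outside of Buildx, removing the builder entry on the client does not stop the daemon itself.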

Seekable OCI (SOCI)

Seekable OCI is an AWS-developed technology for lazy-loading container images. Instead of pulling the entire image before launching a container, Seekable OCI lets you fetch only the data needed to start the container, which shortens launch time, and prefetch the rest in the background.

SOCI avoids having to modify existing images by building a separate index artifact (the “SOCI index”), which lives in the remote registry, right next to the image itself.

The concrete implementation of this is the SOCI snapshotter, a containerd plugin.

Goals

Below is a very high-level architecture diagram of what our stack will look like:

    block-beta
    columns 3
    cluster["kubernetes cluster"]:3
    node0["cluster node"]
    node1["cluster node"]
    node2["cluster node"]
    block:cri0
        ctrd0["containerd"]
        soci0["soci-snapshotter"]
    end
    block:cri1
        ctrd1["containerd"]
        soci1["soci-snapshotter"]
    end
    block:cri2
        ctrd2["containerd"]
        soci2["soci-snapshotter"]
    end
    ss("builder stateful set"):2
    ow("other workloads")
    pod0("builder stateful set pod")
    pod1("builder stateful set pod")
    pod2("other workload pods")
    block:podcontent0
        cctrd0["pod containerd"]
        csoci0["pod soci-snapshotter"]
        buildkit0["buildkit"]
    end
    block:podcontent1
        cctrd1["pod containerd"]
        csoci1["pod soci-snapshotter"]
        buildkit1["buildkit"]
    end
    space 

    classDef Cluster fill:#997,stroke:#333;
    classDef Nodes fill:#999,stroke:#333;
    classDef Containerd fill:#98B,stroke:#333;
    classDef Soci fill:#98D,stroke:#333;
    classDef Buildkit fill:#98E,stroke:#333;
    classDef Workload fill:#99C,stroke:#333;
    classDef WorkloadPods fill:#99D,stroke:#333;
    classDef Statefulset fill:#F99,stroke:#333;
    classDef StatefulsetPods fill:#D99,stroke:#333;

    class cluster,node0,node1,node2 Cluster
    class node0,node1,node2 Nodes
    class ctrd0,ctrd1,ctrd2,cctrd0,cctrd1 Containerd
    class soci0,soci1,soci2,csoci0,csoci1 Soci
    class ow Workload
    class pod2 WorkloadPods
    class ss Statefulset
    class pod0,pod1 StatefulsetPods
    class buildkit0,buildkit1 Buildkit

BuildKit worker StatefulSet

This article series will show you how to create a Kubernetes StatefulSet for BuildKit workers. Each StatefulSet pod will run a BuildKit daemon with containerd worker configuration and a containerd daemon. It will also attach a persistent volume for BuildKit cache and containerd image store.

BuildKit worker config

Using BuildKit containerd worker (rather than oci) allows for:

  1. Lazy-pulling base images with SOCI during builds (when the base images have SOCI indexes available).
  2. Avoiding a re-pull of build results when generating SOCI indexes. Instead, index generation is set up to run on the same pod that built the image, so it uses the cached results in the containerd image store.

    sequenceDiagram
    actor user
    participant cli
    box builder pod
    participant buildkit as containerized buildkit daemon
    participant containerd as containerized containerd
    participant nerdctl
    end
    participant registry as container registry
    user->>cli: Build target
    cli->>buildkit: Send build context to buildkit

    loop build image layers
        buildkit->>containerd: Build layer
        containerd->>containerd: Lazy-pull base
        containerd->>buildkit: Result
    end
    buildkit->>registry: push image
    buildkit->>cli: built image tag/failure
    cli->>nerdctl: build SOCI index for tag
    nerdctl->>containerd: load container image from local store
    nerdctl->>nerdctl: build SOCI index
    nerdctl->>registry: push SOCI index
    nerdctl->>cli: success/failure
    cli->>user: Success/failure
    loop cache garbage collection
        buildkit->>buildkit: Garbage collect cache
        buildkit->>containerd: delete GC-ed images
    end
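
The ordering in the diagram (build first, then index on the same pod, and skip indexing if the build fails) can be sketched as below. Several things here are assumptions rather than part of this series: reaching nerdctl through `kubectl exec`, the pod name, the "buildkit" containerd namespace, and the use of nerdctl's `push --snapshotter=soci` mode, which needs the soci binary available in the pod.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Execute a command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def soci_index_cmd(pod: str, tag: str, namespace: str = "buildkit") -> List[str]:
    """Command that creates and pushes a SOCI index from inside the builder pod.

    Assumption: nerdctl runs in the pod and reads the image from the local
    containerd store, so the build result is not pulled again.
    """
    return [
        "kubectl", "exec", pod, "--",
        "nerdctl", "--namespace", namespace,
        "push", "--snapshotter=soci", tag,
    ]


def build_then_index(build_cmd: Sequence[str], index_cmd: Sequence[str], runner: Runner = run) -> str:
    """Run the build, then the index step; stop at the first failure."""
    if runner(build_cmd) != 0:
        return "build-failed"
    if runner(index_cmd) != 0:
        return "index-failed"
    return "ok"
```

The runner is injectable so the sequencing can be exercised without Docker or a cluster; in real use the default `run` would execute both commands.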

EKS node configuration

EKS nodes will need to be configured to take advantage of images that have a SOCI index. This article series will show you how to do so with EC2 user-data, with examples for EC2NodeClass (if using Karpenter for scheduling) and the Terraform aws_launch_template resource.
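
For orientation, the node-side change boils down to registering the SOCI snapshotter with containerd as a proxy plugin and telling the CRI plugin to use it. The sketch below renders such a fragment. It assumes the containerd 1.x configuration layout and the snapshotter's usual socket path as I understand them from the SOCI snapshotter documentation; the user-data shown later in this series may differ.

```python
def render_containerd_soci_config(
    socket: str = "/run/soci-snapshotter-grpc/soci-snapshotter-grpc.sock",  # assumed socket path
) -> str:
    """Return a containerd config fragment that enables the SOCI snapshotter."""
    lines = [
        "[proxy_plugins.soci]",
        '  type = "snapshot"',  # SOCI runs as an external snapshotter process
        f'  address = "{socket}"',
        "",
        '[plugins."io.containerd.grpc.v1.cri".containerd]',
        '  snapshotter = "soci"',  # use SOCI for pods started through the CRI
        "  disable_snapshot_annotations = false",  # pass image details to the snapshotter
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_containerd_soci_config(), end="")
```

The snapshotter daemon itself also has to be installed and running on the node before containerd can use this configuration.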
