Good Kyverno Admission Control Patterns

Kyverno is a Kubernetes admission controller used to enforce policies in your cluster. Admission controllers intercept incoming requests to the Kubernetes API server, evaluate them against a set of rules, and then approve, deny, or modify each request based on that evaluation. If you’re not familiar with admission controllers already, I’d recommend reading the official documentation.

If you’re in the process of deciding which admission controller to use in your environment, consider the background of the team that will be maintaining the policies. Are they all comfortable writing Rego, the policy language OPA uses? If so, OPA Gatekeeper may be a better choice for you. I personally believe that Kyverno is a better admission controller because its policies are plain Kubernetes YAML, so the barrier for reading and writing them is significantly lower than with OPA. I haven’t run into a scenario where I need more verbose or complex syntax beyond what ships in Kyverno by default.

Summary

Have some way to whitelist resources
Make all your transformations with JSON patches
Narrow down the scope of your whitelists as much as possible
Configure all policies to accept an array of rules rather than a single ruleset
Have some way to unit test against admission controller policies when working with k8s manifests in a GitHub repository
Separate your Kyverno installation from the underlying policies
Have some way to toggle rule actions between audit and enforcement
Add remote policies using raw.githubusercontent.com rather than copying them locally
When referencing remote policies, target a commit hash or branch version rather than the main branch’s head
Avoid mutating resources with policies when possible

Example Boilerplate

I created this example GitHub repository that illustrates all of the principles in this article: https://github.com/salineselin/kyverno-example

Recommendations

The following sections expand on each point in the summary above.

Have some way to whitelist resources

It is inevitable that you will have resources in your Kubernetes clusters that violate policies. Defining a generic, templated process for adding exceptions to a whitelist is crucial. You can whitelist resources by using the exclude key like so:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-capabilities
spec:
  validationFailureAction: audit
  background: true
  rules:
    - name: adding-capabilities
      exclude:
        any:
          - resources:
              kinds:
                - Pod
              selector:
                matchLabels:
                  app: managed-prometheus-collector
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: >-
          Any capabilities added beyond the allowed list (AUDIT_WRITE, CHOWN, DAC_OVERRIDE, FOWNER,
          FSETID, KILL, MKNOD, NET_BIND_SERVICE, SETFCAP, SETGID, SETPCAP, SETUID, SYS_CHROOT)
          are disallowed.          
        deny:
          conditions:
            all:
              - key: "{{ request.object.spec.[ephemeralContainers, initContainers, containers][].securityContext.capabilities.add[] }}"
                operator: AnyNotIn
                value:
                  - SYS_CHROOT

Make all your transformations with JSON patches

This applies if you’re using Kustomize to manage your Kyverno policy transformations. Kustomize overlays are excellent at implicitly merging in the parameters you need, but once you start working with arrays and their indexes, you end up wiping data you didn’t intend to, and you usually repeat yourself a lot. If you make all your transformations with JSON patches rather than overlays, you have a complete, explicit list of every transformation, and debugging becomes a lot easier because Kustomize can point out exactly which JSON patch is faulty when it runs.
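As a rough sketch of what this can look like, the kustomization below wires a single JSON patch file to one policy. The file names (disallow-capabilities.yaml and patches/disallow-capabilities-exclude.yaml) are hypothetical placeholders, and the target assumes the ClusterPolicy is named disallow-capabilities.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # hypothetical local copy of the policy being transformed
  - disallow-capabilities.yaml
patches:
  # hypothetical file holding a list of JSON patch operations (op/path/value)
  - path: patches/disallow-capabilities-exclude.yaml
    target:
      group: kyverno.io
      version: v1
      kind: ClusterPolicy
      name: disallow-capabilities
```

Keeping one patch file per policy means the full set of transformations can be read straight from the patches list.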

Narrow down the scope of your whitelists as much as possible

Whitelisting an entire namespace is a very primitive way to add exceptions. It’s fast and easy to understand, but it is grossly overpermissive. Unless your RBAC is hardened to the point where cluster users can’t see what the policy exceptions are, it’s trivial for an attacker to deploy into a namespace that has been whitelisted. Where you can, scope the exception to the specific workload that needs it, for example by label selector, rather than to everything in a namespace.
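A minimal sketch of a narrower exception, assuming a hypothetical monitoring namespace and the app label used elsewhere in this article. As far as I understand Kyverno’s matching, the attributes inside a single resources block must all match, so this should exclude only Pods that are both in that namespace and carry that label:

```yaml
exclude:
  any:
    - resources:
        kinds:
          - Pod
        # hypothetical namespace; adjust to wherever the workload actually runs
        namespaces:
          - monitoring
        selector:
          matchLabels:
            app: managed-prometheus-collector
```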

Configure all policies to accept an array of rules rather than a single ruleset

There are multiple valid syntaxes when defining a policy. You can match according to one object or an array of objects. The preferred method is an array of objects. It’s not mentioned in Kyverno’s documentation, but if you use an any match selector, you also have to use the any selector when creating exclusions. To demonstrate why you want to only work with an array of matches rather than a single defined match, let’s work with the following policy:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-capabilities
spec:
  validationFailureAction: audit
  background: true
  rules:
    - name: adding-capabilities
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: >-
          Any capabilities added beyond the allowed list (AUDIT_WRITE, CHOWN, DAC_OVERRIDE, FOWNER,
          FSETID, KILL, MKNOD, NET_BIND_SERVICE, SETFCAP, SETGID, SETPCAP, SETUID, SYS_CHROOT)
          are disallowed.          
        deny:
          conditions:
            all:
              - key: "{{ request.object.spec.[ephemeralContainers, initContainers, containers][].securityContext.capabilities.add[] }}"
                operator: AnyNotIn
                value:
                  - SYS_CHROOT

This policy, which you can find in source here, is perfectly valid. The problem is that when you want to whitelist a particular namespace or add another ruleset with more advanced targeting, you have to heavily modify the underlying ruleset with JSON patches.

Let’s say I wanted to add two exceptions according to a label selector. With the array syntax, a single JSON patch like the following is enough. With the single-object syntax, you would have to add multiple patches to get your policy into the desired state. Transformations are necessary, but excessive transformations mean more code to maintain and reduced legibility.

- op: add
  path: /spec/rules/0/exclude
  value:
    any:
      - resources:
          kinds:
            - Pod
          selector:
            matchLabels:
              # gke's managed prometheus uses the host ports
              app: managed-prometheus-collector
      - resources:
          kinds:
            - Pod
          selector:
            matchLabels:
              # gke's managed prometheus rule evaluator needs them as well
              app: rule-evaluator

This is the desirable syntax:

match:
  any:
    - resources:
        kinds:
          - Pod

And this is the less-than-desirable syntax:

match:
  resources:
    kinds:
      - Pod

Almost all of the policies you find today in the public Kyverno policies repository now use the array syntax by default, but it wasn’t always that way. In this earlier commit you can find remnants of when the configured rules did not use the array syntax.

Have some way to unit test against admission controller policies when working with k8s manifests in a GitHub repository

Debugging CI sucks. If your CI/CD pipeline is verbose and takes five minutes to deploy something to a cluster, only to fail on a policy check, it chisels at your soul. Feedback loops become repetitive, slow, and unfruitful. If you’re using a CI provider like GitHub Actions, make a pipeline that runs a unit test using kubectl apply -f /path/to/yaml --dry-run=server. If this method is unpalatable for your use case, it’s worth looking into tools like Datree that offer admission controller unit testing. I personally don’t use Datree because until they ship their own admission controller, there will always be a lack of parity between the policies Datree provides and the policies you provide in Kyverno.
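A minimal sketch of such a pipeline, under a few loudly stated assumptions: the manifests live in a hypothetical manifests/ directory, a base64-encoded kubeconfig is stored in a hypothetical repository secret named KUBECONFIG_B64, and kubectl is already available on the runner. None of these names come from the example repository.

```yaml
name: policy-dry-run
on:
  pull_request:
    paths:
      # hypothetical location of the Kubernetes manifests
      - "manifests/**"
jobs:
  dry-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Write kubeconfig
        env:
          # hypothetical secret holding a base64-encoded kubeconfig
          KUBECONFIG_B64: ${{ secrets.KUBECONFIG_B64 }}
        run: |
          mkdir -p "$HOME/.kube"
          echo "$KUBECONFIG_B64" | base64 -d > "$HOME/.kube/config"
      - name: Server-side dry run
        # sends each manifest through admission without persisting anything
        run: kubectl apply -f manifests/ --recursive --dry-run=server
```

Keep in mind that a dry run should only fail on policies set to enforce; policies in audit mode admit the request and report the violation separately.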

If that isn’t soon enough and you’re still experiencing pain with iteration, you could use a git hook that runs when you push changes to the remote (similar to how husky does it) to get that feedback even sooner. I’d recommend only implementing a hook like this if the developers working on Kubernetes manifests are comfortable with Kubernetes and either have the policies pulled down into their local dev cluster or are authenticated into a remote cluster with a context they can run --dry-run=server against.

Most of the manifests you write are likely written once and then rarely touched, so iteration that is slow and frequent enough to cause admission controller heartache should be rare.

Separate your Kyverno installation from the underlying policies

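Treat the policies you install as separate from the application that enforces them. Kyverno and your policies have divergent lifecycles: the controller changes when you upgrade it, while policies change whenever your rules or exceptions do. Deploy and version them independently, but keep them compatible with each other.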

Have some way to toggle rule actions between audit and enforcement

If you’re working with remote policies a lot, most of them are usually set to audit rather than enforce, so you’ll need to make a transformation that changes the validationFailureAction value. I use this JSON patch:

- op: replace
  path: /spec/validationFailureAction
  value: enforce
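To apply that patch to every policy at once, one option is to leave the name out of the patch target so that it matches all ClusterPolicy resources in the kustomization. This is only a sketch; the single resource listed is one of the remote policies referenced later in this article.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - https://raw.githubusercontent.com/kyverno/policies/release-1.7/pod-security/baseline/disallow-capabilities/disallow-capabilities.yaml
patches:
  # no name in the target, so every ClusterPolicy above is patched
  - target:
      group: kyverno.io
      version: v1
      kind: ClusterPolicy
    patch: |-
      - op: replace
        path: /spec/validationFailureAction
        value: enforce
```

Switching the value back to audit in one place is then enough to take every policy out of enforcement.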

Add remote policies using raw.githubusercontent.com rather than copying them locally

Kustomize allows you to target remote resources as well as local ones! Make use of the kyverno/policies repository instead of rewriting what’s likely already been written. An example kustomize manifest that uses remote resources looks like this:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels:
  app: kyverno
resources:
  # remote policies
  - https://raw.githubusercontent.com/kyverno/policies/release-1.7/pod-security/baseline/disallow-capabilities/disallow-capabilities.yaml
  - https://raw.githubusercontent.com/kyverno/policies/release-1.7/pod-security/baseline/disallow-host-namespaces/disallow-host-namespaces.yaml
  - https://raw.githubusercontent.com/kyverno/policies/release-1.7/pod-security/baseline/disallow-host-ports/disallow-host-ports.yaml
  - https://raw.githubusercontent.com/kyverno/policies/097ca254c5d52d05bf1aa385d140e8743b9f21ba/best-practices/disallow_default_namespace/disallow_default_namespace.yaml

When referencing remote policies, target a commit hash or branch version rather than the main branch’s head

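In the previous example, you can see that every URL targets either a release branch (release-1.7) or an explicit commit hash rather than the latest and greatest on main. A commit hash never changes, while a release branch can still receive updates, so prefer a hash where you need the policy to stay exactly as you reviewed it. If you’re using transformations, you’ll likely have to stage the changes and update your patches on major updates.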

Avoid mutating resources with policies when possible

You can use an admission controller to mutate offending resources rather than failing them outright. In my opinion this is something you should not do if you can avoid it. It is generally better to correct the resource manifest at its source rather than mutating it, because otherwise what runs in the cluster no longer matches what is in source control. That said, mutations are excellent for injecting objects like logging sidecars and other container-adjacent services.
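For the sidecar case, a minimal sketch of a Kyverno mutate rule might look like the following. The policy name, the logging: enabled label, the container name, and the image are all hypothetical placeholders, and I’m assuming the strategic merge adds the container alongside the existing ones because containers are merged by name.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  # hypothetical policy name
  name: inject-logging-sidecar
spec:
  rules:
    - name: add-log-shipper
      match:
        any:
          - resources:
              kinds:
                - Pod
              selector:
                matchLabels:
                  # hypothetical opt-in label
                  logging: enabled
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # hypothetical sidecar container and image
              - name: log-shipper
                image: example.com/log-shipper:1.0.0
```

Making the mutation opt-in through a label keeps it visible in the source manifest, which softens the drift problem described above.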

If you enjoyed this article, have any questions, noticed something inaccurate, or you just want to say hi feel free to drop a comment below or send an email to me@norling.io