AdinErmie.com

A site dedicated to Cloud and Datacenter Management

Book Review: Learn Kubernetes Security

Recently, I finished reading Learn Kubernetes Security by Kaizhe Huang and Pranjal Jumde.

I’ve recently been ramping up on Kubernetes, so I already had some knowledge and hands-on experience with it when I read this book.

I found the whole book quite helpful. In particular, Chapter 1 (“Kubernetes Architecture”) and Chapter 2 (“Kubernetes Networking”) provide a clear and concise overview of core Kubernetes concepts.

Chapter 6 (“Securing Cluster Components”) is an amazingly detailed chapter that breaks down each Kubernetes component and the security best practices around it. I highlighted practically the entire chapter! Similarly, Chapter 8 (“Securing Kubernetes Pods”) has a lot of detail on security from the Pod perspective.

I’ve decided to share my highlights from reading this specific publication, in case the points that I found of note/interest will be of some benefit to someone else. So, here are my highlights (by chapter). Note that not every chapter will have highlights (depending on the content and the main focus of my work).

If my highlights pique your interest, I strongly recommend that you pick up a copy for yourself.

Chapter 1: Kubernetes Architecture

  • With a dedicated network namespace, processes cannot communicate with other processes without a proper network configuration, even though they’re running on the same node.
  • There are two types of nodes: master nodes and worker nodes. The main control plane components, such as kube-apiserver, run on the master nodes. The agent running on each worker node is called kubelet; it works as a minion on behalf of kube-apiserver.
  • A master node generally has kube-apiserver, etcd storage, kube-controller-manager, cloud-controller-manager, and kube-scheduler.
  • The worker nodes have kubelet, kube-proxy, a Container Runtime Interface (CRI) component, a Container Storage Interface (CSI) component, and so on.
    • kube-apiserver: The Kubernetes API server (kube-apiserver) is a control-plane component that validates and configures data for objects such as pods, services, and controllers.
    • etcd: etcd is a high-availability key-value store used to store data such as configuration, state, and metadata.
    • kube-scheduler: kube-scheduler is a default scheduler for Kubernetes. It watches for newly created pods and assigns pods to the nodes.
    • kube-controller-manager: The Kubernetes controller manager is a combination of the core controllers that watch for state updates and make changes to the cluster accordingly.
    • cloud-controller-manager: The cloud controller manager was introduced in v1.6; it runs controllers to interact with the underlying cloud providers. This is an attempt to decouple the cloud vendor code from the Kubernetes code.
    • kubelet: kubelet runs on every node. It registers the node with the API server. kubelet monitors pods created using Podspecs and ensures that the pods and containers are healthy.
    • kube-proxy: kube-proxy is a networking proxy that runs on each node. It manages the networking rules on each node and forwards or filters traffic based on these rules.
    • kube-dns: DNS is a built-in service launched at cluster startup. With v1.12, CoreDNS became the recommended DNS server, replacing kube-dns.
  • kubenet only supports 50 nodes per cluster, which obviously cannot meet any requirements of large-scale deployment.
  • Kubernetes leverages a Container Networking Interface (CNI) as a common interface between the network providers and Kubernetes’ networking components to support network communication in a cluster with a large scale.
  • The container storage interface provides an interface for exposing arbitrary blocks and file storage to Kubernetes.
  • At the lowest level of Kubernetes, container runtimes ensure containers start, work, and stop. The most popular container runtime is Docker.
  • A pod is a basic building block of a Kubernetes cluster. It’s a group of one or more containers that are expected to co-exist on a single host. Containers within a pod can reference each other using localhost or inter-process communications (IPCs).
  • Kubernetes deployments help scale pods up or down based on labels and selectors. The YAML spec for a deployment consists of replicas, which is the number of instances of pods that are required, and template, which is identical to a pod specification.
  • A Kubernetes service is an abstraction of an application. A service enables network access for pods. Services and deployments work in conjunction to ease the management and communication between different pods of an application.
  • It is better to use deployments over replica sets. Deployments encapsulate replica sets and pods. Additionally, deployments provide the ability to carry out rolling updates.
  • Namespaces help a physical cluster to be divided into multiple virtual clusters. Multiple objects can be isolated within different namespaces. Default Kubernetes ships with three namespaces: default, kube-system, and kube-public.
  • Pods that need to interact with kube-apiserver use service accounts to identify themselves. By default, Kubernetes is provisioned with a list of default service accounts: kube-proxy, kube-dns, node-controller, and so on.
  • The pod security policy is a cluster-level resource that defines a set of conditions that must be fulfilled for a pod to run on the system.
  • These policies must be accessible to the requesting user or the service account of the target pod to work.
  • There is another thing that OpenShift projects do better than Kubernetes namespaces: when creating a project in OpenShift, you can modify the project template and add extra objects, such as NetworkPolicy and default quotas, to the project that are compliant with your company’s policy.
  • Kubedex (https://kubedex.com/google-gke-vs-microsoft-aks-vs-amazon-eks/) have carried out a great comparison of the cloud Kubernetes services.
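
To make the deployment highlight above more concrete, here is a minimal sketch of a deployment spec built as a Python dictionary, showing the replicas and template fields and the label selector that ties them together. The name "web", the label app=web, and the image "nginx:1.25" are placeholders I made up for illustration; they are not from the book.

```python
import json

# Placeholder label shared by the selector and the pod template (assumption).
labels = {"app": "web"}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "default"},
    "spec": {
        "replicas": 3,                        # number of pod instances required
        "selector": {"matchLabels": labels},  # which pods this deployment manages
        "template": {                         # same shape as a pod specification
            "metadata": {"labels": labels},
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "nginx:1.25",  # placeholder image
                        "ports": [{"containerPort": 80}],
                    }
                ]
            },
        },
    },
}


def selector_matches_template(manifest):
    """Return True when every selector label also appears on the pod template."""
    selector = manifest["spec"]["selector"]["matchLabels"]
    pod_labels = manifest["spec"]["template"]["metadata"]["labels"]
    return all(pod_labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    # Print the manifest as JSON so it can be inspected or saved to a file.
    print(json.dumps(deployment, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could in principle be saved to a file and applied to a test cluster.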

Chapter 2: Kubernetes Networking

  • The beauty of this design is that it offers a clean, backward-compatible model where pods act like Virtual Machines (VMs) or physical hosts from the perspective of port allocation, naming, service discovery, load balancing, application configuration, and migration.
  • The Kubernetes service is the one that surfaces the internal application to the public.
  • The IP address assigned to each pod is a private IP address or a cluster IP address that is not publicly accessible.
  • Containers inside the same pod share at least the same IPC namespace and network namespace; as a result, K8s needs to resolve potential conflicts in port usage.
  • Despite the lack of activity, the Pause container plays a critical role in the pod. It serves as a placeholder to hold the network namespace for all other containers in the same pod.
  • The Kubernetes service is an abstraction of a grouping of sets of pods with a definition of how to access the pods.
  • The reason to call it a virtual IP address is that, from a node’s perspective, there is neither a namespace nor a network interface bound to a service as there is with a pod.
  • So, what kube-proxy does to solve the two problems mentioned earlier is that it forwards all the traffic whose destination is the target service (the virtual IP) to the pods grouped by the service (the actual IP); meanwhile, kube-proxy watches the Kubernetes control plane for the addition or removal of the service and endpoint objects (pods).
  • By default, kube-proxy in user space mode uses a round-robin algorithm to choose which backend pod to forward the requests to.
  • kube-proxy in the iptables proxy mode is only responsible for maintaining and updating the iptables rules. Any traffic targeted to the service IP will be forwarded to the backend pods by netfilter, based on the iptables rules managed by kube-proxy.
  • The disadvantage of this mode is the error handling required. For a case where kube-proxy runs in the iptables proxy mode, if the first selected pod does not respond, the connection will fail. While in the user space mode, however, kube-proxy would detect that the connection to the first pod had failed and then automatically retry with a different backend pod.
  • Services are usually defined with a selector, which is a label attached to pods that need to be in the same service. A service can be defined without a selector.
  • A service can have four different types, as follows:
    • ClusterIP: This is the default value. This service is only accessible within the cluster.
    • NodePort: This service is accessible via a static port on every node. NodePorts expose one service per port and require manual management of IP address changes.
    • LoadBalancer: This service is accessible via a load balancer. A load balancer per service is usually an expensive option.
    • ExternalName: This service has an associated Canonical Name Record (CNAME) that is used to access the service.
  • Ingress is a smart router that provides external HTTP/HTTPS (short for HyperText Transfer Protocol Secure) access to a service in a cluster. Services other than HTTP/HTTPS can only be exposed through the NodePort or LoadBalancer service types.
  • Ingress objects have five different variations, listed as follows:
    • Single-service Ingress: This exposes a single service by specifying a default backend and no rules
    • Simple fanout: A fanout configuration routes traffic from a single IP to multiple services based on the Uniform Resource Locator (URL)
    • Name-based virtual hosting: This configuration uses multiple hostnames for a single IP to reach out to different services
    • Transport Layer Security (TLS): A secret can be added to the ingress spec to secure the endpoints
    • Load balancing: A load balancing ingress provides a load balancing policy, which includes the load balancing algorithm and weight scheme for all ingress objects.
  • The CNI specification is only concerned with the network connectivity of containers and removing allocated resources when the container is deleted.
    • First, from a container runtime’s perspective, the CNI spec defines an interface for the Container Runtime Interface (CRI) component (such as Docker) to interact with
    • Secondly, from a Kubernetes network model’s perspective, since CNI plugins are actually another flavor of Kubernetes network plugins, they have to comply with Kubernetes network model requirements
  • The network policy implementation is not required in the CNI specification, but when DevOps choose which CNI plugins to use, it is important to take security into consideration. Alexis Ducastel’s article (https://itnext.io/benchmark-results-of-kubernetes-network-plugins-cni-over-10gbit-s-network-36475925a560) did a good comparison of the mainstream CNI plugins with the latest update in April 2019.
  • Here are a few things about Calico worth highlighting:
    • Calico provides a flat IP network, which means there will be no IP encapsulation appended to the IP message (no overlays). Also, this means that each IP address assigned to the pod is fully routable. The ability to run without an overlay provides exceptional throughput characteristics.
    • Calico has better performance and less resource consumption, according to Alexis Ducastel’s experiments. Calico offers a more comprehensive network policy compared to Kubernetes’ built-in network policy.
    • Kubernetes’ network policy can only define whitelist rules, while Calico network policies can define blacklist rules (deny).
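
The difference in error handling between the two kube-proxy modes described above can be illustrated with a toy simulation. This is only a sketch under my own assumptions, not kube-proxy's actual code: the function names and pod IPs are hypothetical, and the "no retry" case models only the fact that a failed first pick fails the connection (iptables mode does not actually pick backends in round-robin order).

```python
from itertools import cycle


def make_round_robin(backends):
    """Return a function that hands out backends in round-robin order."""
    ring = cycle(backends)
    return lambda: next(ring)


def forward(next_backend, is_healthy, attempts):
    """Try up to `attempts` backends; return the first healthy one, else None."""
    for _ in range(attempts):
        pod_ip = next_backend()
        if is_healthy(pod_ip):
            return pod_ip
    return None


if __name__ == "__main__":
    pods = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical pod IPs

    def healthy(ip):
        return ip != "10.0.0.1"  # pretend the first pod is unresponsive

    # One attempt only: the first selected pod is down, so the connection fails.
    print(forward(make_round_robin(pods), healthy, attempts=1))
    # With retries (user space style), the next backend is tried automatically.
    print(forward(make_round_robin(pods), healthy, attempts=3))
```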

Chapter 3: Threat Modeling

  • Threat modeling involves identifying threats, understanding the effects of each threat, and finally developing a mitigation strategy for every threat.
  • After a successful threat modeling session, you’re able to define the following:
    • Asset: A property of an ecosystem that you need to protect.
    • Security control: A property of a system that protects the asset against identified risks. These are either safeguards or countermeasures against the risk to the asset.
    • Threat actor: A threat actor is an entity or organization including script kiddies, nation-state attackers, and hacktivists who exploit risks.
    • Attack surface: The part of the system that the threat actor is interacting with. It includes the entry point of the threat actor into the system.
    • Threat: The risk to the asset.
    • Mitigation: Mitigation defines how to reduce the likelihood and impact of a threat to an asset.
  • The industry usually follows one of the following approaches to threat modeling:
    • STRIDE: The STRIDE model was published by Microsoft in 1999. It is an acronym for Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. STRIDE models threats to a system to answer the question, ‘What can go wrong with the system?’
    • PASTA: Process for Attack Simulation and Threat Analysis is a risk-centric approach to threat modeling. PASTA follows an attacker-centric approach, which is used by the business and technical teams to develop asset-centric mitigation strategies.
    • VAST: Visual, Agile, and Simple Threat modeling aims to integrate threat modeling across application and infrastructure development with SDLC and agile software development. It provides a visualization scheme that provides actionable outputs to all stakeholders such as developers, architects, security researchers, and business executives.
  • It is worth noting that only kube-apiserver communicates with etcd. Other Kubernetes components such as kube-scheduler, kube-controller-manager, and cloud-controller-manager interact with kube-apiserver running in the master nodes in order to fulfill their responsibilities. On the worker nodes, both kubelet and kube-proxy communicate with kube-apiserver.
  • Neither data in transit nor at rest is encrypted by default in etcd.
  • DaemonSet basically means the microservice will run inside a pod in every node.
  • Note that not all communication between components is secure by default. It depends on the configuration of those components.
  • Also, note that kube-apiserver and etcd are the brain and heart of a Kubernetes cluster. If either of them were to get compromised, that would be game over.
  • Trail of Bits and Atredis Partners have done a good job on Kubernetes components’ threat modeling. Their whitepaper highlights in detail the threats in each Kubernetes component. You can find the whitepaper at https://github.com/kubernetes/community/blob/master/wg-security-audit/findings/Kubernetes%20Threat%20Model.pdf.

Chapter 4: Applying the Principle of Least Privilege in Kubernetes

  • Kubernetes supports both ABAC and RBAC. Though ABAC is powerful and flexible, the implementation in Kubernetes makes it difficult to manage and understand. Thus, it is recommended to enable RBAC instead of ABAC in Kubernetes.
  • Users in the system:masters group have the cluster-admin role granted, meaning they can manage the entire Kubernetes cluster, while users in the system:kube-proxy group can only access the resources required by the kube-proxy component.
  • From version 1.6 onward, RBAC is enabled by default in Kubernetes. Before version 1.6, RBAC could be enabled by running the Application Programming Interface (API) server with the --authorization-mode=RBAC flag.
  • Pods authenticate to the kube-apiserver object using a service account. Service accounts are created using API calls. They are restricted to namespaces and have associated credentials stored as secrets. By default, pods authenticate as a default service account.
  • To ensure least privilege, cluster administrators should associate every Kubernetes resource with a service account with least privilege to operate.
  • In Kubernetes, there are no deny permissions. Thus, a role is an addition of a set of permissions.
  • By default, Kubernetes has three different namespaces. The three namespaces are described as follows:
    • default: A namespace for resources that are not part of any other namespace.
    • kube-system: A namespace for objects created by Kubernetes such as kube-apiserver, kube-scheduler, controller-manager, and coredns.
    • kube-public: Resources within this namespace are accessible to all. By default, nothing will be created in this namespace.
  • In Kubernetes, not all objects are namespaced. Lower-level objects such as Nodes and persistentVolumes span across namespaces.
  • A user in Kubernetes is for humans, while a service account is for microservices in pods.
  • In order to implement least privilege for Kubernetes subjects, you may ask yourself the following questions before you create a Role or RoleBinding object in Kubernetes:
    • Does the subject need privileges for a namespace or across namespaces? This is important because once the subject has cluster-level privileges it may be able to exercise the privileges across all namespaces.
    • Should the privileges be granted to a user, group, or service account? When you grant a role to a group, it means all the users in the group will automatically get the privileges from the newly granted role. Be sure you understand the impact before you grant a role to a group. Also, note that some microservices do not need any privilege at all as they don’t interact with kube-apiserver or any Kubernetes objects directly.
    • What are the resources that the subjects need to access? When creating a role, if you don’t specify the resource name or do set * in the resourceNames field, it means access is granted to all the resources of the resource type.
  • Besides accessing kube-apiserver to operate Kubernetes objects, processes in a pod can also access resources on the worker nodes and other pods/microservices in the clusters.
  • Configuring the pod/container security context should be on the developers’ task list (with the help of security design and review), while pod security policies, the other way to limit pod/container access to system resources at the cluster level, should be on DevOps’s to-do list.
  • It is highly recommended not to run your microservice as a root user (UID = 0) in containers. The security implication is that if there is an exploit and a container escapes to the host, the attacker gains the root user privileges on the host immediately.
  • Note that AllowPrivilegeEscalation is always true when the container is either running as privileged or has a CAP_SYS_ADMIN capability.
  • It’s always a good practice to set resource requests and limits for workload. The resource request impacts which node the pods will be assigned to by the scheduler, while the resource limit sets the condition under which the container will be terminated.
  • In general, a Kubernetes network policy defines rules of how a group of pods are allowed to communicate with each other and other network endpoints. You can define both ingress rules and egress rules for your workload.
  • During deployment, DevOps should consider using a PodSecurityPolicy and a network policy to enforce least privileges across the entire cluster.
  • Help defining RBAC privilege grants: https://github.com/liggitt/audit2rbac
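
The points above about RBAC having no deny permissions, and about an empty resourceNames field granting access to every object of a type, can be sketched as a small evaluation function. This is a deliberately simplified model under my own assumptions (it ignores API groups, namespaces, and bindings), and the role names and rules are hypothetical examples rather than anything taken from the book or from the Kubernetes source.

```python
def rule_allows(rule, verb, resource, name=None):
    """Check a single rule. An empty resourceNames list means every object of that type."""
    if verb not in rule["verbs"] and "*" not in rule["verbs"]:
        return False
    if resource not in rule["resources"] and "*" not in rule["resources"]:
        return False
    names = rule.get("resourceNames", [])
    return not names or name in names


def is_allowed(roles, verb, resource, name=None):
    """Permissions are purely additive: any matching rule in any role grants access."""
    return any(
        rule_allows(rule, verb, resource, name)
        for role in roles
        for rule in role["rules"]
    )


# Hypothetical roles for illustration only.
pod_reader = {
    "name": "pod-reader",
    "rules": [{"verbs": ["get", "list", "watch"], "resources": ["pods"]}],
}
config_reader = {
    "name": "config-reader",
    "rules": [
        {"verbs": ["get"], "resources": ["configmaps"], "resourceNames": ["app-config"]}
    ],
}

if __name__ == "__main__":
    granted = [pod_reader, config_reader]
    print(is_allowed(granted, "get", "pods", "any-pod"))          # no resourceNames: all pods
    print(is_allowed(granted, "delete", "pods", "any-pod"))       # verb never granted
    print(is_allowed(granted, "get", "configmaps", "app-config")) # named object only
```

Because there is no way to express a deny rule, the only way to reduce what a subject can do in this model is to remove a rule or a role, which is why starting from the smallest possible set matters.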

Chapter 5: Configuring Kubernetes Security Boundaries

  • In containerized environments, chroot prevents containers from tampering with the filesystems of other containers.
  • Think of trust boundary as a wall and security boundary as a fence around the wall.
  • A Kubernetes cluster can be broadly split into three security domains:
    • Kubernetes master components: Kubernetes master components define the control plane for the Kubernetes ecosystem. The master components are responsible for decisions required for the smooth operation of the cluster, such as scheduling. Master components include kube-apiserver, etcd, kube-controller-manager, the DNS server, and kube-scheduler. A breach in the Kubernetes master components can compromise the entire Kubernetes cluster.
    • Kubernetes worker components: Kubernetes worker components are deployed on every worker node and ensure that Pods and containers are running nicely. Kubernetes worker components use authorization and TLS tunneling for communicating with the master components.
    • Kubernetes objects: Kubernetes objects are persistent entities that represent the state of the cluster: deployed applications, volumes, and namespaces. Kubernetes objects include Pods, Services, volumes, and namespaces.
  • kube-apiserver is the only security boundary that protects the master components from compromise by privileged attackers. If a privileged attacker compromises kube-apiserver, it’s game over.
  • By default, each Pod has its own network namespace and IPC namespace. Each container inside the same pod has its own PID namespace so that one container has no knowledge about other containers running inside the same Pod. Similarly, a Pod does not know other Pods exist in the same worker node.
  • By default, here is a list of capabilities that are assigned to containers in Kubernetes clusters:
    • CAP_SETPCAP
    • CAP_MKNOD
    • CAP_AUDIT_WRITE
    • CAP_CHOWN
    • CAP_NET_RAW
    • CAP_DAC_OVERRIDE
    • CAP_FOWNER
    • CAP_FSETID
    • CAP_KILL
    • CAP_SETGID
    • CAP_SETUID
    • CAP_NET_BIND_SERVICE
    • CAP_SYS_CHROOT
    • CAP_SETFCAP
  • You should drop all the capabilities and only add the required ones.
  • In general, the fewer capabilities granted to containers, the more secure the boundaries are for other microservices.
  • And it is highly recommended to use PodSecurityPolicy to restrict the usage of host namespaces as well as extra capabilities so that the security boundaries of microservices are fortified.
  • To strengthen the trust boundaries for microservices from a network aspect, you might want to either specify the allowed ipBlock from external or allowed microservices from a specific namespace.
  • Kubernetes network policies: https://kubernetes.io/docs/concepts/services-networking/network-policies/
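
The advice above to drop all capabilities and add back only the required ones can be sketched as a small calculation over a container security context. The default set below is copied from the list in this chapter (without the CAP_ prefix, which is how capabilities are written in a security context). The helper function and the example context are my own illustration, and the sketch does not handle every case (for example, adding "ALL").

```python
# Default capabilities as listed in the chapter, written without the CAP_ prefix.
DEFAULT_CAPS = {
    "SETPCAP", "MKNOD", "AUDIT_WRITE", "CHOWN", "NET_RAW", "DAC_OVERRIDE",
    "FOWNER", "FSETID", "KILL", "SETGID", "SETUID", "NET_BIND_SERVICE",
    "SYS_CHROOT", "SETFCAP",
}


def effective_caps(security_context):
    """Estimate the capabilities a container ends up with: (defaults - drop) + add."""
    caps = security_context.get("capabilities", {})
    drop = set(caps.get("drop", []))
    add = set(caps.get("add", []))
    base = set() if "ALL" in drop else DEFAULT_CAPS - drop
    return base | add


# Hypothetical hardened context: non-root, no privilege escalation,
# everything dropped except the ability to bind to low ports.
hardened = {
    "runAsNonRoot": True,
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"], "add": ["NET_BIND_SERVICE"]},
}

if __name__ == "__main__":
    print(sorted(effective_caps({})))        # nothing configured: the full default set
    print(sorted(effective_caps(hardened)))  # only what was explicitly added back
```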

Chapter 6: Securing Cluster Components

  • kube-apiserver, by default, supports Attribute-Based Access Control (ABAC), Role-Based Access Control (RBAC), node authorization, and Webhooks for authorization. RBAC is the recommended mode of authorization.
  • To secure the API server, you should do the following:
    • Disable anonymous authentication:
      • Use the --anonymous-auth=false flag to disable anonymous authentication. This ensures that requests rejected by all authentication modules are not treated as anonymous and are discarded.
    • Disable basic authentication:
      • Basic authentication is supported for convenience in kube-apiserver and should not be used. Basic authentication passwords persist indefinitely. kube-apiserver uses the --basic-auth-file argument to enable basic authentication. Ensure that this argument is not used.
    • Disable token authentication:
      • --token-auth-file enables token-based authentication for your cluster. Token-based authentication is not recommended. Static tokens persist forever and need a restart of the API server to update. Client certificates should be used for authentication.
    • Ensure connections with kubelet use HTTPS:
      • By default, --kubelet-https is set to true. Ensure that this argument is not set to false for kube-apiserver.
    • Disable profiling:
      • Enabling profiling using --profiling exposes unnecessary system and program details. Unless you are experiencing performance issues, disable profiling by setting --profiling=false.
    • Disable AlwaysAdmit:
      • --enable-admission-plugins can be used to enable admission control plugins that are not enabled by default. AlwaysAdmit accepts every request. Ensure that the plugin is not in the --enable-admission-plugins list.
    • Use AlwaysPullImages:
      • The AlwaysPullImages admission control ensures that images on the nodes cannot be used without correct credentials. This prevents malicious pods from spinning up containers for images that already exist on the node.
    • Use SecurityContextDeny:
      • This admission controller should be used if PodSecurityPolicy is not enabled. SecurityContextDeny ensures that pods cannot modify SecurityContext to escalate privileges.
    • Enable auditing:
      • Auditing is enabled by default in kube-apiserver. Ensure that --audit-log-path is set to a file in a secure location. Additionally, ensure that the maxage, maxsize, and maxbackup parameters for auditing are set to meet compliance expectations.
    • Disable AlwaysAllow authorization:
      • Authorization mode ensures that requests from users with correct privileges are parsed by the API server. Do not use AlwaysAllow with --authorization-mode.
    • Enable RBAC authorization:
      • RBAC is the recommended authorization mode for the API server. ABAC is difficult to use and manage. The ease of use, and easy updates to, RBAC roles and role bindings makes RBAC suitable for environments that scale often.
    • Ensure requests to kubelet use valid certificates:
      • By default, kube-apiserver uses HTTPS for requests to kubelet. Enabling --kubelet-certificate-authority, --kubelet-client-certificate, and --kubelet-client-key ensures that the communication uses valid HTTPS certificates.
    • Enable service-account-lookup:
      • In addition to ensuring that the service account token is valid, kube-apiserver should also verify that the token exists in etcd.
    • Enable PodSecurityPolicy:
      • --enable-admission-plugins can be used to enable PodSecurityPolicy. PodSecurityPolicy is used to define the security-sensitive criteria for a pod.
    • Use a service account key file:
      • Use of --service-account-key-file enables rotation of keys for service accounts. If this is not specified, kube-apiserver uses the private key from the Transport Layer Security (TLS) certificates to sign the service account tokens.
    • Enable authorized requests to etcd:
      • --etcd-certfile and --etcd-keyfile can be used to identify requests to etcd. This ensures that any unidentified requests can be rejected by etcd.
    • Do not disable the ServiceAccount admission controller:
      • This admission control automates service accounts. Enabling ServiceAccount ensures that custom ServiceAccount with restricted permissions can be used with different Kubernetes objects.
    • Do not use self-signed certificates for requests:
      • If HTTPS is enabled for kube-apiserver, a --tls-cert-file and a --tls-private-key-file should be provided to ensure that self-signed certificates are not used.
    • Secure connections to etcd:
      • Setting --etcd-cafile allows kube-apiserver to verify itself to etcd over Secure Sockets Layer (SSL) using a certificate file.
    • Use secure TLS connections:
      • Set --tls-cipher-suites to strong ciphers only. --tls-min-version is used to set the minimum-supported TLS version. TLS 1.2 is the recommended minimum version.
    • Enable advanced auditing:
      • Advanced auditing can be disabled by setting --feature-gates to AdvancedAuditing=false. Ensure that, if this field is present, it is set to true. Advanced auditing helps in an investigation if a breach happens.
  • To secure kubelet, you should do the following:
    • Disable anonymous authentication:
      • If anonymous authentication is enabled, requests that are rejected by other authentication methods are treated as anonymous. Ensure that --anonymous-auth=false is set for each instance of kubelet.
    • Set the authorization mode:
      • The authorization mode for kubelet is set using config files. A config file is specified using the --config parameter. Ensure that the authorization mode does not have AlwaysAllow in the list.
    • Rotate kubelet certificates:
      • kubelet certificates can be rotated using a RotateCertificates configuration in the kubelet configuration file. This should be used in conjunction with RotateKubeletServerCertificate to auto-request rotation of server certificates.
    • Provide a Certificate Authority (CA) bundle:
      • A CA bundle is used by kubelet to verify client certificates. This can be set using the ClientCAFile parameter in the config file.
    • Restrict access to the Kubelet API:
      • Only the kube-apiserver component interacts with the kubelet API. If you try to communicate with the kubelet API on the node, it is forbidden. This is ensured by using RBAC for kubelet.
    • Disable the read-only port:
      • The read-only port is enabled for kubelet by default, and should be disabled. The read-only port is served with no authentication or authorization.
    • Enable the NodeRestriction admission controller:
      • The NodeRestriction admission controller only allows kubelet to modify the node and pod objects on the node it is bound to.
  • To secure etcd, you should do the following:
    • Restrict node access:
      • Use Linux firewalls to ensure that only nodes that need access to etcd are allowed access.
    • Ensure the API server uses TLS:
      • --cert-file and --key-file ensure that requests to etcd are secure.
    • Use valid certificates:
      • --client-cert-auth ensures that communication from clients is made using valid certificates, and setting --auto-tls to false ensures that self-signed certificates are not used.
    • Encrypt data at rest:
      • --encryption-provider-config is passed to the API server to ensure that data is encrypted at rest in etcd.
  • To secure kube-scheduler, you should do the following:
    • Disable profiling:
      • Profiling of kube-scheduler exposes system details. Setting --profiling to false reduces the attack surface.
    • Disable external connections to kube-scheduler:
      • External connections should be disabled for kube-scheduler. If AllowExtTrafficLocalEndpoints is set to true, external connections to kube-scheduler are enabled. Ensure that this feature is disabled using --feature-gates.
    • Enable AppArmor:
      • By default, AppArmor is enabled for kube-scheduler. Ensure that AppArmor is not disabled for kube-scheduler.
  • To secure kube-controller-manager, you should use --use-service-account-credentials, which, when used with RBAC, ensures that control loops run with minimum privileges.
  • kube-dns has been superseded by CoreDNS since version 1.11 because of security vulnerabilities in dnsmasq and performance issues in SkyDNS. CoreDNS is a single container that provides all the functions of kube-dns.
  • To secure CoreDNS, do the following:
    • Ensure that the health plugin is not disabled:
      • The health plugin monitors the status of CoreDNS. It is used to confirm if CoreDNS is up and running. It is enabled by adding health to the list of plugins to be enabled in Corefile.
    • Enable istio for CoreDNS:
      • istio is a service mesh that is used by Kubernetes to provide service discovery, load balancing, and authentication. It is not available by default in Kubernetes and needs to be added as an external dependency.
  • kube-bench is an automated tool written in Go and published by Aqua Security that runs tests documented in the CIS benchmark.
  • GitHub (kube-bench): https://github.com/aquasecurity/kube-bench
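
As a rough illustration of how a few of the kube-apiserver recommendations above could be checked automatically, here is a toy flag audit. It is only a sketch under my own assumptions: it is not how kube-bench is implemented, it covers just a handful of the flags from my highlights, the example command lines are made up, and flag availability varies between Kubernetes versions.

```python
def parse_flags(argv):
    """Turn ['--a=b', '--c'] into {'a': 'b', 'c': 'true'}; non-flag items are ignored."""
    flags = {}
    for arg in argv:
        if arg.startswith("--"):
            key, _, value = arg[2:].partition("=")
            flags[key] = value or "true"
    return flags


def audit_apiserver(argv):
    """Return a list of findings for a kube-apiserver command line."""
    flags = parse_flags(argv)
    findings = []
    if flags.get("anonymous-auth") != "false":
        findings.append("anonymous-auth should be false")
    for forbidden in ("basic-auth-file", "token-auth-file"):
        if forbidden in flags:
            findings.append(f"{forbidden} should not be set")
    if flags.get("profiling") != "false":
        findings.append("profiling should be false")
    modes = flags.get("authorization-mode", "").split(",")
    if "AlwaysAllow" in modes:
        findings.append("authorization-mode should not include AlwaysAllow")
    if "RBAC" not in modes:
        findings.append("authorization-mode should include RBAC")
    plugins = flags.get("enable-admission-plugins", "").split(",")
    if "AlwaysAdmit" in plugins:
        findings.append("enable-admission-plugins should not include AlwaysAdmit")
    return findings


if __name__ == "__main__":
    # Made-up command line for demonstration purposes.
    example = [
        "kube-apiserver",
        "--token-auth-file=/etc/kubernetes/tokens.csv",
        "--authorization-mode=AlwaysAllow",
    ]
    for finding in audit_apiserver(example):
        print("FAIL:", finding)
```

For real clusters, kube-bench remains the better choice, since it tracks the full CIS benchmark rather than a hand-picked subset.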

Chapter 7: Authentication, Authorization, and Admission Control

  • Authentication validates the identity of a user. Once the identity is validated, authorization is used to check whether the user has the privileges to perform the desired action.
  • Admission controllers intercept requests to the API server that create, update, or delete an object.
  • Admission controllers fall into two categories: mutating or validating.
  • Mutating admission controllers run first; they modify the requests they admit.
  • Validating admission controllers run next. These controllers cannot modify objects.
  • In v1.6+, anonymous access is allowed to support anonymous and unauthenticated users for the RBAC and ABAC authorization modes. It can be explicitly disabled by passing the --anonymous-auth=false flag to the API server configuration.
  • Using X509 Certificate Authority (CA) certificates is the most common authentication strategy in Kubernetes. It can be enabled by passing --client-ca-file=file_path to the server.
  • Static tokens are a popular mode of authentication in development and debugging environments but should not be used in production clusters.
  • Once compromised, the only way to generate a new token is to restart the API server.
  • Similar to static tokens, basic authentication passwords cannot be changed without restarting the API server. Basic authentication should not be used in production clusters.
  • Bootstrap tokens are the default authentication method used in Kubernetes. They are dynamically managed and stored as secrets in kube-system.
  • The default service account is associated with a pod if no service account is specified.
  • The service account authenticator is automatically enabled. It verifies signed bearer tokens. The signing key is specified using --service-account-key-file. If this value is unspecified, the Kube API server’s private key is used.
  • Node authorization mode grants permissions to kubelets to access services, endpoints, nodes, pods, secrets, and persistent volumes for a node.
  • ABAC is difficult to configure and maintain. It is not recommended that you use ABAC in production environments.
  • Role and RoleBinding are restricted to namespaces. If a role needs to span across namespaces, ClusterRole and ClusterRoleBinding can be used to grant permissions to users across namespace boundaries.
  • A cluster can have four types of limits: Namespace, Server, User and SourceAndObject. With each limit, the user can have a maximum limit for the Queries Per Second (QPS), the burst and cache size.
  • It is recommended that PodSecurityPolicy is enabled by default in a cluster. However, due to the administrative overhead, SecurityContextDeny can be used until PodSecurityPolicy is configured for the cluster.
  • Policies for OPA are defined in a custom language called Rego.
  • You can use the official OPA documentation (https://www.openpolicyagent.org/docs/latest/kubernetes-tutorial/) to install OPA on your cluster.
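
The ordering described above (mutating admission controllers run first and may change the request, validating admission controllers run next and may only accept or reject it) can be illustrated with a small sketch. This is not how kube-apiserver is implemented; the function names and the dictionary-based pod shape are hypothetical and exist only to show the two phases.

```python
import copy


def add_default_service_account(pod):
    """Mutating step (hypothetical): use the default service account if none is set."""
    pod = copy.deepcopy(pod)  # leave the caller's object untouched
    pod["spec"].setdefault("serviceAccountName", "default")
    return pod


def deny_privileged(pod):
    """Validating step (hypothetical): return an error message, or None to accept."""
    for container in pod["spec"].get("containers", []):
        if container.get("securityContext", {}).get("privileged", False):
            return f"container {container['name']} requests privileged mode"
    return None


def admit(pod, mutators, validators):
    """Run every mutator first, then every validator on the mutated object."""
    for mutate in mutators:
        pod = mutate(pod)
    for validate in validators:
        error = validate(pod)
        if error is not None:
            return None, error  # rejected: nothing is persisted
    return pod, None


request = {"spec": {"containers": [{"name": "web"}]}}
admitted, error = admit(request, [add_default_service_account], [deny_privileged])
print(admitted["spec"]["serviceAccountName"], error)
```

Because validators only see the object after all mutators have run, a validating check never has to guess which defaults will be filled in later.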

Chapter 8: Securing Kubernetes Pods

  • Image scanning tools only focus on finding publicly disclosed issues in applications bundled inside the image. But, following the best practices along with secure configuration while building the image ensures that the application has a minimal attack surface.
  • A Dockerfile contains a series of instructions, such as copy files, configure environment variables, configure open ports, and container entry points, which can be understood by the Docker daemon to construct the image file.
  • Each Dockerfile instruction will create a file layer in the image.
  • Let’s take a look at the security recommendations from CIS Docker benchmarks regarding container images:
    • Create a user for a container image to run a microservice:
      • It is good practice to run a container as non-root. Although user namespace mapping is available, it is not enabled by default.
    • Use trusted base images to build your own image:
      • Images downloaded from public repositories cannot be fully trusted. It is well known that images from public repositories may contain malware or crypto miners. Hence, it is recommended that you build your image from scratch or use minimal trusted images, such as Alpine. Also, perform the image scan after your image has been built.
    • Do not install unnecessary packages in your image:
      • Installing unnecessary packages will increase the attack surface. It is recommended that you keep your image slim. Occasionally, you will probably need to install some tools during the process of building an image. Do remember to remove them at the end of the Dockerfile.
    • Scan and rebuild an image in order to apply security patches:
      • It is highly likely that new vulnerabilities will be discovered in your base image or in the packages you install in your image. It is good practice to scan your image frequently. Once you identify any vulnerabilities, try to patch the security fixes by rebuilding the image. Image scanning is a critical mechanism for identifying vulnerabilities at the build stage.
    • Enable content trust for Docker:
      • Content trust uses digital signatures to ensure data integrity between the client and the Docker registry. It ensures the provenance of the container image. However, it is not enabled by default. You can turn it on by setting the environment variable, DOCKER_CONTENT_TRUST, to 1.
    • Add a HEALTHCHECK instruction to the container image:
      • A HEALTHCHECK instruction defines a command to ask Docker Engine to check the health status of the container periodically. Based on the health status check result, Docker Engine then exits the non-healthy container and initiates a new one.
    • Ensure that updates are not cached in Dockerfile:
      • Depending on the base image you choose, you may need to update the package repository before installing new packages. However, if you specify RUN apt-get update (Debian) in a single line in the Dockerfile, Docker Engine will cache this file layer, so, when you build your image again, it will still use the old package repository information that is cached. This will prevent you from using the latest packages in your image. Therefore, either use update along with install in a single Dockerfile instruction or use the --no-cache flag in the Docker build command.
    • Remove setuid and setgid permission from files in the image:
      • setuid and setgid permissions can be used for privilege escalation as files with such permissions are allowed to be executed with owners’ privileges instead of launchers’ privileges. You should carefully review the files with setuid and setgid permissions and remove those files that don’t require such permissions.
    • Use COPY instead of ADD in the Dockerfile:
      • The COPY instruction can only copy files from the local machine to the filesystem of the image, while the ADD instruction can not only copy files from the local machine but also retrieve files from the remote URL to the filesystem of the image. Using ADD may introduce the risk of adding malicious files from the internet to the image.
    • Do not store secrets in the Dockerfile:
      • There are many tools that are able to extract image file layers. If there are any secrets stored in the image, secrets are no longer secrets. Storing secrets in the Dockerfile renders containers potentially exploitable. A common mistake is to use the ENV instruction to store secrets in environment variables.
    • Install verified packages only:
      • This is similar to using the trusted base image only. Observe caution as regards the packages you are going to install within your image. Make sure they are from trusted package repositories.
  • Ideally, application developers and security engineers work together to harden the microservice at the pod and container level by configuring the security context provided by Kubernetes.
  • We classify the major security attributes into four categories:
    • Setting host namespaces for pods
    • Security context at the container level
    • Security context at the pod level
    • AppArmor profile
  • The following attributes in the pod specification are used to configure the use of host namespaces:
    • hostPID: By default, this is false. Setting it to true allows the pod to have visibility on all the processes in the worker node.
    • hostNetwork: By default, this is false. Setting it to true allows the pod to have visibility on all the network stacks in the worker node.
    • hostIPC: By default, this is false. Setting it to true allows the pod to have visibility on all the IPC resources in the worker node.
  • Each container can have its own security context, which defines privileges and access controls. The design of a security context at a container level provides a more fine-grained security control for Kubernetes workloads.
  • The following are the principal attributes of a security context for containers:
    • privileged: By default, this is false. Setting it to true essentially makes the processes inside the container equivalent to the root user on the worker node.
    • capabilities: There is a default set of capabilities granted to the container by the container runtime. The default capabilities granted are as follows: CAP_SETPCAP, CAP_MKNOD, CAP_AUDIT_WRITE, CAP_CHOWN, CAP_NET_RAW, CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_FSETID, CAP_KILL, CAP_SETGID, CAP_SETUID, CAP_NET_BIND_SERVICE, CAP_SYS_CHROOT, and CAP_SETFCAP
      • You may add extra capabilities or drop some of the defaults by configuring this attribute. Capabilities such as CAP_SYS_ADMIN and CAP_NET_ADMIN should be added with caution. For the default capabilities, you should also drop those that are unnecessary.
    • allowPrivilegeEscalation: By default, this is true. Setting it directly controls the no_new_privs flag, which is set on the processes in the container. Basically, this attribute controls whether the process can gain more privileges than its parent process. Note that if the container runs in privileged mode, or has the CAP_SYS_ADMIN capability added, this attribute will be set to true automatically. It is good practice to set it to false.
    • readOnlyRootFilesystem: By default, this is false. Setting it to true makes the root filesystem of the container read-only, which means that the library files, configuration files, and so on are read-only and cannot be tampered with. It is a good security practice to set it to true.
    • runAsNonRoot: By default, this is false. Setting it to true enables validation that the processes in the container cannot run as a root user (UID=0). Validation is done by kubelet. With runAsNonRoot set to true, kubelet will prevent the container from starting if run as a root user. It is a good security practice to set it to true. This attribute is also available in PodSecurityContext, which takes effect at pod level. If this attribute is set in both SecurityContext and PodSecurityContext, the value specified at the container level takes precedence.
    • runAsUser: This is designed to specify the UID to run the entrypoint process of the container image. The default setting is the user specified in the image’s metadata (for example, the USER instruction in the Dockerfile). This attribute is also available in PodSecurityContext, which takes effect at the pod level. If this attribute is set in both SecurityContext and PodSecurityContext, the value specified at the container level takes precedence.
    • runAsGroup: Similar to runAsUser, this is designed to specify the Group ID or GID to run the entrypoint process of the container. This attribute is also available in PodSecurityContext, which takes effect at the pod level. If this attribute is set in both SecurityContext and PodSecurityContext, the value specified at the container level takes precedence.
    • seLinuxOptions: This is designed to specify the SELinux context to the container. By default, the container runtime will assign a random SELinux context to the container if not specified. This attribute is also available in PodSecurityContext, which takes effect at the pod level. If this attribute is set in both SecurityContext and PodSecurityContext, the value specified at the container level takes precedence.
  • In general, the security best practices are as follows:
    • Do not run in privileged mode unless necessary.
    • Do not add extra capabilities unless necessary.
    • Drop unused default capabilities.
    • Run containers as a non-root user.
    • Enable a runAsNonRoot check.
    • Set the container root filesystem as read-only.
  • Note that adding NET_ADMIN is not recommended for containers running in production environments.
  • The following is a list of the principal security attributes at the pod level:
    • fsGroup: This is a special supplemental group applied to all containers. The effectiveness of this attribute depends on the volume type. Essentially, it allows kubelet to set the ownership of the mounted volume to the pod with the supplemental GID.
    • sysctls: sysctls is used to configure kernel parameters at runtime. In such a context, sysctls and kernel parameters are used interchangeably. These sysctls commands are namespaced kernel parameters that apply to the pod. The following sysctls commands are known to be namespaced: kernel.shm*, kernel.msg*, kernel.sem, and kernel.mqueue.*. Unsafe sysctls are disabled by default and should not be enabled in production environments.
    • runAsUser: This is designed to specify the UID to run the entrypoint process of the container image. The default setting is the user specified in the image’s metadata (for example, the USER instruction in the Dockerfile). This attribute is also available in SecurityContext, which takes effect at the container level. If this attribute is set in both SecurityContext and PodSecurityContext, the value specified at the container level takes precedence.
    • runAsGroup: Similar to runAsUser, this is designed to specify the GID to run the entrypoint process of the container. This attribute is also available in SecurityContext, which takes effect at the container level.
    • runAsNonRoot: Set to false by default, setting it to true enables validation that the processes in the container cannot run as a root user (UID=0). Validation is done by kubelet. By setting it to true, kubelet will prevent the container from starting if run as a root user. It is a good security practice to set it to true.
    • seLinuxOptions: This is designed to specify the SELinux context to the container. By default, the container runtime will assign a random SELinux context to the container if not specified.
  • Note that AppArmor is not a Kubernetes object, like a pod, deployment, and so on. It can’t be operated through kubectl. You will have to SSH to each node and load the AppArmor profile into the kernel so that the pod may be able to use it.
  • Open source tools such as bane can help create AppArmor profiles for containers.
  • A Kubernetes PodSecurityPolicy is a cluster-level resource that controls security-sensitive aspects of the pod specification through which the access privileges of a Kubernetes pod are limited.
  • You can think of a PodSecurityPolicy as a policy to evaluate the security attributes defined in the pod’s specification. Only those pods whose security attributes meet the requirements of PodSecurityPolicy will be admitted to the cluster.
  • After you have created the Pod Security Policy, there is one more step required in order to enforce it. You will have to grant the privilege of using the PodSecurityPolicy object to the users, groups, or service accounts. By doing so, the pod security policies are entitled to evaluate the workloads based on the associated service account.
  • Kubernetes PodSecurityPolicy Advisor (also known as kube-psp-advisor) is an open source tool from Sysdig. It scans the security attributes of running workloads in the cluster and then, on this basis, recommends pod security policies for your cluster or workloads.
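
The security context best practices listed in this chapter can be checked mechanically. The following is a minimal sketch, assuming the pod spec has already been loaded into a plain Python dictionary (for example, from a YAML manifest). The helper names audit_pod and effective are hypothetical; only the field names (hostPID, privileged, runAsNonRoot, and so on) come from the highlights above.

```python
def effective(pod_spec, container, field, default=None):
    """Container-level value takes precedence over the pod-level value."""
    container_ctx = container.get("securityContext", {})
    if field in container_ctx:
        return container_ctx[field]
    return pod_spec.get("securityContext", {}).get(field, default)


def audit_pod(pod_spec):
    """Return a list of findings for a pod spec given as a dictionary."""
    findings = []
    # Host namespaces are pod-level flags and default to false.
    for flag in ("hostPID", "hostNetwork", "hostIPC"):
        if pod_spec.get(flag, False):
            findings.append(f"pod: {flag} is true")
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        if ctx.get("privileged", False):
            findings.append(f"{name}: runs in privileged mode")
        if ctx.get("allowPrivilegeEscalation", True):  # defaults to true
            findings.append(f"{name}: allowPrivilegeEscalation is not false")
        if not ctx.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: root filesystem is writable")
        if ctx.get("capabilities", {}).get("add"):
            findings.append(f"{name}: adds extra capabilities")
        if not effective(pod_spec, container, "runAsNonRoot", False):
            findings.append(f"{name}: runAsNonRoot check is not enabled")
    return findings


spec = {"hostNetwork": True, "containers": [{"name": "web"}]}
for finding in audit_pod(spec):
    print(finding)
```

The sketch only reports; it does not decide what is acceptable for a given workload, which is the role of a PodSecurityPolicy or a similar admission-time control.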

Chapter 9: Image Scanning in DevOps Pipelines

  • The image scanning tool extracts the image file, then looks for all the available packages and libraries in the image and looks up their version within the vulnerability database. If there is any package whose version matches with any of the CVE’s descriptions in the vulnerability database, the image scanning tool will report that there is a vulnerability in the image.
  • The CVSS calculator is available at https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator
  • Anchore Engine policies allow you to define rules to handle vulnerabilities differently based on their severity.
  • A rough definition of the DevOps stages that are applicable for image scanning:
    • Build: When the image is built in the CI/CD pipeline
    • Deployment: When the image is about to be deployed in a Kubernetes cluster
    • Runtime: After the image is deployed to a Kubernetes cluster and the containers are up and running
  • Anchore also offers an image scan GitHub action called Anchore Container Scan. It launches the Anchore Engine scanner on the newly built image and returns the vulnerabilities, manifests, and a pass/fail policy evaluation that can be used to fail the build if desired.
  • Image scanning admission controller is an open source project from Sysdig. It scans images from the workload that is about to be deployed. If an image fails the image scanning policy, the workload will be rejected.
  • Scanning as validating admission is a good security practice for Kubernetes deployment.
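
To make the scanning flow more concrete, here is a minimal sketch of the two steps described above: matching package versions against a vulnerability database, then applying a severity-based policy to get a pass/fail result. It does not reflect how Anchore Engine or any other scanner is implemented. The database layout, the function names, and the identifiers such as EXAMPLE-0001 are made up for illustration.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def scan(packages, vuln_db):
    """Report every installed package whose version is listed as affected."""
    findings = []
    for name, version in packages.items():
        for entry in vuln_db.get(name, []):
            if version in entry["affected_versions"]:
                findings.append(
                    {"package": name, "version": version,
                     "id": entry["id"], "severity": entry["severity"]}
                )
    return findings


def evaluate(findings, fail_on="high"):
    """Policy step: fail when any finding is at or above the given severity."""
    threshold = SEVERITY_ORDER.index(fail_on)
    for finding in findings:
        if SEVERITY_ORDER.index(finding["severity"]) >= threshold:
            return "fail"
    return "pass"


# Hypothetical data: package names, versions, and identifiers are placeholders.
installed = {"libexample": "1.0.2", "toolkit": "3.4.0"}
database = {
    "libexample": [
        {"id": "EXAMPLE-0001", "severity": "critical",
         "affected_versions": {"1.0.1", "1.0.2"}},
    ],
    "toolkit": [
        {"id": "EXAMPLE-0002", "severity": "low",
         "affected_versions": {"3.3.9"}},
    ],
}

results = scan(installed, database)
print(results, evaluate(results))
```

A real scanner compares version ranges rather than exact version strings, but the pass/fail decision at the end is the part that a build pipeline or an admission controller would act on.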

Chapter 10: Real-Time Monitoring and Resource Management of a Kubernetes Cluster

  • If all the resources of the node are being utilized by the pods, kubelet on the node will clean up dead pods and unused images. If the cleanup does not reduce the stress, kubelet will start evicting those pods that consume more resources.
  • LimitRanger works when the request to create or update the object is received by the API Server but not at runtime. If a pod has a violating limit before the limit is applied, it will keep running.
  • It is recommended that service account tokens should be used to access Kubernetes Dashboard.
  • Kubernetes Dashboard provides all the functionality a cluster administrator requires in order to manage resources and objects within the cluster. Given the functionality of the dashboard, access to the dashboard should be limited to cluster administrators.
  • To allow a service account to use the Kubernetes dashboard, you need to add the cluster-admin role to the service account.
  • Ensure that the dashboard container is running with the following arguments:
    • Disable insecure port: --insecure-port enables Kubernetes Dashboard to receive requests over HTTP. Ensure that it is disabled in production environments.
    • Disable insecure address: --insecure-bind-address should be disabled to avoid a situation where Kubernetes Dashboard is accessible via HTTP.
    • Bind address to localhost: --bind-address should be set to 127.0.0.1 to prevent hosts from connecting over the internet.
    • Enable TLS: Use tls-cert-file and tls-key-file to access the dashboard over secure channels.
    • Ensure token authentication mode is enabled: Authentication mode can be specified using the --authentication-mode flag. By default, it is set to token. Ensure that basic authentication is not used with the dashboard.
    • Disable insecure login: Insecure login is used when the dashboard is available via HTTP. This should be disabled by default.
    • Disable skip login: Skip login allows unauthenticated users to access the Kubernetes dashboard. --enable-skip-login enables skip login; this should not be present in production environments.
    • Disable settings authorizer: --disable-settings-authorizer allows unauthenticated users to access the settings page. This should be disabled in production environments.
  • Metrics Server aggregates cluster usage data using the Summary API exposed by each kubelet on each node.
  • Metrics Server exposes the collected metrics through the Metrics API, which is used by the horizontal pod autoscaler and the vertical pod autoscaler.
  • In production clusters, make sure that Metrics Server does not use the --kubelet-insecure-tls flag, which allows Metrics Server to skip verification of certificates by the CA.
  • Prometheus uses a pull system. It sends an HTTP request called a scrape, which fetches data from the system components, including API Server, node-exporter, and kubelet. The response to the scrape and the metrics are stored in a custom database on the Prometheus server.
  • Let’s look at some examples of Prometheus queries that will be helpful for cluster administrators:
    • Kubernetes CPU usage:
      • sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
    • Kubernetes CPU usage by namespace:
      • sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace!=""}[5m])) by (namespace)
    • CPU requests by pod:
      • sum(kube_pod_container_resource_requests_cpu_cores) by (pod)
  • Using Alertmanager with Prometheus helps deduplicate, group, and route alerts from applications such as Prometheus to integrated clients, including email, OpsGenie, and PagerDuty.
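
The example queries above can be sent to the Prometheus HTTP API, which serves instant queries at /api/v1/query. The sketch below only builds the request URLs and makes no network call. The server address http://localhost:9090 is an assumption, as are the dictionary keys and the function name. The label names in the queries are copied from the highlights; depending on your Kubernetes version, the labels may be container and pod instead of container_name and pod_name.

```python
from urllib.parse import urlencode

# Queries copied from the highlights above; the dictionary keys are hypothetical.
QUERIES = {
    "cpu_usage": 'sum(rate(container_cpu_usage_seconds_total'
                 '{container_name!="POD",pod_name!=""}[5m]))',
    "cpu_usage_by_namespace": 'sum(rate(container_cpu_usage_seconds_total'
                              '{container_name!="POD",namespace!=""}[5m])) by (namespace)',
    "cpu_requests_by_pod": "sum(kube_pod_container_resource_requests_cpu_cores) by (pod)",
}


def instant_query_url(base_url, promql):
    """Build the URL for an instant query, URL-encoding the PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


for name, promql in QUERIES.items():
    # http://localhost:9090 is an assumed address for the Prometheus server.
    print(name, instant_query_url("http://localhost:9090", promql))
```

Encoding the expression matters because PromQL uses braces, quotes, and brackets that are not safe to place in a URL as-is.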

Chapter 11: Defense in Depth

  • With auditing, a Kubernetes cluster administrator is able to answer questions such as the following:
    • What happened? (A pod is created and what kind of pod it is)
    • Who did it? (From user/admin)
    • When did it happen? (The timestamp of the event)
    • Where did it happen? (In which namespace is the pod created?)
  • An audit policy allows users to define rules about what kind of event should be recorded and how much detail of the event should be recorded.
  • When an event is processed by kube-apiserver, it is compared against the list of rules in the audit policy in order. The first matching rule dictates the audit level of the event.
  • There are four audit levels, detailed as follows:
    • None: Do not log events that match the audit rule.
    • Metadata: When an event matches the audit rule, log the metadata (such as user, timestamp, resource, verb, and more) of the request to kube-apiserver.
    • Request: When an event matches the audit rule, log the metadata as well as the request body. This does not apply for the non-resource URL.
    • RequestResponse: When an event matches the audit rule, log the metadata along with the request and response bodies. This does not apply for the non-resource request.
  • The request-level event is more verbose than the metadata level events, while the RequestResponse level event is more verbose than the request-level event.
  • It is quite necessary to understand the differences between the audit levels so that you can define audit rules properly, both for resource consumption and security.
  • Please do choose the audit level properly. More verbose logs provide deeper insight into the activities being carried out. However, it does cost more in storage and time to process the audit events.
  • One thing worth mentioning is that if you set a request or a RequestResponse audit level on Kubernetes secret objects, the secret content will be recorded in the audit events. If you set the audit level to be more verbose than metadata for Kubernetes objects containing sensitive data, you should use a sensitive data redaction mechanism to avoid secrets being logged in the audit events.
  • There are two types of audit backends that can be configured to process audit events: a log backend and a webhook backend.
  • Specifying more than one replica in the deployment or the StatefulSet, or using a DaemonSet, will ensure the high availability of your workload.
  • kube-dns is spun up with more than one pod by default, so its high availability is ensured.
  • By default, the secret data is stored in plaintext (encoded format) in etcd. etcd can be configured to encrypt secrets at rest.
  • Similarly, if etcd is not configured to encrypt communication using Transport Layer Security (TLS), secret data is transferred in plaintext too.
  • The init container is to prepopulate our secret, and the sidecar container is to keep that secret data in sync throughout our application’s life cycle.
  • Under the hood, kubectl-capture starts a new pod to do the capture on the host where the suspected victim pod is running, with a 120-second capture duration, so that we can see everything that is happening right now and in the next 120 seconds in that host.
  • Kubernetes auditing: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/
  • High availability with kubeadm: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
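
The "first matching rule wins" behaviour of an audit policy can be sketched in a few lines. The rule shape used here (a level plus a dictionary of allowed values) is a simplification invented for this example and is not the real audit policy schema. Returning None when no rule matches is also an assumption of the sketch.

```python
def audit_level(event, rules):
    """Return the level of the first rule whose conditions all match the event."""
    for rule in rules:
        conditions = rule["match"]
        if all(event.get(key) in allowed for key, allowed in conditions.items()):
            return rule["level"]
    return "None"  # assumption: an event that matches no rule is not logged


# Hypothetical, simplified rules evaluated from top to bottom.
RULES = [
    # Keep secrets at Metadata so their content never reaches the audit log.
    {"level": "Metadata", "match": {"resource": {"secrets", "configmaps"}}},
    # Skip noisy watch requests.
    {"level": "None", "match": {"verb": {"watch"}}},
    # Catch-all: an empty match applies to every remaining event.
    {"level": "Request", "match": {}},
]

print(audit_level({"resource": "secrets", "verb": "get"}, RULES))
print(audit_level({"resource": "pods", "verb": "watch"}, RULES))
print(audit_level({"resource": "pods", "verb": "create"}, RULES))
```

Because evaluation stops at the first match, the rule for sensitive objects has to appear before any broader rule with a more verbose level.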

Chapter 12: Analyzing and Detecting Crypto-Mining Attacks

  • None

Chapter 13: Learning from Kubernetes CVEs

  • Security advisories and announcements (https://kubernetes.io/docs/reference/issues-security/security/) published by Kubernetes are the best way to keep track of new security vulnerabilities found in Kubernetes.
  • kube-hunter is an open source tool that is developed and maintained by Aqua that helps identify known security issues in your Kubernetes cluster.