
Kubernetes Interview Cheat Sheet

Top 50 interview questions with concise answers. Print this page or save as PDF for offline study.


1. What problem does Kubernetes fundamentally solve compared to running containers manually?

Running containers manually means you restart them manually when they crash, scale them manually when load increases, route traffic manually, and update them manually with downtime. Kubernetes automates all of this — it continuously maintains your desired state, reschedules failed containers, scales based on load, and rolls out updates with zero downtime.

2. What is a Kubernetes Cluster made of?

A cluster has control plane nodes (the brain — API server, scheduler, etcd, controller manager) and worker nodes (the muscle — where your apps run). The control plane makes decisions; worker nodes execute them. In production, you typically have 3 control plane nodes for high availability and as many workers as your workload needs.

3. What is a Pod in Kubernetes and why isn’t a container scheduled directly?

A Pod is the smallest deployable unit — a wrapper around one or more tightly-coupled containers that share the same network namespace, IP, and storage volumes. Containers in a pod communicate via localhost. Kubernetes schedules and manages Pods, not individual containers, making co-located processes easier to manage as one unit.
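A minimal sketch of such a Pod, with illustrative image names — the sidecar reaches the main container over localhost because both containers share one network namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
    - name: app
      image: nginx:1.25
      ports:
        - containerPort: 80
    - name: poller-sidecar      # illustrative second container
      image: busybox:1.36
      command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null; sleep 30; done"]
```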

4. What is the role of the Kubelet on every worker node?

Kubelet is the node agent that talks to the API server and ensures the containers described in assigned Pods are actually running. It watches for Pod assignments, starts containers via the container runtime (containerd), runs health checks (liveness/readiness probes), and reports node and Pod status back to the control plane.

5. Why does every Pod get its own IP address?

Each Pod getting its own IP makes networking simple and flat — Pods talk to each other directly using their IPs without NAT. There's no port mapping needed between Pods. This flat network model makes service discovery straightforward and mimics how VMs communicate, making it easy to migrate apps into Kubernetes.

6. What is a Deployment used for?

A Deployment manages a ReplicaSet to keep a specified number of Pod replicas running. It handles rolling updates (new version deployed gradually), rollbacks (revert to a previous version), and self-healing (replace crashed Pods). It's the standard way to run stateless applications — web servers, APIs, background workers.
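A minimal Deployment manifest along these lines (the name, labels, and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                # desired state: three identical Pods
  selector:
    matchLabels:
      app: web               # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25  # pin a version, not latest
          ports:
            - containerPort: 80
```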

7. What is the difference between a Deployment and a StatefulSet?

Deployments are for stateless apps — Pods are interchangeable, created in any order, and can be killed and replaced without consequences. StatefulSets are for stateful apps (databases, message queues) — each Pod gets a stable hostname, stable storage, and is created/deleted in strict order. Identity matters for StatefulSets; it doesn't for Deployments.

8. What is a Service in Kubernetes?

A Service is a stable network endpoint that sits in front of a set of Pods. Since Pod IPs change as Pods are created and destroyed, Services provide a fixed IP and DNS name that clients connect to. Kubernetes routes traffic from the Service to healthy Pods behind it, handling load balancing automatically.

9. What does a ClusterIP Service do?

ClusterIP creates a virtual IP accessible only inside the cluster. Other Pods reach the Service by its DNS name or ClusterIP — Kubernetes routes the traffic to a healthy Pod. It's the default and most common Service type, used for internal service-to-service communication within the cluster.
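As a sketch, a ClusterIP Service fronting Pods labeled app: web (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP       # the default; this line can be omitted
  selector:
    app: web            # routes to Pods carrying this label
  ports:
    - port: 80          # port clients inside the cluster connect to
      targetPort: 8080  # container port traffic is forwarded to
```

Other Pods in the same namespace can then reach it simply as http://web:80.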

10. Why is a NodePort Service rarely used in production?

NodePort opens the same port (30000–32767 by default) on every node in the cluster and routes traffic to the Service. It's hard to manage at scale: every node exposes the port even if no backing Pod runs there, clients must know node IPs and handle node failures themselves, and high-numbered ports end up exposed publicly. In production, an Ingress Controller or LoadBalancer Service is preferred.

11. What is the purpose of an Ingress Controller?

An Ingress Controller (like nginx or Traefik) handles external HTTP/HTTPS traffic and routes it to internal Services based on host name and URL path rules. Instead of one LoadBalancer per Service (expensive), one Ingress Controller handles all external traffic routing. It also handles TLS termination, redirects, and path rewrites.
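A sketch of host/path routing with TLS termination, assuming an installed nginx ingress class and Services named web and api:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: public
spec:
  ingressClassName: nginx       # assumes the nginx controller is installed
  tls:
    - hosts: [app.example.com]
      secretName: app-tls       # TLS certificate stored in a Secret
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api
                port: { number: 80 }
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port: { number: 80 }
```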

12. What is etcd and why is it critical to Kubernetes?

etcd is the distributed key-value store that holds all cluster state — every object, every Pod spec, every config, every secret. It's the only stateful part of the control plane. If etcd goes down, the cluster can't accept new operations (though running workloads continue). Regular etcd backups are non-negotiable in production.

13. How does Kubernetes handle container restarts inside Pods?

The restartPolicy (Always, OnFailure, Never) on the Pod spec controls this. kubelet monitors containers and restarts them according to the policy. The restart count increments and kubelet applies exponential backoff (up to 5 minutes) before each retry — preventing a crashing container from hammering resources. This is what CrashLoopBackOff shows.

14. What does desired state mean in Kubernetes?

Desired state is what you declare (3 replicas, image v2, 2 CPU requests). Kubernetes controllers continuously compare this desired state to the actual state and take actions to close the gap. If a Pod crashes, the controller creates a new one. If you scale down, it deletes extras. Kubernetes is always converging toward your desired state.

15. What is a ReplicaSet and how does it relate to Deployments?

A ReplicaSet ensures a specified number of identical Pods are running at all times. You rarely create ReplicaSets directly — Deployments create and manage them for you. When you update a Deployment, it creates a new ReplicaSet for the new version and scales it up while scaling down the old one — enabling rolling updates.

16. Why should ConfigMaps not be used for sensitive data?

ConfigMaps store configuration as plain text — unencrypted, visible to anyone with read access to the namespace. They're designed for non-sensitive config (URLs, feature flags, env settings). Secrets are base64-encoded (not encrypted by default) but have tighter access controls and can be encrypted at rest — appropriate for passwords and tokens.

17. What is a Kubernetes Secret?

A Secret stores sensitive data (passwords, tokens, certificates) separately from the application image. They're base64-encoded in etcd (and can be encrypted at rest). Secrets can be mounted as files or injected as environment variables. Separating secrets from code means you don't expose them in Docker images or version control.
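A sketch of an Opaque Secret and how a container consumes it as an environment variable (all names are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:                 # plain text here; stored base64-encoded
  DB_PASSWORD: change-me
---
# fragment of a container spec consuming the Secret
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials
        key: DB_PASSWORD
```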

18. What is the purpose of a Namespace?

Namespaces divide a cluster into logical isolation units — different teams, environments (dev, staging), or projects. They provide scope for names (two Pods with the same name can exist in different namespaces), resource quotas (limit how much CPU/memory a team can use), and RBAC (grant access only to specific namespaces).

19. What is a Node in Kubernetes?

A Node is a physical or virtual machine in the cluster where Pods actually run. Each node runs kubelet (agent), kube-proxy (network), and a container runtime (containerd). Nodes provide the compute resources (CPU, memory, storage) that Pods consume. The control plane schedules Pods to nodes based on available capacity and constraints.

20. What happens when a Node becomes NotReady?

When a node goes NotReady, the node controller marks all its Pods as Unknown. After a timeout (default 5 minutes), Kubernetes evicts those Pods and reschedules them on healthy nodes — but only if they belong to a ReplicaSet, Deployment, or StatefulSet. Standalone Pods don't get rescheduled automatically.

21. How does Kubernetes ensure Pods are rescheduled automatically when a node fails?

When a node fails, its kubelet stops sending heartbeats. The node controller marks the node NotReady and eventually evicts its Pods. Workload controllers (Deployment, ReplicaSet) notice the Pod count dropped below desired and create new Pods on healthy nodes. This is automatic — no human intervention needed.

22. Why do Pods restart even when their containers exit with code 0?

The restartPolicy: Always setting restarts containers regardless of exit code — even 0 (clean exit). This is intentional for long-running services that should never stop. If your container exits cleanly with 0 but you don't want it restarted, use restartPolicy: OnFailure (restart only on non-zero exit) or restartPolicy: Never.

23. What is the role of kube-proxy in networking?

kube-proxy runs on every node and programs network rules (iptables or IPVS) that implement Service virtual IPs. When a Pod connects to a Service ClusterIP, kube-proxy's rules intercept the packet and DNAT it to a random healthy Pod backing that Service. It's how Kubernetes load-balances traffic to Services without a dedicated load balancer per Service.

24. How do readiness probes differ from liveness probes?

Readiness probe: is this container ready to receive traffic? If it fails, the Pod is removed from the Service endpoints — traffic stops going to it, but it keeps running. Liveness probe: is this container alive? If it fails, kubelet kills and restarts the container. Readiness gates traffic; liveness recovers from stuck processes.
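A container spec fragment sketching both probes (the endpoints and port are assumptions about the app):

```yaml
containers:
  - name: app
    image: example/app:1.4.2    # placeholder image
    readinessProbe:             # failing => removed from Service endpoints
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:              # failing => container killed and restarted
      httpGet:
        path: /live
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
```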

25. What is the Kubernetes API Server responsible for?

The API Server is the central hub for all cluster communication — every component (kubelet, scheduler, controllers) reads and writes state through it. It validates and persists objects to etcd, enforces authentication and authorization, runs admission controllers, and serves the Kubernetes REST API. Nothing happens in the cluster without going through it.

26. Why is using latest tag dangerous in Kubernetes deployments?

latest is a mutable tag — anyone can push a new image with that tag at any time. If your Deployment uses latest, a node that pulls the image gets whatever was latest at that moment — potentially a different version than other nodes. Pin to an immutable tag (like a Git SHA or semantic version) so every node runs exactly the same image.

27. What is the difference between a DaemonSet and a Deployment?

DaemonSet ensures exactly one Pod runs on every node (or a selected subset) — used for node-level agents like log collectors, monitoring, or network plugins. Deployment manages a pool of Pods across the cluster without caring which node they land on. DaemonSet scales with nodes; Deployment scales with replicas.

28. How does a Horizontal Pod Autoscaler know when to scale?

HPA watches metrics (CPU usage, memory, custom metrics from Prometheus) and compares them to target values you define. If average CPU utilization across the Pods exceeds a 70% target, it raises the replica count; if it drops well below, it scales down. It queries the metrics-server (or a custom metrics adapter) and adjusts replicas automatically.
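Sketched as an autoscaling/v2 HPA targeting a hypothetical Deployment named web at 70% average CPU:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up above ~70% average CPU
```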

29. Why do StatefulSets require a Headless Service?

StatefulSet Pods need stable network identities — pod-0, pod-1, pod-2. A regular Service would load-balance across them randomly. A Headless Service (clusterIP: None) creates DNS records for each Pod individually (pod-0.service.namespace.svc.cluster.local) so clients can address specific Pods directly, which databases and distributed systems require.
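A headless Service sketch for a hypothetical database StatefulSet labeled app: db:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None     # headless: DNS returns per-Pod records, no virtual IP
  selector:
    app: db
  ports:
    - port: 5432
```

A StatefulSet referencing it via serviceName: db then gets per-Pod DNS names like db-0.db.default.svc.cluster.local.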

30. What is a PersistentVolume (PV)?

A PersistentVolume (PV) is a piece of storage provisioned by an admin or dynamically by a StorageClass — it exists in the cluster independently of Pods. It abstracts the underlying storage (NFS, AWS EBS, GCE PD) into a Kubernetes resource that can be claimed and used by Pods without them knowing the storage details.

31. What is the difference between PV and PVC?

PV (PersistentVolume) is the actual storage resource — provisioned and available in the cluster. PVC (PersistentVolumeClaim) is a request for storage from a Pod — "I need 10Gi of ReadWriteOnce storage." Kubernetes binds a matching PV to the PVC. Pods use the PVC; they don't interact with the PV directly.
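A PVC sketch requesting 10Gi (the StorageClass name is an assumption about the cluster):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: [ReadWriteOnce]   # one node may mount it read-write
  storageClassName: standard     # assumed to exist in the cluster
  resources:
    requests:
      storage: 10Gi
```

A Pod then references the claim by name under volumes (persistentVolumeClaim: claimName: data), never the PV itself.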

32. What is a StorageClass used for?

StorageClass defines how to dynamically provision PersistentVolumes on demand. When a PVC requests storage, Kubernetes uses the StorageClass provisioner (AWS EBS, GCE PD, NFS) to automatically create a PV that matches. Without StorageClass, admins must manually pre-provision PVs — dynamic provisioning via StorageClass is the modern approach.

33. Why do Pods sometimes remain in a Terminating state indefinitely?

Pods get stuck in Terminating when a finalizer is preventing deletion — the controller responsible for removing the finalizer never does so (it might be dead or buggy). You can force-delete with kubectl delete pod --force --grace-period=0, but this bypasses cleanup logic. Investigate which finalizer is blocking before force-deleting.

34. How does Kubernetes perform rolling updates without downtime?

Rolling updates gradually replace old Pods with new ones. Kubernetes creates new Pods with the new image, waits for them to pass readiness probes, then terminates old ones — maxSurge controls how many extra Pods are created, maxUnavailable controls how many old Pods can be down at once. Traffic only routes to ready Pods throughout.
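A Deployment strategy fragment sketching a zero-downtime rollout with these two knobs:

```yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most one extra Pod above the desired count
      maxUnavailable: 0   # never dip below the desired count
```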

35. What is the purpose of a Pod Disruption Budget?

A Pod Disruption Budget (PDB) sets the minimum number of Pods that must stay running during voluntary disruptions (node drains, upgrades). It prevents operations like kubectl drain from taking down so many Pods that your service loses quorum or availability. Kubernetes refuses to evict Pods if it would violate the PDB.
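A minimal PDB keeping at least two Pods up during voluntary disruptions (the label is illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # evictions that would go below 2 are refused
  selector:
    matchLabels:
      app: web
```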

36. Why are jobs used instead of Deployments for batch tasks?

Deployments run processes continuously and restart them on exit — wrong for batch work that should run once and complete. Jobs run a Pod to completion (exit 0) then stop — they don't restart unnecessarily. Jobs track success/failure, support parallelism (run N tasks simultaneously), and have retry logic for failures.

37. What is the role of a CronJob?

A CronJob creates Jobs on a cron schedule — runs a task every hour, every night, every Monday. It's Kubernetes' equivalent of a Unix cron job but for containerized tasks. CronJob manages the schedule; the Job manages execution and retries. Useful for database backups, report generation, cache warming, and cleanup tasks.
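A CronJob sketch for a nightly backup (the image and arguments are hypothetical):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"             # every day at 02:00
  jobTemplate:
    spec:
      backoffLimit: 3               # retry a failed Pod up to 3 times
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: example/backup:1.0   # placeholder image
              args: ["--target", "s3://backups/db"]
```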

38. What is image pull policy and why does it matter?

Image pull policy controls when Kubernetes pulls an image from the registry: Always (pull every time — slow but always fresh), IfNotPresent (pull only if not cached — faster but may use stale images), Never (use only the local cache; fail otherwise). The default is IfNotPresent, except that the latest tag defaults to Always. Prefer immutable tags with IfNotPresent — relying on Always with a mutable tag makes rollouts non-reproducible.

39. Why is RBAC important in Kubernetes?

RBAC controls who can do what in the cluster. Without it, any Pod with a ServiceAccount could read Secrets, create new Pods, or delete Deployments. RBAC lets you grant least-privilege access — a Pod can only read its own ConfigMap, a developer can only deploy to the staging namespace. Essential for multi-team and multi-tenant clusters.
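A least-privilege sketch: a Role that can only read ConfigMaps in the staging namespace, bound to a ServiceAccount (all names are placeholders):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: config-reader
  namespace: staging
rules:
  - apiGroups: [""]              # "" is the core API group
    resources: ["configmaps"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-config-reader
  namespace: staging
subjects:
  - kind: ServiceAccount
    name: app
    namespace: staging
roleRef:
  kind: Role
  name: config-reader
  apiGroup: rbac.authorization.k8s.io
```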

40. What is a ServiceAccount?

A ServiceAccount is an identity for Pods — it determines what API permissions the Pod has. By default, every Pod gets the default ServiceAccount with a mounted token. Assign a specific ServiceAccount with narrowly-scoped RBAC roles to Pods that need to interact with the Kubernetes API (like operators and controllers).

41. Why disable auto-mount of ServiceAccount tokens in some Pods?

By default, Pods auto-mount a ServiceAccount token that allows API calls to the cluster. Most application Pods don't need to call the Kubernetes API — they just serve web requests or process data. Auto-mounting an unused token is an unnecessary attack surface. Disable it for workloads that don't interact with the cluster API.

42. How do Kubernetes Labels differ from Annotations?

Labels are key-value pairs used for selection — Deployments select their Pods by label, Services route to Pods by label. They're queryable and used by controllers. Annotations are key-value metadata for tools and humans — build info, CI pipeline URLs, documentation links. Annotations don't affect how Kubernetes selects or routes objects.

43. What is tainting a node and why is it used?

A taint on a node repels Pods that don't explicitly tolerate it. Use it to reserve nodes for specific workloads — taint GPU nodes so only GPU-requesting Pods land there, taint production nodes so dev Pods can't accidentally run on them. Tolerations on Pods say "I'm okay with this taint" — they can still land on tainted nodes.
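A sketch, assuming a hypothetical node gpu-node-1: the taint is applied with kubectl, and a Pod opts in with a matching toleration:

```yaml
# kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
# Pod spec fragment tolerating that taint:
tolerations:
  - key: gpu
    operator: Equal
    value: "true"
    effect: NoSchedule
```

Note that a toleration only permits scheduling onto the tainted node — pair it with a nodeSelector or node affinity to actually steer the Pod there.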

44. What are Node Selectors?

Node selectors constrain which nodes a Pod can run on by matching node labels. Add nodeSelector: disktype: ssd to a Pod and it only schedules on nodes labeled disktype=ssd. Simple and effective for basic placement, but only supports exact label matching — no AND/OR logic, no preference vs requirement.

45. Why is node affinity more powerful than node selectors?

Node affinity supports complex expressions — multiple match conditions, operators like In, NotIn, and Exists, and the ability to express soft preferences (preferredDuringSchedulingIgnoredDuringExecution) versus hard requirements (requiredDuringSchedulingIgnoredDuringExecution). The IgnoredDuringExecution suffix means rules are evaluated only at scheduling time — already-running Pods aren't evicted if node labels later change. Node selectors only support exact label matching.

46. What is Pod Affinity/Anti-Affinity?

Pod Affinity schedules a Pod near other Pods with specific labels — useful when two services communicate heavily and benefit from co-location. Pod Anti-Affinity spreads Pods away from each other — put replicas on different nodes so a single node failure doesn't take down all replicas. Both use topology keys (node, zone, region) to define "near" and "far."
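A Pod spec fragment sketching hard anti-affinity that spreads replicas labeled app: web across nodes (the label is illustrative):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        # one replica per node; use topology.kubernetes.io/zone
        # instead to spread across availability zones
        topologyKey: kubernetes.io/hostname
```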

47. Why is Ingress preferred over multiple LoadBalancer Services?

Each LoadBalancer Service provisions an external load balancer — expensive and slow to provision (cloud load balancers can take minutes and cost money). With Ingress, one external load balancer feeds one Ingress Controller, which routes to many Services based on host and path rules. Much cheaper, faster, and easier to manage at scale.

48. How does Kubernetes handle service discovery internally?

Kubernetes runs CoreDNS as the cluster DNS server. Every Service gets a DNS entry: servicename.namespace.svc.cluster.local. When a Pod resolves a Service name, CoreDNS returns the Service's ClusterIP. kube-proxy then routes traffic from the ClusterIP to a healthy Pod. No hardcoded IPs needed — just use the service name.

49. What is the difference between a Secret of type Opaque and DockerConfigJson?

Opaque type is a generic key-value Secret — base64-encoded arbitrary data like passwords and API keys. DockerConfigJson (kubernetes.io/dockerconfigjson) stores Docker registry credentials — used by kubelet when pulling private images. Kubernetes looks for imagePullSecrets on Pod specs and uses DockerConfigJson Secrets to authenticate to the registry.

50. Why avoid large ConfigMaps?

ConfigMaps are stored in etcd and mounted into Pods. Very large ConfigMaps (MBs of config files) increase etcd load, slow down watch propagation, and consume more memory on every API server that caches them. ConfigMap data is capped at 1 MiB (and etcd's default request limit is about 1.5 MiB) — exceeding the cap causes creation to fail. Split large configs or use a dedicated config service.