Top Kubernetes Interview Questions

Curated Kubernetes interview questions and answers across difficulty levels.

Kubernetes Interview Questions & Answers

Welcome to our comprehensive collection of Kubernetes interview questions and answers. This page contains expertly curated interview questions covering all aspects of Kubernetes, from fundamental concepts to advanced topics. Whether you're preparing for an entry-level position or a senior role, you'll find questions tailored to your experience level.

Our Kubernetes interview questions are designed to help you:

  • Understand core concepts and best practices in Kubernetes
  • Prepare for technical interviews at all experience levels
  • Master both theoretical knowledge and practical application
  • Build confidence for your next Kubernetes interview

Each question includes detailed answers and explanations to help you understand not just what the answer is, but why it's correct. We cover topics ranging from basic Kubernetes concepts to advanced scenarios that you might encounter in senior-level interviews.

Use the filters below to find questions by difficulty level (Entry, Junior, Mid, Senior, Expert) or focus specifically on code challenges. Each question is carefully crafted to reflect real-world interview scenarios you'll encounter at top tech companies, startups, and MNCs.

Questions

155 questions
Q1:

What problem does Kubernetes fundamentally solve compared to running containers manually?

Entry

Answer

Kubernetes automates deployment, scaling, self-healing, and networking of containers, removing manual lifecycle management.
Quick Summary: Running containers manually means you restart them manually when they crash, scale them manually when load increases, route traffic manually, and update them manually with downtime. Kubernetes automates all of this — it continuously maintains your desired state, reschedules failed containers, scales based on load, and rolls out updates with zero downtime.
Q2:

What is a Kubernetes Cluster made of?

Entry

Answer

A cluster has a Control Plane that manages state and Worker Nodes that run Pods.
Quick Summary: A cluster has control plane nodes (the brain — API server, scheduler, etcd, controller manager) and worker nodes (the muscle — where your apps run). The control plane makes decisions; worker nodes execute them. In production, you typically have 3 control plane nodes for high availability and as many workers as your workload needs.
Q3:

What is a Pod in Kubernetes and why isn’t a container scheduled directly?

Entry

Answer

A Pod is the smallest deployable unit; it abstracts networking and storage so containers inside share IP and volumes.
Quick Summary: A Pod is the smallest deployable unit — a wrapper around one or more tightly-coupled containers that share the same network namespace, IP, and storage volumes. Containers in a pod communicate via localhost. Kubernetes schedules and manages Pods, not individual containers, making co-located processes easier to manage as one unit.
Q4:

What is the role of the Kubelet on every worker node?

Entry

Answer

Kubelet ensures containers match desired state and restarts them when needed.
Quick Summary: Kubelet is the node agent that talks to the API server and ensures the containers described in assigned Pods are actually running. It watches for Pod assignments, starts containers via the container runtime (containerd), runs health checks (liveness/readiness probes), and reports node and Pod status back to the control plane.
Q5:

Why does every Pod get its own IP address?

Entry

Answer

Kubernetes uses an IP-per-Pod model to simplify routing and make Pods behave like standalone hosts.
Quick Summary: Each Pod getting its own IP makes networking simple and flat — Pods talk to each other directly using their IPs without NAT. There's no port mapping needed between Pods. This flat network model makes service discovery straightforward and mimics how VMs communicate, making it easy to migrate apps into Kubernetes.
Q6:

What is a Deployment used for?

Entry

Answer

Deployments manage stateless apps with ReplicaSets, rolling updates, and rollbacks.
Quick Summary: A Deployment manages a ReplicaSet to keep a specified number of Pod replicas running. It handles rolling updates (new version deployed gradually), rollbacks (revert to a previous version), and self-healing (replace crashed Pods). It's the standard way to run stateless applications — web servers, APIs, background workers.
Q7:

What is the difference between a Deployment and a StatefulSet?

Entry

Answer

Deployment is for stateless Pods. StatefulSet provides stable identity and persistent storage.
Quick Summary: Deployments are for stateless apps — Pods are interchangeable, created in any order, and can be killed and replaced without consequences. StatefulSets are for stateful apps (databases, message queues) — each Pod gets a stable hostname, stable storage, and is created/deleted in strict order. Identity matters for StatefulSets; it doesn't for Deployments.
Q8:

What is a Service in Kubernetes?

Entry

Answer

A Service exposes Pods using stable networking and provides a ClusterIP with load balancing.
Quick Summary: A Service is a stable network endpoint that sits in front of a set of Pods. Since Pod IPs change as Pods are created and destroyed, Services provide a fixed IP and DNS name that clients connect to. Kubernetes routes traffic from the Service to healthy Pods behind it, handling load balancing automatically.
Q9:

What does a ClusterIP Service do?

Entry

Answer

It exposes an internal-only endpoint, accessible inside the cluster.
Quick Summary: ClusterIP creates a virtual IP accessible only inside the cluster. Other Pods reach the Service by its DNS name or ClusterIP — Kubernetes routes the traffic to a healthy Pod. It's the default and most common Service type, used for internal service-to-service communication within the cluster.
Q10:

Why is a NodePort Service rarely used in production?

Entry

Answer

It exposes awkward high-numbered ports on every node and provides no managed external load balancer or TLS termination.
Quick Summary: NodePort opens a port (30000–32767) on every node in the cluster and routes traffic to the Service. Clients must know individual node IPs (which change as nodes are added and removed), there's no managed load balancer or TLS termination in front, and every node exposes the port even if no relevant Pod runs there. In production, an Ingress Controller or LoadBalancer Service is preferred.
Q11:

What is the purpose of an Ingress Controller?

Entry

Answer

Ingress controllers route HTTP/HTTPS traffic to services using host and path rules.
Quick Summary: An Ingress Controller (like nginx or Traefik) handles external HTTP/HTTPS traffic and routes it to internal Services based on host name and URL path rules. Instead of one LoadBalancer per Service (expensive), one Ingress Controller handles all external traffic routing. It also handles TLS termination, redirects, and path rewrites.
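A minimal sketch of host- and path-based routing, assuming an nginx Ingress Controller is installed; the names (demo-ingress, api-svc, shop.example.com, shop-tls) are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress            # illustrative name
spec:
  ingressClassName: nginx       # which controller should handle this Ingress
  tls:
  - hosts: [shop.example.com]
    secretName: shop-tls        # TLS terminated at the controller
  rules:
  - host: shop.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc       # internal ClusterIP Service behind the controller
            port:
              number: 8080
```

One external load balancer feeds the controller; every additional host or path rule is just another entry here, not another cloud load balancer.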
Q12:

What is etcd and why is it critical to Kubernetes?

Entry

Answer

etcd stores cluster state; corruption makes the cluster unusable.
Quick Summary: etcd is the distributed key-value store that holds all cluster state — every object, every Pod spec, every config, every secret. It's the only stateful part of the control plane. If etcd goes down, the cluster can't accept new operations (though running workloads continue). Regular etcd backups are non-negotiable in production.
Q13:

How does Kubernetes handle container restarts inside Pods?

Entry

Answer

Restart policies (Always, OnFailure, Never) determine behavior and are handled by kubelet.
Quick Summary: The restartPolicy (Always, OnFailure, Never) on the Pod spec controls this. kubelet monitors containers and restarts them according to the policy. The restart count increments and kubelet applies exponential backoff (up to 5 minutes) before each retry — preventing a crashing container from hammering resources. This is what CrashLoopBackOff shows.
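A minimal Pod-spec sketch showing the policy in context (the name and command are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-task                # illustrative name
spec:
  restartPolicy: OnFailure        # restart only on non-zero exit; Always is the default
  containers:
  - name: worker
    image: busybox:1.36
    # hypothetical workload: a non-zero exit triggers a restart with backoff
    command: ["sh", "-c", "run-task || exit 1"]
```

With Always, kubelet would restart the container even after a clean exit 0; with Never, a failed container stays down and only the Pod status records the failure.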
Q14:

What does desired state mean in Kubernetes?

Entry

Answer

Controllers compare current vs desired state and create, delete, or update Pods accordingly.
Quick Summary: Desired state is what you declare (3 replicas, image v2, 2 CPU requests). Kubernetes controllers continuously compare this desired state to the actual state and take actions to close the gap. If a Pod crashes, the controller creates a new one. If you scale down, it deletes extras. Kubernetes is always converging toward your desired state.
Q15:

What is a ReplicaSet and how does it relate to Deployments?

Entry

Answer

ReplicaSet maintains a set number of Pods; Deployments manage ReplicaSets.
Quick Summary: A ReplicaSet ensures a specified number of identical Pods are running at all times. You rarely create ReplicaSets directly — Deployments create and manage them for you. When you update a Deployment, it creates a new ReplicaSet for the new version and scales it up while scaling down the old one — enabling rolling updates.
Q16:

Why should ConfigMaps not be used for sensitive data?

Entry

Answer

ConfigMaps store data in plain text and are not secure for secrets.
Quick Summary: ConfigMaps store configuration as plain text — unencrypted, visible to anyone with read access to the namespace. They're designed for non-sensitive config (URLs, feature flags, env settings). Secrets are base64-encoded (not encrypted by default) but have tighter access controls and can be encrypted at rest — appropriate for passwords and tokens.
Q17:

What is a Kubernetes Secret?

Entry

Answer

A Secret stores sensitive data encoded in base64; encryption at rest improves security.
Quick Summary: A Secret stores sensitive data (passwords, tokens, certificates) separately from the application image. They're base64-encoded in etcd (and can be encrypted at rest). Secrets can be mounted as files or injected as environment variables. Separating secrets from code means you don't expose them in Docker images or version control.
Q18:

What is the purpose of a Namespace?

Entry

Answer

Namespaces logically isolate resources and help organize multi-team or multi-env clusters.
Quick Summary: Namespaces divide a cluster into logical isolation units — different teams, environments (dev, staging), or projects. They provide scope for names (two Pods with the same name can exist in different namespaces), resource quotas (limit how much CPU/memory a team can use), and RBAC (grant access only to specific namespaces).
Q19:

What is a Node in Kubernetes?

Entry

Answer

A Node is a machine running Pods; it contains kubelet, kube-proxy, and container runtime.
Quick Summary: A Node is a physical or virtual machine in the cluster where Pods actually run. Each node runs kubelet (agent), kube-proxy (network), and a container runtime (containerd). Nodes provide the compute resources (CPU, memory, storage) that Pods consume. The control plane schedules Pods to nodes based on available capacity and constraints.
Q20:

What happens when a Node becomes NotReady?

Entry

Answer

Kubernetes stops scheduling Pods on it and may evict existing Pods for safety.
Quick Summary: When a node goes NotReady, the node controller marks its Pods' status Unknown and applies a NoExecute taint to the node. After the Pods' toleration timeout (default 5 minutes), Kubernetes evicts them, and their controllers (Deployment, ReplicaSet, StatefulSet) recreate them on healthy nodes. Standalone Pods with no controller are not recreated.
Q21:

How does Kubernetes ensure Pods are rescheduled automatically when a node fails?

Junior

Answer

The node controller detects missed heartbeats, marks the node NotReady, and evicts its Pods; their workload controllers recreate them on healthy nodes.
Quick Summary: When a node fails, its kubelet stops sending heartbeats. The node controller marks the node NotReady and eventually evicts its Pods. Workload controllers (Deployment, ReplicaSet) notice the Pod count dropped below desired and create new Pods on healthy nodes. This is automatic — no human intervention needed.
Q22:

Why do Pods restart even when their containers exit with code 0?

Junior

Answer

RestartPolicy Always forces Pod restarts regardless of exit code.
Quick Summary: The restartPolicy: Always setting restarts containers regardless of exit code — even 0 (clean exit). This is intentional for long-running services that should never stop. If your container exits cleanly with 0 but you don't want it restarted, use restartPolicy: OnFailure (restart only on non-zero exit) or restartPolicy: Never.
Q23:

What is the role of kube-proxy in networking?

Junior

Answer

kube-proxy manages iptables/IPVS rules to route Service traffic to Pod endpoints.
Quick Summary: kube-proxy runs on every node and programs network rules (iptables or IPVS) that implement Service virtual IPs. When a Pod connects to a Service ClusterIP, kube-proxy's rules intercept the packet and DNAT it to a random healthy Pod backing that Service. It's how Kubernetes load-balances traffic to Services without a dedicated load balancer per Service.
Q24:

How do readiness probes differ from liveness probes?

Junior

Answer

Readiness controls traffic routing; Liveness decides if a Pod should be restarted.
Quick Summary: Readiness probe: is this container ready to receive traffic? If it fails, the Pod is removed from the Service endpoints — traffic stops going to it, but it keeps running. Liveness probe: is this container alive? If it fails, kubelet kills and restarts the container. Readiness gates traffic; liveness recovers from stuck processes.
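A sketch of both probes side by side inside a Pod's containers list (paths and port are illustrative):

```yaml
containers:
- name: web
  image: nginx:1.25
  readinessProbe:                # gates traffic: failing removes the Pod from Service endpoints
    httpGet:
      path: /healthz/ready       # illustrative endpoint
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 5
  livenessProbe:                 # recovers: failing makes kubelet restart the container
    httpGet:
      path: /healthz/live        # illustrative endpoint
      port: 8080
    periodSeconds: 10
    failureThreshold: 3          # restart only after 3 consecutive failures
```

A common pattern is a strict readiness check (dependencies reachable) and a lenient liveness check (process responsive), so a flaky dependency pauses traffic without triggering restart loops.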
Q25:

What is the Kubernetes API Server responsible for?

Junior

Answer

It validates requests, stores configuration, exposes REST endpoints, and communicates with etcd.
Quick Summary: The API Server is the central hub for all cluster communication — every component (kubelet, scheduler, controllers) reads and writes state through it. It validates and persists objects to etcd, enforces authentication and authorization, runs admission controllers, and serves the Kubernetes REST API. Nothing happens in the cluster without going through it.
Q26:

Why is using latest tag dangerous in Kubernetes deployments?

Junior

Answer

Kubernetes cannot track version changes; rolling updates and rollbacks become unpredictable.
Quick Summary: latest is a mutable tag — anyone can push a new image with that tag at any time. If your Deployment uses latest, a node that pulls the image gets whatever was latest at that moment — potentially a different version than other nodes. Pin to an immutable tag (like a Git SHA or semantic version) so every node runs exactly the same image.
Q27:

What is the difference between a DaemonSet and a Deployment?

Junior

Answer

Deployment runs N replicas; DaemonSet ensures one Pod per node.
Quick Summary: DaemonSet ensures exactly one Pod runs on every node (or a selected subset) — used for node-level agents like log collectors, monitoring, or network plugins. Deployment manages a pool of Pods across the cluster without caring which node they land on. DaemonSet scales with nodes; Deployment scales with replicas.
Q28:

How does a Horizontal Pod Autoscaler know when to scale?

Junior

Answer

HPA monitors CPU, memory, or custom metrics and adjusts replicas.
Quick Summary: HPA watches metrics (CPU usage, memory, custom metrics from Prometheus) and compares them to target values you define. If CPU usage exceeds 70% target across Pods, it increases replica count. If it drops well below, it scales down. It queries the metrics-server (or custom metrics adapter) and adjusts replicas automatically.
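A minimal autoscaling/v2 sketch matching the 70% CPU example; the Deployment name is illustrative, and metrics-server is assumed to be installed:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                   # illustrative target Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # scale out when average CPU exceeds 70% of requests
```

Note that utilization is measured against the Pods' CPU requests, so the HPA only works if requests are set on the target containers.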
Q29:

Why do StatefulSets require a Headless Service?

Junior

Answer

Headless Services provide stable DNS entries for each Pod.
Quick Summary: StatefulSet Pods need stable network identities — pod-0, pod-1, pod-2. A regular Service would load-balance across them randomly. A Headless Service (ClusterIP: None) creates DNS records for each Pod individually (pod-0.service.namespace.svc.cluster.local) so clients can address specific Pods directly, which databases and distributed systems require.
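A headless Service sketch for a StatefulSet whose serviceName is db (names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db                # Pods resolve as db-0.db.<namespace>.svc.cluster.local, db-1.db..., etc.
spec:
  clusterIP: None         # headless: per-Pod DNS records instead of a load-balanced virtual IP
  selector:
    app: db
  ports:
  - port: 5432
```

A database client that needs the primary can connect to db-0.db directly, while a regular ClusterIP Service would hide which replica it reached.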
Q30:

What is a PersistentVolume (PV)?

Junior

Answer

PV is cluster storage that persists beyond Pod lifecycles.
Quick Summary: A PersistentVolume (PV) is a piece of storage provisioned by an admin or dynamically by a StorageClass — it exists in the cluster independently of Pods. It abstracts the underlying storage (NFS, AWS EBS, GCE PD) into a Kubernetes resource that can be claimed and used by Pods without them knowing the storage details.
Q31:

What is the difference between PV and PVC?

Junior

Answer

PV is actual storage; PVC is user request bound to a matching PV.
Quick Summary: PV (PersistentVolume) is the actual storage resource — provisioned and available in the cluster. PVC (PersistentVolumeClaim) is a request for storage from a Pod — "I need 10Gi of ReadWriteOnce storage." Kubernetes binds a matching PV to the PVC. Pods use the PVC; they don't interact with the PV directly.
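A sketch of the claim side of the relationship, assuming a StorageClass named standard exists (all names are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim            # illustrative name
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: standard  # assumed class; binds an existing PV or provisions a new one
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: data
      mountPath: /var/lib/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim   # the Pod references the claim, never the PV itself
```

The Pod stays portable: swap the StorageClass (EBS, NFS, local) and the Pod spec does not change.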
Q32:

What is a StorageClass used for?

Junior

Answer

It defines dynamic provisioning of volumes using provisioners and parameters.
Quick Summary: StorageClass defines how to dynamically provision PersistentVolumes on demand. When a PVC requests storage, Kubernetes uses the StorageClass provisioner (AWS EBS, GCE PD, NFS) to automatically create a PV that matches. Without StorageClass, admins must manually pre-provision PVs — dynamic provisioning via StorageClass is the modern approach.
Q33:

Why do Pods sometimes remain in a Terminating state indefinitely?

Junior

Answer

Due to stuck finalizers, runtime issues, volume problems, or network partitions.
Quick Summary: Pods get stuck in Terminating when a finalizer is preventing deletion — the controller responsible for removing the finalizer never does so (it might be dead or buggy). You can force-delete with kubectl delete pod --force --grace-period=0, but this bypasses cleanup logic. Investigate which finalizer is blocking before force-deleting.
Q34:

How does Kubernetes perform rolling updates without downtime?

Junior

Answer

It creates new Pods before terminating old ones, controlled by maxSurge and maxUnavailable.
Quick Summary: Rolling updates gradually replace old Pods with new ones. Kubernetes creates new Pods with the new image, waits for them to pass readiness probes, then terminates old ones — maxSurge controls how many extra Pods are created, maxUnavailable controls how many old Pods can be down at once. Traffic only routes to ready Pods throughout.
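A Deployment strategy sketch tuned for zero downtime (the image tag and names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # at most 1 extra Pod above the desired count during the update
      maxUnavailable: 0        # never drop below 4 ready Pods
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
      - name: web
        image: registry.example.com/web:1.4.2   # illustrative immutable tag
        readinessProbe:                         # new Pods receive traffic only once ready
          httpGet: {path: /ready, port: 8080}
```

maxSurge: 1 with maxUnavailable: 0 means Kubernetes adds one new Pod, waits for its readiness probe, removes one old Pod, and repeats.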
Q35:

What is the purpose of a Pod Disruption Budget?

Junior

Answer

PDB ensures minimum available replicas during voluntary disruptions.
Quick Summary: A Pod Disruption Budget (PDB) sets the minimum number of Pods that must stay running during voluntary disruptions (node drains, upgrades). It prevents operations like kubectl drain from taking down so many Pods that your service loses quorum or availability. Kubernetes refuses to evict Pods if it would violate the PDB.
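A minimal PDB sketch (names are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2          # kubectl drain refuses evictions that would leave fewer than 2 Pods
  selector:
    matchLabels:
      app: web
```

maxUnavailable can be used instead of minAvailable; either way, the budget only constrains voluntary disruptions — node crashes ignore it.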
Q36:

Why are jobs used instead of Deployments for batch tasks?

Junior

Answer

Jobs ensure tasks run to completion, retrying on failure.
Quick Summary: Deployments run processes continuously and restart them on exit — wrong for batch work that should run once and complete. Jobs run a Pod to completion (exit 0) then stop — they don't restart unnecessarily. Jobs track success/failure, support parallelism (run N tasks simultaneously), and have retry logic for failures.
Q37:

What is the role of a CronJob?

Junior

Answer

CronJobs schedule recurring jobs based on cron expressions.
Quick Summary: A CronJob creates Jobs on a cron schedule — runs a task every hour, every night, every Monday. It's Kubernetes' equivalent of a Unix cron job but for containerized tasks. CronJob manages the schedule; the Job manages execution and retries. Useful for database backups, report generation, cache warming, and cleanup tasks.
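A nightly-backup sketch; the name, image, and arguments are hypothetical placeholders:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup        # illustrative name
spec:
  schedule: "0 2 * * *"       # standard cron syntax: every day at 02:00
  concurrencyPolicy: Forbid   # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 3         # the created Job retries failed Pods up to 3 times
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:2.1          # hypothetical image
            args: ["--target", "s3://backups/db"]   # hypothetical flag
```

The split of responsibilities is visible in the nesting: the CronJob owns the schedule, the jobTemplate owns completion and retry behavior.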
Q38:

What is image pull policy and why does it matter?

Junior

Answer

It controls when Kubernetes pulls images; misconfiguration leads to stale or missing images.
Quick Summary: Image pull policy controls when Kubernetes pulls an image from the registry: Always (pull every time — slow but always fresh), IfNotPresent (pull only if not cached — faster but may use stale images), Never (never pull — use what's cached). For production with mutable tags (like latest), Always ensures you get the latest image on each restart.
Q39:

Why is RBAC important in Kubernetes?

Junior

Answer

RBAC restricts actions to authorized users, preventing unauthorized access.
Quick Summary: RBAC controls who can do what in the cluster. Without it, any Pod with a ServiceAccount could read Secrets, create new Pods, or delete Deployments. RBAC lets you grant least-privilege access — a Pod can only read its own ConfigMap, a developer can only deploy to the staging namespace. Essential for multi-team and multi-tenant clusters.
Q40:

What is a ServiceAccount?

Junior

Answer

A ServiceAccount provides identity for Pods to authenticate to the API server.
Quick Summary: A ServiceAccount is an identity for Pods — it determines what API permissions the Pod has. By default, every Pod gets the default ServiceAccount with a mounted token. Assign a specific ServiceAccount with narrowly-scoped RBAC roles to Pods that need to interact with the Kubernetes API (like operators and controllers).
Q41:

Why disable auto-mount of ServiceAccount tokens in some Pods?

Junior

Answer

Pods without API needs should not get credentials to reduce risk.
Quick Summary: By default, Pods auto-mount a ServiceAccount token that allows API calls to the cluster. Most application Pods don't need to call the Kubernetes API — they just serve web requests or process data. Auto-mounting an unused token is an unnecessary attack surface. Disable it for workloads that don't interact with the cluster API.
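Opting out is a one-line Pod-spec setting (sketch; names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  automountServiceAccountToken: false   # no API credentials mounted into this Pod
  containers:
  - name: web
    image: nginx:1.25
```

The same field can be set on the ServiceAccount itself to change the default for every Pod that uses it.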
Q42:

How do Kubernetes Labels differ from Annotations?

Junior

Answer

Labels are used for selection; annotations store metadata.
Quick Summary: Labels are key-value pairs used for selection — Deployments select their Pods by label, Services route to Pods by label. They're queryable and used by controllers. Annotations are key-value metadata for tools and humans — build info, CI pipeline URLs, documentation links. Annotations don't affect how Kubernetes selects or routes objects.
Q43:

What is tainting a node and why is it used?

Junior

Answer

Taints prevent scheduling unless Pods have tolerations.
Quick Summary: A taint on a node repels Pods that don't explicitly tolerate it. Use it to reserve nodes for specific workloads — taint GPU nodes so only GPU-requesting Pods land there, taint production nodes so dev Pods can't accidentally run on them. Tolerations on Pods say "I'm okay with this taint" — they can still land on tainted nodes.
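A sketch of the GPU-node example; the node name, taint key, and image are illustrative:

```yaml
# Applied once by an admin:
#   kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
# Only Pods tolerating the taint can now schedule onto gpu-node-1:
apiVersion: v1
kind: Pod
metadata:
  name: trainer               # illustrative name
spec:
  tolerations:
  - key: gpu
    operator: Equal
    value: "true"
    effect: NoSchedule        # must match the taint's effect
  containers:
  - name: train
    image: training-image:1.0 # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1
```

Note that a toleration only permits scheduling on the tainted node; to force the Pod there, combine it with a nodeSelector or node affinity.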
Q44:

What are Node Selectors?

Junior

Answer

They restrict scheduling to nodes with specific labels.
Quick Summary: Node selectors constrain which nodes a Pod can run on by matching node labels. Add nodeSelector: disktype: ssd to a Pod and it only schedules on nodes labeled disktype=ssd. Simple and effective for basic placement, but only supports exact label matching — no AND/OR logic, no preference vs requirement.
Q45:

Why is node affinity more powerful than node selectors?

Junior

Answer

Node affinity supports match expressions and both hard requirements and soft preferences.
Quick Summary: Node affinity supports rich match expressions — operators like In, NotIn, Exists, and Gt, multiple conditions combined, and a choice between hard requirements (requiredDuringSchedulingIgnoredDuringExecution) and weighted soft preferences (preferredDuringSchedulingIgnoredDuringExecution). Node selectors only support exact label matching with no notion of preference.
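A Pod-spec fragment sketching both flavors (label keys and values are illustrative):

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard rule: must match
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In           # In/NotIn/Exists give logic nodeSelector lacks
            values: [ssd, nvme]
      preferredDuringSchedulingIgnoredDuringExecution:  # soft rule: prefer if possible
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: [us-east-1a]
```

The equivalent nodeSelector could only express a single exact match like disktype: ssd, with no fallback.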
Q46:

What is Pod Affinity/Anti-Affinity?

Junior

Answer

Affinity co-locates Pods; anti-affinity spreads them for HA.
Quick Summary: Pod Affinity schedules a Pod near other Pods with specific labels — useful when two services communicate heavily and benefit from co-location. Pod Anti-Affinity spreads Pods away from each other — put replicas on different nodes so a single node failure doesn't take down all replicas. Both use topology keys (node, zone, region) to define "near" and "far."
Q47:

Why is Ingress preferred over multiple LoadBalancer Services?

Junior

Answer

Ingress consolidates routing and reduces cost.
Quick Summary: Each LoadBalancer Service provisions an external load balancer — expensive and slow to provision (cloud load balancers can take minutes and cost money). With Ingress, one external load balancer feeds one Ingress Controller, which routes to many Services based on host and path rules. Much cheaper, faster, and easier to manage at scale.
Q48:

How does Kubernetes handle service discovery internally?

Junior

Answer

It uses DNS-based discovery and kube-proxy load balancing.
Quick Summary: Kubernetes runs CoreDNS as the cluster DNS server. Every Service gets a DNS entry: servicename.namespace.svc.cluster.local. When a Pod resolves a Service name, CoreDNS returns the Service's ClusterIP. kube-proxy then routes traffic from the ClusterIP to a healthy Pod. No hardcoded IPs needed — just use the service name.
Q49:

What is the difference between a Secret of type Opaque and DockerConfigJson?

Junior

Answer

Opaque stores key-values; DockerConfigJson stores registry credentials.
Quick Summary: Opaque type is a generic key-value Secret — base64-encoded arbitrary data like passwords and API keys. DockerConfigJson (kubernetes.io/dockerconfigjson) stores Docker registry credentials — used by kubelet when pulling private images. Kubernetes looks for imagePullSecrets on Pod specs and uses DockerConfigJson Secrets to authenticate to the registry.
Q50:

Why avoid large ConfigMaps?

Junior

Answer

Large ConfigMaps exceed API limits and slow Pod startup.
Quick Summary: ConfigMaps are stored in etcd and mounted into Pods. Very large ConfigMaps increase etcd load, slow down watch propagation, and consume more memory on every API server that caches them. A ConfigMap is capped at 1MiB (etcd's default request limit is 1.5MiB), so oversized configs fail to create at all. Split large configs or use a dedicated config service.
Q51:

Why might a Pod stay in Pending state indefinitely?

Junior

Answer

Due to insufficient resources, PVC issues, taints, or affinity mismatch.
Quick Summary: Pods stay Pending when: no node has enough resources (insufficient CPU/memory), no node matches the Pod's nodeSelector or affinity rules, all matching nodes have taints the Pod doesn't tolerate, or its PVCs can't be bound (no matching PV or StorageClass). Run kubectl describe pod to read the scheduler's Events and see exactly which condition is failing.
Q52:

Why avoid using hostPath in production?

Junior

Answer

hostPath ties Pods to nodes, risks corruption, and reduces portability.
Quick Summary: hostPath mounts a directory from the host node's filesystem into the Pod. This creates a tight coupling between the Pod and the specific node it runs on (non-portable). It bypasses storage abstractions, allows Pods to read/write sensitive host files, and creates a security risk. Use PVCs with proper StorageClasses for persistent storage instead.
Q53:

How does Kubernetes prevent controller conflicts on resources?

Junior

Answer

Optimistic concurrency: every update must carry the object's current resourceVersion, so conflicting writes fail and must be retried.
Quick Summary: Kubernetes uses optimistic concurrency with resourceVersion — each object has a version that increments on every write. Controllers must include the current resourceVersion when updating — if two controllers try to update the same object simultaneously, only one wins (the other gets a conflict error and must retry). No distributed lock needed.
Q54:

What happens when you edit a Pod directly instead of its Deployment?

Junior

Answer

Edits to a live Pod are limited and ephemeral; the Deployment's template wins whenever the Pod is replaced.
Quick Summary: Most of a running Pod's spec is immutable — only a handful of fields (such as the container image, activeDeadlineSeconds, labels, and annotations) can be changed in place. Any direct edit lives only as long as that Pod: it isn't recorded in the Deployment, so the next rollout or Pod replacement recreates the Pod from the Deployment's template. To make a lasting change, update the Deployment and let it roll out new Pods.
Q55:

Why should readiness checks be mandatory for production Deployments?

Junior

Answer

Readiness prevents routing traffic to uninitialized Pods.
Quick Summary: Without readiness probes, Kubernetes assumes a container is ready as soon as it starts — but the app might still be initializing (loading cache, connecting to DB). Traffic sent to an unready Pod returns errors. A readiness probe ensures traffic only reaches Pods that are actually ready to serve. Critical for zero-downtime deployments.
Q56:

How does the Kubernetes Scheduler decide the best node for a Pod?

Mid

Answer

Scheduler evaluates nodes via filters and scoring. Predicates check resource availability, taints, affinity; priorities score nodes and top-scored node is selected.
Quick Summary: The scheduler filters nodes (eliminates those that fail hard requirements — resources, taints, affinity), then scores the remaining nodes (prefers nodes with most available resources, matching preferred affinity, spreading Pods evenly). It picks the highest-scoring node and binds the Pod to it by writing to the API server.
Q57:

Why do Pods sometimes get terminated with OOMKilled even when free host memory exists?

Mid

Answer

Pods run under cgroup memory limits; exceeding the limit triggers kernel OOM kill regardless of host memory.
Quick Summary: Kubernetes enforces memory limits via cgroups per container (memory.limit_in_bytes in cgroup v1, memory.max in v2). The kernel OOM killer operates at the cgroup level — it kills a container whose usage exceeds its own limit even if the host has plenty of free memory. This is intentional: limits are hard boundaries for the container, independent of overall host pressure.
Q58:

What is the role of kube-controller-manager?

Mid

Answer

It runs controllers like Deployment, ReplicaSet, Node lifecycle, Job, and HPA to maintain desired state.
Quick Summary: kube-controller-manager runs all the core controllers — Node controller (detects node failures), ReplicaSet controller (maintains Pod counts), Deployment controller (manages rolling updates), Endpoint controller (updates Service endpoints), and many more. Each controller watches relevant resources and takes action to maintain desired state.
Q59:

How does Kubernetes handle split-brain scenarios in multi-master clusters?

Mid

Answer

API servers rely on etcd quorum; without quorum, writes stop to avoid inconsistent cluster state.
Quick Summary: Kubernetes uses etcd's Raft consensus — writes are only committed when a majority (quorum) of etcd members agree. With 3 control plane nodes, 2 must agree. If the network splits into two halves, the side without quorum can't accept writes — it becomes read-only. This prevents split-brain by design. Three or five control plane nodes ensure quorum survives one or two failures.
Q60:

Why is direct access to etcd discouraged for debugging?

Mid

Answer

etcd stores raw cluster state; manual edits may corrupt the cluster. API server is the safe interface.
Quick Summary: etcd is the source of truth for the cluster — all Kubernetes state is stored there. Direct etcd access bypasses API server validation, admission controllers, and RBAC. You can corrupt cluster state with malformed writes or accidentally expose secrets. Use kubectl and the API server for all interactions; etcdctl is for backup/restore only.
Q61:

What is the difference between soft and hard eviction in Kubernetes?

Mid

Answer

Soft eviction gives Pods a grace period once thresholds are exceeded; hard eviction kills them immediately.
Quick Summary: Soft eviction: a resource threshold (low memory, low disk) has been exceeded for a configured grace period — kubelet evicts lower-priority Pods gracefully, honoring their termination grace. Hard eviction: resources are critically low — kubelet evicts Pods immediately, with no grace period, to keep the node from becoming completely unusable.
Q62:

How do topology spread constraints improve availability?

Mid

Answer

They distribute Pods across zones/nodes/racks to avoid localized failures.
Quick Summary: Topology spread constraints distribute Pods evenly across zones, nodes, or regions. Instead of all replicas landing on one zone (risky if that zone fails), you declare maxSkew: 1 across topology zones — Kubernetes ensures replicas spread as evenly as possible. Combines with Pod anti-affinity but is more flexible and declarative.
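
A minimal sketch of the constraint described above — the app label and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1                                  # zones may differ by at most one replica
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule            # hard rule; ScheduleAnyway makes it best-effort
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: web
    image: nginx:1.27   # illustrative image
```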
Q63:

Why does Kubernetes recommend using readiness gates for external dependency checks?

Mid

Answer

Readiness gates delay traffic until apps confirm external dependencies are healthy.
Quick Summary: External dependency readiness gates let you declare that a Pod isn't ready until an external condition is met — beyond just the Pod's own health check. For example, a Pod might be healthy internally but not ready until a sidecar (like a service mesh proxy) finishes initializing. Readiness gates add a programmatic external condition to the standard readiness probe.
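
A sketch of a Pod declaring a readiness gate — the condition type is a hypothetical name; an external controller must patch the Pod's status.conditions with this type set to True before the Pod counts as Ready:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  readinessGates:
  - conditionType: "example.com/proxy-initialized"   # hypothetical external condition
  containers:
  - name: app
    image: example/app:1.0   # illustrative image
```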
Q64:

What is the difference between Service sessionAffinity ClientIP vs None?

Mid

Answer

ClientIP keeps the same Pod for a client; None distributes each request independently across Pods.
Quick Summary: ClientIP: all requests from the same client IP go to the same Pod (sticky sessions) — useful for stateful protocols. None: each request can go to any healthy Pod (random selection in iptables mode) — better load distribution. kube-proxy tracks ClientIP affinity in its iptables rules with a configurable timeout (default 10800s).
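
Session affinity is declared on the Service itself — a minimal sketch with an illustrative, shortened timeout:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80
  sessionAffinity: ClientIP      # stick each client IP to one Pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600       # default is 10800 (3h); shortened here for illustration
```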
Q65:

How does kube-proxy operate in IPVS mode vs iptables mode?

Mid

Answer

IPVS uses kernel-level load balancing, faster for large clusters; iptables uses rule chains.
Quick Summary: iptables mode: for each Service, kube-proxy adds chains of iptables rules that DNAT traffic. With many Services, traversing many chains per packet adds latency and CPU overhead. IPVS mode: uses the kernel's IPVS module (L4 load balancer) — lookup is O(1) via a hash table instead of O(n) through iptables chains. Much faster at 10,000+ Services.
Q66:

Why do HPA and Cluster Autoscaler sometimes conflict?

Mid

Answer

HPA increases Pods requiring more nodes; autoscaler reacts slower, causing scaling thrashing.
Quick Summary: HPA scales Pods based on current load; Cluster Autoscaler adds nodes when Pods are Pending (no room to schedule). Conflict: HPA scales up Pods → Cluster Autoscaler adds a node → node comes up → HPA scales down → Cluster Autoscaler tries to remove the underutilized node → HPA sees load spike again. Tune cooldown windows and min/max bounds to dampen oscillation.
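
One of those dampening knobs is the HPA's own scale-down stabilization window — a sketch using the autoscaling/v2 behavior field, with illustrative bounds and target:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10m of sustained low load before scaling down
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```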
Q67:

What is the impact of using hostNetwork true in Pods?

Mid

Answer

Pods share node network namespace, risking port conflicts and weaker isolation.
Quick Summary: hostNetwork: true puts the Pod directly on the host's network namespace — no isolation. The Pod sees and can reach everything the host sees. Multiple Pods on the same node can't use the same port. If compromised, the attacker has direct network access to the host environment. Only justified for very specific system-level network workloads.
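
The setting itself is a single field on the Pod spec — a sketch of a hypothetical node-level agent, which is the kind of workload that can justify it:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-agent   # hypothetical system-level workload
spec:
  hostNetwork: true            # Pod shares the node's network namespace
  containers:
  - name: agent
    image: example/agent:1.0   # illustrative image
    ports:
    - containerPort: 9100      # binds directly on the node; conflicts with any other user of 9100
```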
Q68:

Why does Pod startup time increase when ConfigMaps or Secrets grow large?

Mid

Answer

Large data mounts slow kubelet volume setup and Pod initialization.
Quick Summary: Kubernetes projects ConfigMaps and Secrets into Pods via the API server. Kubelet watches these resources and syncs them into the Pod. Large ConfigMaps mean more data to transfer, more etcd I/O per watch event, and more time for kubelet to process and write files into the container. The sync loop adds latency proportional to data size.
Q69:

Why can a Deployment have multiple ReplicaSets simultaneously?

Mid

Answer

Rolling updates retain old ReplicaSets for rollback until scaled to zero.
Quick Summary: During a rolling update, the old ReplicaSet scales down while the new one scales up — both coexist temporarily. Additionally, Kubernetes keeps the last N ReplicaSets (controlled by revisionHistoryLimit, default 10) to support rollbacks. kubectl rollout undo switches back to the previous ReplicaSet instantly.
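
The retention knob sits on the Deployment spec — a minimal sketch with an illustrative image:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  revisionHistoryLimit: 3   # keep only 3 old ReplicaSets for rollback (default 10)
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27   # illustrative image
```

Rolling back then switches the active ReplicaSet: `kubectl rollout undo deployment/web`.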
Q70:

What is the significance of Pod Priority Classes?

Mid

Answer

Higher-priority Pods can preempt lower ones, ensuring critical workloads get scheduled.
Quick Summary: Priority Classes define the importance of Pods. Higher-priority Pods can preempt (evict) lower-priority ones to get scheduled when the cluster is full. System components run at the highest priority. Batch jobs and non-critical workloads run at lower priority. This ensures critical services always get resources, even under cluster pressure.
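
A sketch of a custom class and a Pod using it — the class name, value, and image are illustrative:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-services        # hypothetical class name
value: 1000000                   # higher value = higher priority
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "For latency-critical workloads"
---
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  priorityClassName: critical-services
  containers:
  - name: api
    image: example/api:1.0       # illustrative image
```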
Q71:

What is the difference between eviction due to node pressure and preemption?

Mid

Answer

Eviction removes Pods for node stability; preemption removes Pods for priority scheduling.
Quick Summary: Node pressure eviction: kubelet proactively evicts Pods because the node is running low on memory or disk — it removes the lowest-priority Pods to keep the node healthy. Preemption: the scheduler evicts lower-priority Pods on a node to make room for a higher-priority Pod that couldn't fit. Eviction is reactive (node health); preemption is deliberate (scheduling priority).
Q72:

What causes image pull backoff errors beyond missing images?

Mid

Answer

Caused by bad credentials, DNS issues, digest mismatch, rate limits, or broken CNI.
Quick Summary: ImagePullBackOff occurs when: the image doesn't exist or the tag is wrong, the registry requires authentication but no imagePullSecret is configured, the registry is rate-limiting (Docker Hub free tier), the node can't reach the registry (network/firewall issue), or the registry itself is down. kubectl describe pod shows the exact error message.
Q73:

How does Kubernetes handle network policies internally?

Mid

Answer

CNI plugins enforce policies using iptables or eBPF based on Pod labels.
Quick Summary: NetworkPolicies are enforced by the CNI plugin (not kube-proxy or the kernel directly). The CNI plugin (Calico, Cilium, Weave) watches NetworkPolicy objects via the API server and programs iptables, eBPF, or OVS rules on each node accordingly. By default, no NetworkPolicies means all traffic is allowed — policies are additive restrictions.
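
A sketch of the additive-restriction model: this policy allows backend Pods to receive ingress only from frontend Pods on one port, implicitly denying everything else to the selected Pods. Labels, namespace, and port are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
  namespace: prod              # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: backend             # policy applies to these Pods
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend        # only frontend Pods may connect
    ports:
    - protocol: TCP
      port: 8080
```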
Q74:

Why is disabling swap required for kubelet?

Mid

Answer

Swap breaks kubelet memory accounting, causing unpredictable scheduling and OOM behavior.
Quick Summary: Swap changes memory pressure behavior in ways that break Kubernetes' resource accounting. With swap, the kernel can move memory pages to disk silently — a Pod appears to have "used" memory but it's actually on disk, making resource limits unreliable. Kubernetes' OOM killer and eviction logic assume in-memory resources are truly in memory.
Q75:

How does Kubernetes guarantee consistent Pod identity in StatefulSets?

Mid

Answer

Pods get stable names, PVCs, and ordinals that persist across restarts.
Quick Summary: StatefulSet Pods get stable, predictable names: app-0, app-1, app-2. Each gets its own PersistentVolumeClaim (volumeClaimTemplate). Even after restarts or rescheduling, app-0 always gets the same PVC with the same data. The Headless Service gives each Pod a stable DNS name. This identity stability is what databases need for leader election and replication.
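
The moving parts — headless Service, stable ordinals, and per-Pod PVCs — fit in one sketch; the database image and sizes are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None           # headless: stable per-Pod DNS (db-0.db, db-1.db, …)
  selector:
    app: db
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db           # ties Pod DNS identity to the headless Service
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:16  # illustrative image
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:     # one PVC per ordinal: data-db-0, data-db-1, data-db-2
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```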
Q76:

Why do terminating Pods still receive traffic sometimes?

Mid

Answer

Service endpoints update slowly; readiness checks and graceful shutdown reduce traffic leakage.
Quick Summary: When a Pod is deleted, it's removed from the endpoint slice immediately — but kube-proxy has a slight propagation delay before updating iptables rules. During that window, Services still route some traffic to the terminating Pod. Setting terminationGracePeriodSeconds and adding a pre-stop sleep hook ensures the Pod drains in-flight requests before the process actually stops.
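
A common drain pattern is a preStop sleep that keeps the process alive while endpoint removal propagates — a sketch with an illustrative image (the container image must provide `sleep`):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 40   # must cover the preStop sleep plus app shutdown
  containers:
  - name: app
    image: example/app:1.0            # illustrative image
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "10"]    # keep serving while kube-proxy rules catch up
```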
Q77:

What causes Pods to get stuck in CrashLoopBackOff?

Mid

Answer

Repeated startup failures due to misconfigurations, missing dependencies, or readiness issues.
Quick Summary: CrashLoopBackOff means the container starts, crashes, kubelet restarts it, it crashes again — repeatedly. Root causes: application error (bug, missing config, wrong startup command), missing environment variables or Secrets, out-of-memory kill, permission issues, or dependency (like a database) not being available. Check kubectl logs and kubectl describe pod for the actual error.
Q78:

Why is API throttling important in large clusters?

Mid

Answer

Throttling prevents overload from noisy clients, ensuring API server stability.
Quick Summary: In large clusters, many controllers, kubelet heartbeats, watches, and client requests all hit the API server simultaneously. Without throttling (rate limiting), a surge of requests could overwhelm it. API Priority and Fairness (APF) queues requests by priority and limits each client's throughput — preventing one noisy controller from starving critical system operations.
Q79:

What is the difference between Ingress and Gateway API?

Mid

Answer

Gateway API offers richer routing and team isolation; Ingress is simpler and older.
Quick Summary: Ingress is an older API with limited routing capabilities — host-based and path-based routing only, TLS termination, vendor-specific annotations for extra features. Gateway API is the newer, more expressive replacement — supports traffic splitting, header matching, backend weights, and multiple protocols natively. Gateway API is now GA and recommended for new deployments.
Q80:

Why do readiness probe misconfigurations cause cascading failures?

Mid

Answer

Pods keep entering/exiting load balancer rotation, destabilizing traffic.
Quick Summary: A failing readiness probe removes the Pod from Service endpoints — it stops receiving new traffic. If many Pods fail readiness simultaneously (e.g., due to a dependency like a DB being slow), the Service routes all traffic to the few remaining ready Pods — overloading them, causing them to fail readiness too, cascading until the service is down.
Q81:

How does Kubernetes handle kernel upgrades or node reboots gracefully?

Mid

Answer

Cordon, drain, and PDB ensure Pods migrate safely before reboot.
Quick Summary: For node reboots during upgrades, use kubectl drain to cordon the node (stop new scheduling), evict existing Pods gracefully (respecting PDBs), perform maintenance, then uncordon to bring it back. PDB prevents draining from taking down more replicas than you can afford to lose. Tools like kured automate this process for kernel upgrades requiring reboots.
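
The PDB side of that workflow is a small object — a sketch with an illustrative label and floor:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # kubectl drain will not evict below 2 ready replicas
  selector:
    matchLabels:
      app: web
```

With this in place, `kubectl drain <node> --ignore-daemonsets` blocks (and retries) rather than evicting past the budget.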
Q82:

What is the difference between nodeSelector, nodeAffinity, and topologySpreadConstraints?

Mid

Answer

nodeSelector = basic match; nodeAffinity = expressions; spread constraints distribute Pods.
Quick Summary: nodeSelector: simple label matching — must be on a node with this exact label. nodeAffinity: expressive — required vs preferred, multiple conditions, AND/OR logic. topologySpreadConstraints: distribute evenly — "spread my Pods across zones, max 1 skew" — no hard node targeting but ensures balanced distribution across a topology dimension.
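
All three mechanisms can coexist on one Pod spec, which makes the contrast concrete — labels, zones, and image below are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
  labels:
    app: worker
spec:
  nodeSelector:                    # simplest: exact label match, hard requirement
    disktype: ssd
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:   # soft preference, weighted
      - weight: 1
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a", "us-east-1b"]
  topologySpreadConstraints:       # balance replicas across the zone dimension
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: worker
  containers:
  - name: worker
    image: example/worker:1.0      # illustrative image
```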
Q83:

Why are Init Containers important?

Mid

Answer

They run setup logic before main containers start.
Quick Summary: Init containers run to completion before the main container starts. They're used for: waiting for a database to be ready before the app starts, running schema migrations, downloading config files, setting up permissions. If an init container fails, Kubernetes restarts the Pod. This decouples initialization from the main app logic cleanly.
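
The wait-for-a-dependency case looks like this — a sketch where the Service name, port, and images are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  initContainers:
  - name: wait-for-db      # blocks Pod startup until the DB accepts connections
    image: busybox:1.36
    command: ["sh", "-c", "until nc -z db 5432; do sleep 2; done"]
  containers:
  - name: app
    image: example/app:1.0 # illustrative image; starts only after init succeeds
```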
Q84:

Why does scaling StatefulSets too quickly cause issues?

Mid

Answer

StatefulSets require ordered Pod creation; fast scaling causes readiness delays.
Quick Summary: StatefulSets create and scale Pods sequentially (app-0, then app-1, then app-2) and each Pod must be Running and Ready before the next starts. Scaling quickly means you're waiting for each Pod to initialize, connect to peers, and join the cluster (for distributed DBs, this is critical). Scaling too fast can cause split-brain or replication lag.
Q85:

Why are cluster upgrades risky without version skew awareness?

Mid

Answer

Components allow limited version skew; mismatches destabilize clusters.
Quick Summary: Kubernetes enforces strict version skew limits between components — the kubelet may lag the API server by at most two minor versions (three in recent releases), and never lead it. Upgrading the API server too far ahead of the kubelet can cause incompatibilities where the kubelet can't process new API features. Upgrade the control plane first, then nodes, one minor version at a time.
Q86:

How does the Kubernetes API Server enforce optimistic concurrency using resourceVersion?

Senior

Answer

Every object has a resourceVersion. Outdated writes are rejected with 409 Conflict, enforcing optimistic locking.
Quick Summary: Every Kubernetes object has a resourceVersion — a string that changes on every write. When a controller reads an object and wants to update it, it must include the same resourceVersion. If another writer already changed it, the resourceVersion won't match and the API server returns 409 Conflict. The controller must re-read and retry. This prevents lost updates without distributed locks.
Q87:

Why does etcd store all Kubernetes objects as flat key-value pairs instead of hierarchical tree structures?

Senior

Answer

Flat key-space enables fast prefix scans, simple watches, and predictable performance.
Quick Summary: Raft requires that all state machine updates go through a leader and be replicated to a majority. A hierarchical tree structure with parent-child relationships would require multi-key transactions for atomic updates — complex and slow in Raft. A flat key-value model means each object is one key — single-key reads and writes are atomic and simple to replicate.
Q88:

How do Kubernetes Watches avoid missing updates during network interruptions?

Senior

Answer

Clients reconnect using last resourceVersion; API sends all missed events.
Quick Summary: Kubernetes Watches use etcd's watch mechanism with a bookmark system. After reconnection, clients send their last received resourceVersion — etcd replays all events since that version. If the version is too old (compacted away), the API server returns a "too old resource version" error, triggering a full re-list. Etcd compaction is configured with a retention window to balance history vs storage.
Q89:

Why can excessive CRDs degrade API Server performance?

Senior

Answer

Each CRD adds endpoints, conversion, validation, and storage overhead on API server and etcd.
Quick Summary: Each CRD registers new API endpoints and type schemas with the API server. The API server must maintain OpenAPI schemas for all CRDs, validate all objects against them, and serve discovery documents listing all available types. Hundreds of CRDs from many operators add significant memory overhead to the API server and slow down discovery endpoint responses.
Q90:

How does the scheduler use informer caches instead of querying the API Server directly?

Senior

Answer

Scheduler listens to watch events and maintains local cache for fast decision-making.
Quick Summary: The scheduler uses local informer caches (SharedInformer) that maintain a local copy of all nodes and Pods, updated via watches from the API server. Filtering and scoring run against this in-memory cache — no API calls per scheduling cycle. This makes scheduling fast and prevents scheduler load from overwhelming the API server in large clusters.
Q91:

Why is kube-apiserver stateless even though it controls critical operations?

Senior

Answer

All state lives in etcd; API servers scale horizontally without syncing state.
Quick Summary: kube-apiserver stores no persistent state itself — all state is in etcd. This makes API server stateless and horizontally scalable. Run 3 or more API server replicas behind a load balancer and the cluster handles API server failures transparently. Losing one API server instance doesn't lose any data or interrupt running workloads.
Q92:

How does etcd use Raft to guarantee strong consistency across master nodes?

Senior

Answer

Raft elects a leader; writes go to leader, replicated to followers, committed on majority ACK.
Quick Summary: Raft elects a leader — all writes go through the leader. The leader replicates the entry to followers. Once a majority acknowledges, the leader commits. Reads can also be served from the leader (linearizable reads) or with lease-based reads from followers. If the leader dies, a new election ensures only one leader exists at a time — no split-brain.
Q93:

What scenarios cause Kubernetes components to enter thundering herd behavior?

Senior

Answer

Many controllers react to same event simultaneously, flooding API server.
Quick Summary: Thundering herd happens when many components simultaneously reconnect to the API server after a network partition or API server restart — all sending list requests at once. This floods the API server with expensive list operations. Kubernetes uses exponential backoff with jitter in client reconnection logic to spread reconnections over time.
Q94:

Why do CNI plugins often require a dedicated MTU configuration?

Senior

Answer

Encapsulation reduces payload size; wrong MTU causes fragmentation or packet drops.
Quick Summary: VXLAN encapsulation adds roughly 50 bytes of header overhead to each packet. If the CNI plugin doesn't set the MTU correctly (accounting for the VXLAN header), packets get fragmented — split into multiple smaller packets. Fragmentation significantly reduces network throughput and increases latency. Each CNI plugin must be configured with the correct MTU for the underlying network.
Q95:

What is the difference between kube-proxy iptables and IPVS modes under high traffic?

Senior

Answer

iptables does linear rule matching; IPVS uses kernel hash tables for large-scale performance.
Quick Summary: iptables mode programs a chain of rules per Service. With thousands of Services, each packet traverses potentially thousands of iptables rules sequentially — O(n) lookup. IPVS uses kernel-level hash tables for Service lookup — O(1) regardless of Service count. At 10,000+ Services, iptables mode causes measurable packet processing latency; IPVS does not.
Q96:

Why can misconfigured readiness probes cripple an entire microservice chain?

Senior

Answer

Pods oscillate Ready/NotReady causing upstream retries and cascading failures.
Quick Summary: If a readiness probe is too strict (fails during startup, uses wrong endpoint, wrong port), Pods never become ready — traffic never reaches them. The Service endpoint list empties. Worse, if it flaps intermittently, the Service constantly adds and removes endpoints, causing connection errors. Always test readiness probes in isolation before deploying.
Q97:

What is the purpose of the garbage collector controller in Kubernetes?

Senior

Answer

It cleans dependents using ownerReferences when parent objects are deleted.
Quick Summary: The garbage collector controller watches all Kubernetes objects and deletes orphaned ones. When a Deployment is deleted, the garbage collector also deletes its ReplicaSets and then their Pods, using owner references. Without it, deleting a Deployment would leave behind orphaned ReplicaSets and Pods consuming resources indefinitely.
Q98:

How do finalizers prevent premature resource deletion?

Senior

Answer

Finalizers block deletion until cleanup is complete; controllers remove finalizers afterward.
Quick Summary: Finalizers are strings added to an object's metadata.finalizers list. A delete request sets deletionTimestamp but doesn't actually delete the object — the object lingers until all finalizers are removed. Controllers (often operators) watch for this state, perform cleanup tasks (like deleting external cloud resources), then remove their finalizer, allowing the object to be fully deleted.
Q99:

Why should autoscaler and HPA cooldown windows be tuned together?

Senior

Answer

Mismatched timings cause scaling oscillations and thrashing.
Quick Summary: HPA scales up immediately when metrics cross a threshold, but takes time to scale back down (default stabilization window). If the Cluster Autoscaler removes nodes too quickly after scale-down, then HPA scales up again before new nodes are ready — causing scheduling failures. Tune HPA scale-down stabilization (default 5 minutes) and Cluster Autoscaler cooldown to match your app's traffic patterns.
Q100:

Why is API aggregation essential for large enterprises?

Senior

Answer

It enables custom APIs behind API server without modifying Kubernetes core.
Quick Summary: API aggregation lets custom API servers (like metrics-server, service-catalog, or CRD-based operators) register as extensions of the main Kubernetes API server. Clients see them as part of the standard API — no separate endpoint to target. Enterprises use this to extend Kubernetes with domain-specific APIs without forking the core API server.
Q101:

How does kubelet perform node-level Pod lifecycle management?

Senior

Answer

Kubelet ensures Pod state, manages cgroups, probes, logs, volumes, and interacts with runtime.
Quick Summary: kubelet watches Pod assignments from the API server, calls the container runtime (via CRI) to create containers, monitors container health via probes, manages volume mounts and secrets injection, reports Pod status back to the API server, and evicts Pods when the node is under resource pressure. It's the full Pod lifecycle manager at the node level.
Q102:

Why does kubelet sometimes refuse to start new Pods even when resources appear available?

Senior

Answer

Kubelet reserves system resources (OS, eviction thresholds, kernel memory).
Quick Summary: Kubelet enforces node-level resource reservations (kube-reserved, system-reserved). Allocatable = capacity − reservations − eviction thresholds, so a node can look underutilized while a Pod's requests still exceed what remains allocatable — kubelet rejects the Pod. It also refuses Pods when the node has reached its max-pods limit, the image can't be pulled, or required volumes aren't available.
Q103:

How do StatefulSets maintain the order of Pod creation and termination?

Senior

Answer

They enforce strict ordinal rules; Pod N+1 waits for Pod N to become Ready.
Quick Summary: StatefulSets create Pods in order (app-0, app-1, ...) and each must be Running and Ready before the next is created. They terminate Pods in reverse order (app-2, app-1, app-0). This ordered lifecycle is critical for distributed systems that elect leaders or require peers to join one at a time. The Pod Management Policy can be set to Parallel for workloads that don't need ordering.
Q104:

Why is multi-zone cluster topology crucial for HA?

Senior

Answer

Distribution across zones prevents outages from zone-wide failures.
Quick Summary: Multi-zone topology spreads Pods across availability zones — a cloud zone failure (power outage, network partition) only takes down Pods in one zone. With Pods in 3 zones, you survive any single zone failure with 2/3 capacity. Without zone spreading, all Pods could land in one zone — a single failure brings down everything.
Q105:

Why do some workloads require pod-level anti-affinity instead of PDB?

Senior

Answer

Anti-affinity prevents co-location to reduce correlated failures; PDB does not control placement.
Quick Summary: PDB protects against voluntary disruptions (drain, eviction). During node failures (involuntary), PDB doesn't apply — Kubernetes terminates the node's Pods regardless. Pod anti-affinity ensures replicas land on different nodes, so a node failure only kills one replica. Anti-affinity is the HA strategy; PDB is the controlled-maintenance strategy.
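
The anti-affinity half of that strategy is declared in the Pod template — a sketch requiring db replicas on distinct nodes; the label and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: db-replica
  labels:
    app: db
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: db
        topologyKey: kubernetes.io/hostname   # no two app=db Pods on the same node
  containers:
  - name: db
    image: example/db:1.0   # illustrative image
```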
Q106:

What causes slow pod scheduling when thousands of Pods are deployed?

Senior

Answer

Large node count, expensive scoring, cache churn, and many unschedulable Pods cause delays.
Quick Summary: Scheduling each Pod requires the scheduler to filter and score all eligible nodes. With thousands of Pods, the node list is long and the work per scheduling cycle is high. Without percentage-of-nodes-to-score optimization, every Pod evaluates every node. Large clusters should tune percentageOfNodesToScore to sample instead of evaluating all nodes for non-critical scheduling decisions.
Q107:

Why is etcd compaction required?

Senior

Answer

Old revisions slow performance; compaction removes stale history.
Quick Summary: etcd stores all historical revisions of objects using MVCC. Over time, millions of old revisions accumulate, growing the etcd database size indefinitely. Compaction discards old revisions, keeping only the most recent. Without regular compaction, etcd db size grows until it hits the backend size quota (default 2 GB, configurable up to a recommended maximum of about 8 GB), after which it refuses all writes — a cluster-wide outage.
Q108:

How does Kubernetes guarantee ordering of updates for the same resource?

Senior

Answer

etcd linearizable reads + monotonically increasing resourceVersion ensure ordered updates.
Quick Summary: The API server processes updates to the same object sequentially because etcd is the serialization point — only one write can succeed per resourceVersion. All updates go through etcd's Raft leader, which applies them in order. Watch events are delivered in revision order to all watchers, ensuring controllers always see a consistent, ordered history of changes.
Q109:

Why should operators avoid running user workloads on control-plane nodes?

Senior

Answer

User Pods consume resources needed by API server and controllers, destabilizing cluster.
Quick Summary: Control plane nodes run etcd, API server, and controller manager — resource-intensive processes with strict requirements. User workloads sharing these nodes can starve control plane components of CPU and memory, causing cluster instability. In managed clusters (EKS, GKE), control plane nodes are hidden; in self-managed, taint control plane nodes with NoSchedule.
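
The taint plus a matching toleration look like this — a sketch; the node name and Pod are hypothetical, and the taint shown is the one kubeadm applies by default:

```yaml
# Control-plane nodes typically carry this taint (kubeadm sets it):
#   kubectl taint nodes cp-1 node-role.kubernetes.io/control-plane=:NoSchedule
# A Pod that genuinely must run there needs a matching toleration:
apiVersion: v1
kind: Pod
metadata:
  name: cp-debug   # hypothetical debugging Pod
spec:
  tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  containers:
  - name: shell
    image: busybox:1.36
    command: ["sleep", "3600"]
```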
Q110:

Why does Kubernetes require strict version skew policies?

Senior

Answer

Incompatible versions cause unpredictable behavior; skew limits ensure safe upgrades.
Quick Summary: Kubernetes API and component behavior can change between minor versions. If kubelet is two minor versions behind the API server, some new features or changed behaviors won't work correctly. The skew policy ensures the control plane upgrades first (API server, then controller manager, then scheduler), followed by nodes — each step within supported skew.
Q111:

How does Persistent Volume expansion work internally?

Senior

Answer

PVC resize updates PV; kubelet expands filesystem with CSI driver support.
Quick Summary: PV expansion requires the storage backend to support it (not all do). The PVC is updated with a larger size request. The CSI driver's controller plugin expands the cloud volume (AWS EBS, GCE PD). Then the node plugin extends the filesystem inside the container. The Pod may need to be restarted for the filesystem expansion to take effect, depending on the driver.
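
The user-facing flow is a StorageClass opt-in plus a PVC size bump — a sketch using the AWS EBS CSI provisioner as an example; names and sizes are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-ssd        # hypothetical class name
provisioner: ebs.csi.aws.com  # example CSI driver; must support expansion
allowVolumeExpansion: true    # required before any PVC resize is accepted
---
# Then edit the bound PVC's requested size upward; the CSI driver does the rest:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: expandable-ssd
  resources:
    requests:
      storage: 20Gi           # was 10Gi; increasing this triggers expansion
```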
Q112:

Why do distributed databases use Pod Anti-Affinity besides StatefulSets?

Senior

Answer

StatefulSets manage identity; anti-affinity spreads replicas to avoid node failure impact.
Quick Summary: StatefulSets guarantee separate PVCs per Pod but don't control which nodes they land on. Two database replicas might still end up on the same physical node — risky if that node fails. Pod Anti-Affinity explicitly prevents two replicas from co-locating on the same node or zone, providing actual physical separation beyond just separate storage identities.
Q113:

What is the internal difference between static Pods and normal Pods?

Senior

Answer

Static Pods come from node filesystem and are managed only by kubelet; not stored in etcd.
Quick Summary: Static Pods are managed by kubelet directly — defined as YAML files in a host directory (/etc/kubernetes/manifests). kubelet creates and monitors them without the API server — they survive even if etcd is down. Control plane components (kube-apiserver, etcd, controller-manager, scheduler) are typically static Pods. Normal Pods require the API server and are scheduler-managed.
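
A static Pod is nothing more than a manifest file in the kubelet's staticPodPath — a sketch of a hypothetical file at the common kubeadm location:

```yaml
# Illustrative file: /etc/kubernetes/manifests/node-exporter.yaml
# kubelet watches this directory and runs the Pod with no API server involved;
# a read-only "mirror Pod" appears in the API for visibility only.
apiVersion: v1
kind: Pod
metadata:
  name: node-exporter   # hypothetical workload
spec:
  hostNetwork: true
  containers:
  - name: exporter
    image: example/exporter:1.0   # illustrative image
```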
Q114:

Why does Kubernetes use cadvisor for container metrics?

Senior

Answer

cadvisor provides CPU, memory, network, and filesystem stats for autoscaling and monitoring.
Quick Summary: cAdvisor (Container Advisor) is embedded in kubelet and collects container-level resource metrics — CPU, memory, network, filesystem per container. It exposes these metrics that the metrics-server scrapes. Kubernetes uses cAdvisor data for HPA scaling decisions, resource limit enforcement visibility, and kubectl top pod output.
Q115:

How does VPA differ from HPA?

Senior

Answer

HPA scales replicas; VPA adjusts resource requests and may restart Pods.
Quick Summary: HPA scales horizontally — adds or removes Pods based on current resource usage. VPA (Vertical Pod Autoscaler) scales vertically — adjusts the CPU and memory requests/limits of individual Pods based on actual usage. VPA is for right-sizing Pod resource requests over time. They address different problems and can conflict — typically don't use both for the same Deployment.
Q116:

How does kube-scheduler prevent starvation of low-priority Pods?

Senior

Answer

Fair scheduling and backoff ensure low-priority Pods eventually run.
Quick Summary: The scheduler uses priority-based preemption — high-priority Pods preempt lower-priority ones to get scheduled. For lower-priority work that can't preempt anything, the scheduler queues it and retries periodically. With proper Priority Classes and a mix of preemptible and non-preemptible workloads, high-priority work always gets scheduled while low-priority work waits rather than starving.
Q117:

Why avoid DaemonSet updates during node pressure?

Senior

Answer

DaemonSet updates create churn across all nodes, worsening pressure.
Quick Summary: DaemonSet updates perform a rolling update across nodes. If a node is already under memory or disk pressure, updating its DaemonSet Pod adds more resource pressure — potentially triggering evictions of other Pods or causing the node to go NotReady. Pause DaemonSet updates during node pressure incidents; resolve the pressure first.
Q118:

Why can CRD conversion webhooks become a bottleneck?

Senior

Answer

Conversion runs on every read/write across versions, overwhelming webhook servers.
Quick Summary: CRD conversion webhooks translate CRD objects between API versions (v1alpha1 → v1beta1 → v1). Every read or write of a CRD object in a different version triggers a webhook call. In clusters with many CRD instances or high API throughput, these synchronous webhook calls add latency to every API operation. Keep webhook response times under 10ms.
Q119:

Why does Node Ready status depend on kubelet heartbeats?

Senior

Answer

NodeLease heartbeats from kubelet determine node health; missing heartbeats mark node Unready.
Quick Summary: kubelet sends node heartbeats by updating a Lease object in the kube-node-lease namespace every 10 seconds. The node controller monitors these leases. If a lease isn't updated within the lease duration (default 40s), the node controller marks the node NotReady. In large clusters, using Lease objects (small, cheap) instead of full Node status updates drastically reduces API server load.
Q120:

Why is ephemeral storage a common cause of node instability?

Senior

Answer

Logs, layers, and temp files fill disk, triggering evictions.
Quick Summary: Containers write logs (stdout/stderr), and the container runtime stores them as files on the node. Each container restart adds a new log file. Containers that restart frequently (CrashLoopBackOff) create many log files. Large or unbounded log files fill the node's ephemeral storage — triggering kubelet's hard eviction and removing Pods from the node.
Q121:

How does CNI plugin selection affect control-plane scalability?

Senior

Answer

Different plugins vary in route programming and IP management; poor choice bottlenecks scaling.
Quick Summary: CNI plugins handle networking for every Pod on every node. Some CNI data paths add per-packet overhead — overlay encapsulation (Flannel's VXLAN) or userspace packet processing (Weave's sleeve fallback mode). At large scale, the cumulative networking overhead affects control-plane node performance too. eBPF-based CNIs (Cilium) bypass iptables and most per-packet translation, scaling better for large clusters.
Q122:

Why avoid large numbers of Services with external load balancers?

Senior

Answer

Cloud LBs are costly, slow to provision, and overload API server.
Quick Summary: Each external LoadBalancer Service provisions a cloud load balancer (AWS ELB, GCP LB). Cloud providers charge per load balancer and have limits (typically 200-1000 per account). Large numbers of Services with external LBs can exhaust quotas, incur significant cost, and slow down Service creation as cloud provisioning takes minutes per LB.
Q123:

How does Kubernetes prevent two controllers from updating the same object at the same time?

Senior

Answer

Controllers use resourceVersion with optimistic concurrency and caching.
Quick Summary: Optimistic concurrency with resourceVersion handles it: both controllers read the object, both see the same resourceVersion, both try to update — only the first write succeeds. The second gets a 409 Conflict. The losing controller re-reads the latest version and retries from the new state. No locking needed — just retry on conflict.
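The retry-on-conflict pattern above can be sketched with a toy in-memory store — `Store` and `Conflict` here are invented stand-ins for the API server and its 409 response, not real client library types:

```python
class Conflict(Exception):
    """Simulates an HTTP 409 Conflict from the API server."""

class Store:
    def __init__(self, obj):
        self.obj = dict(obj)
        self.resource_version = 1

    def get(self):
        # Reads return a copy plus the version it was read at.
        return dict(self.obj), self.resource_version

    def update(self, obj, resource_version):
        # The write succeeds only if the caller read the latest version.
        if resource_version != self.resource_version:
            raise Conflict()
        self.obj = dict(obj)
        self.resource_version += 1

def reconcile_with_retry(store, mutate, max_retries=5):
    """Read-modify-write loop: on 409, re-read and retry from fresh state."""
    for _ in range(max_retries):
        obj, rv = store.get()
        mutate(obj)
        try:
            store.update(obj, rv)
            return True
        except Conflict:
            continue  # another controller won the race; retry on new state
    return False

store = Store({"replicas": 1})

# Controller A reads, then controller B sneaks a write in first.
obj_a, rv_a = store.get()
store.update({"replicas": 2}, rv_a)       # B wins: version advances
try:
    store.update({"replicas": 3}, rv_a)   # A's stale write: rejected
except Conflict:
    pass

# A retries from the latest state and succeeds.
assert reconcile_with_retry(store, lambda o: o.update(replicas=o["replicas"] + 1))
assert store.get()[0]["replicas"] == 3
```

Note that neither controller ever holds a lock — the losing writer simply observes the conflict and re-runs its logic against the newer state.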
Q124:

What is the difference between eviction and graceful termination at node level?

Senior

Answer

Eviction is due to pressure; graceful termination follows delete requests.
Quick Summary: Graceful termination: kubelet sends SIGTERM to the container, waits terminationGracePeriodSeconds (default 30s), then SIGKILL. The app should drain connections during this window. Node-pressure eviction: kubelet ranks Pods by whether their usage exceeds requests and by priority — BestEffort and over-request Burstable Pods go first, Guaranteed Pods last. It's Pod-level termination triggered by kubelet, not by the API server.
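A minimal sketch of the application side of graceful termination, assuming a POSIX environment; the handler and the drain step are illustrative placeholders, and `signal.raise_signal` stands in for the kubelet delivering SIGTERM:

```python
import signal
import time

shutting_down = False

def on_sigterm(signum, frame):
    global shutting_down
    shutting_down = True       # stop accepting new work; fail readiness checks

signal.signal(signal.SIGTERM, on_sigterm)

in_flight = ["request-1"]      # work already being processed

signal.raise_signal(signal.SIGTERM)   # simulate kubelet sending SIGTERM

if shutting_down:
    # Drain window: finish in-flight work, accept nothing new, then exit.
    # Must complete before terminationGracePeriodSeconds elapses (SIGKILL).
    time.sleep(0.01)           # placeholder for finishing the request
    in_flight.clear()

assert shutting_down and not in_flight
```

An app that ignores SIGTERM gets no drain window at all — it runs until the grace period expires and is then killed mid-request.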
Q125:

Why is encryption-at-rest critical for Kubernetes Secrets?

Senior

Answer

Secrets stored in etcd are plain base64; without encryption attackers can read credentials.
Quick Summary: Kubernetes Secrets are only base64-encoded in etcd by default — not encrypted. Anyone with etcd access reads secrets in plaintext. Enabling encryption-at-rest uses a provider (AES-CBC, AES-GCM, or KMS) to encrypt Secret data before writing to etcd. Essential for compliance (PCI, HIPAA) and defense against etcd backup exposure.
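One decode call demonstrates the point — the credential below is hypothetical:

```python
import base64

secret_value = "super-secret-password"   # hypothetical credential

# What gets stored in etcd / shown by `kubectl get secret -o yaml`:
encoded = base64.b64encode(secret_value.encode()).decode()
print(encoded)   # c3VwZXItc2VjcmV0LXBhc3N3b3Jk

# Anyone with read access reverses it in one call -- no key required:
assert base64.b64decode(encoded).decode() == secret_value
```

Base64 is an encoding for transporting binary data, not encryption — which is exactly why encryption-at-rest (or an external KMS) is needed on top.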
Q126:

How does the Kubernetes API Server internally process an incoming request from authentication to admission?

Expert

Answer

Request pipeline: authentication, authorization, mutating admission, validating admission, schema validation, etcd write, and watch notifications.
Quick Summary: The request flows through: Authentication (verify identity — bearer token, certificate, OIDC), Authorization (RBAC — can this identity do this action on this resource?), Admission Controllers (mutating — modify the object; validating — reject invalid objects), then object validation against the API schema, and finally persistence to etcd.
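The pipeline stages can be sketched as plain functions; the stage order mirrors the real flow, but every check below (the token, the RBAC tuple, the injected label) is an invented placeholder:

```python
def authenticate(request):
    # Stage 1: verify identity (token / cert / OIDC in the real server).
    if request.get("token") != "valid-token":
        raise PermissionError("401 Unauthorized")
    return "alice"

def authorize(user, request):
    # Stage 2: RBAC -- can this user perform this verb on this resource?
    allowed = {("alice", "create", "pods")}
    if (user, request["verb"], request["resource"]) not in allowed:
        raise PermissionError("403 Forbidden")

def mutating_admission(obj):
    # Stage 3: mutating webhooks may modify the object (inject defaults).
    obj.setdefault("labels", {})["injected-by"] = "webhook"
    return obj

def validating_admission(obj):
    # Stage 4: validating webhooks can only accept or reject.
    if not obj.get("name"):
        raise ValueError("admission denied: name required")

def handle(request, etcd):
    user = authenticate(request)
    authorize(user, request)
    obj = mutating_admission(dict(request["object"]))
    validating_admission(obj)
    etcd[obj["name"]] = obj   # persist; watchers would be notified here
    return obj

etcd = {}
req = {"token": "valid-token", "verb": "create", "resource": "pods",
       "object": {"name": "web-1"}}
stored = handle(req, etcd)
assert stored["labels"]["injected-by"] == "webhook"
assert "web-1" in etcd
```

The ordering matters: mutation runs before validation, so validating webhooks always see the final (mutated) object.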
Q127:

Why does Kubernetes use a watch mechanism instead of continuous polling for state changes?

Expert

Answer

Watch streams push revision-based updates efficiently; polling overloads API server and causes stale reads.
Quick Summary: Polling would require each component to ask the API server "any changes?" every N seconds — generating massive load and adding latency proportional to the poll interval. Watches are long-lived HTTP connections where the API server pushes events to clients instantly when objects change. Much lower API server load and near-real-time response to state changes.
Q128:

How does the API Server deduplicate events during heavy watch traffic?

Expert

Answer

Updates are coalesced so only the latest revision is delivered, reducing event storms.
Quick Summary: During heavy watch traffic, many rapid changes to the same object generate many events. The API server uses event aggregation and watch bookmarks to reduce noise. Bookmarks are synthetic events that advance the client's resourceVersion without carrying object data — helping clients stay current without receiving every intermediate state change.
Q129:

Why does etcd use MVCC instead of overwriting values directly?

Expert

Answer

MVCC stores revisions enabling consistent reads, time-travel, and non-blocking watches.
Quick Summary: MVCC (Multi-Version Concurrency Control) keeps old versions of values tagged with their revision number. This enables Watches to replay history from any past revision without locking current values. It also makes atomic compare-and-swap (CAS) operations safe — etcd compares the current revision before writing. Overwriting directly would lose the history needed for watches.
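A toy MVCC store shows why keeping revisions matters — this illustrates the idea, not etcd's actual data structures:

```python
class MVCCStore:
    def __init__(self):
        self.revision = 0
        self.history = {}  # key -> list of (revision, value)

    def put(self, key, value):
        # Writes append a new revision instead of overwriting.
        self.revision += 1
        self.history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def get(self, key, revision=None):
        """Latest value, or the value as of an older revision (time travel)."""
        versions = self.history.get(key, [])
        if revision is None:
            return versions[-1][1] if versions else None
        match = [v for rev, v in versions if rev <= revision]
        return match[-1] if match else None

    def watch_from(self, key, revision):
        """Replay every change after `revision` -- what watches rely on."""
        return [(rev, v) for rev, v in self.history.get(key, []) if rev > revision]

s = MVCCStore()
s.put("/pods/web-1", {"phase": "Pending"})   # revision 1
s.put("/pods/web-1", {"phase": "Running"})   # revision 2

assert s.get("/pods/web-1") == {"phase": "Running"}
assert s.get("/pods/web-1", revision=1) == {"phase": "Pending"}  # time travel
assert s.watch_from("/pods/web-1", 1) == [(2, {"phase": "Running"})]
```

If `put` overwrote in place, `watch_from` could never replay missed events — which is exactly the history the summary says would be lost. (Real etcd periodically compacts old revisions to bound this history.)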
Q130:

How does the scheduler handle scheduling cycles to prevent race conditions in multi-scheduler setups?

Expert

Answer

Each cycle locks a Pod; schedulers rely on leader election or partitioning to avoid double-binding.
Quick Summary: Each scheduler instance runs a scheduling cycle: claim a Pod to schedule, find a node, then bind it. Without locking, two scheduler instances could both select the same node for the same Pod. Kubernetes uses optimistic binding — the scheduler writes the binding to the API server which validates against resourceVersion. Only one binding succeeds; the other is rejected and retried.
Q131:

What is preemption toxicity and how does Kubernetes mitigate it?

Expert

Answer

Cascade evictions are avoided by feasibility checks, retry limits, and controlled preemption.
Quick Summary: Preemption toxicity is when preempting lower-priority Pods to make room for a high-priority Pod causes so many Pod disruptions that the cluster destabilizes. Kubernetes mitigates this with victim selection that minimizes the number of evicted Pods and PDB violations (PDBs are respected on a best-effort basis during preemption), graceful termination periods for victims, and by attempting preemption only after no feasible node is found.
Q132:

How does CRI allow Kubernetes to remain runtime-agnostic?

Expert

Answer

CRI defines gRPC APIs for sandboxing, lifecycle, and images; runtimes implement CRI decoupling kubelet.
Quick Summary: CRI (Container Runtime Interface) is a standard gRPC API between kubelet and the container runtime. Any runtime implementing CRI (containerd, CRI-O) can be used. Kubernetes communicates via CRI calls (RunPodSandbox, CreateContainer, StartContainer) — the runtime handles the implementation details. This lets Kubernetes run on any OCI-compatible runtime without code changes.
Q133:

Why does kubelet maintain a pod sandbox even when containers crash?

Expert

Answer

Sandbox holds Pod-level namespaces ensuring stable networking and IPs across restarts.
Quick Summary: The pod sandbox is the network namespace and cgroup hierarchy for the Pod — created once when the Pod is scheduled. It persists across container restarts. Even if all containers in the Pod crash and restart, the sandbox (Pod IP, network, volumes) stays stable. This is why Pod IP doesn't change when a container restarts inside it.
Q134:

How does Kubernetes prevent deadlocks in the node drain process?

Expert

Answer

Drain respects PDBs, uses backoff, ignores DaemonSets/Static Pods, and processes evictions asynchronously.
Quick Summary: kubectl drain cordons the node (stops new scheduling) and evicts Pods. If a Pod has a finalizer that never gets removed (deadlocked controller), it blocks the eviction — the drain hangs. Kubernetes uses a configurable eviction timeout; if exceeded, drain fails. The operator must manually intervene — remove the finalizer or force-delete the stuck Pod.
Q135:

Why are long-lived connections tricky behind Kubernetes Services?

Expert

Answer

Established connections bypass load balancing, causing backend imbalance.
Quick Summary: Long-lived connections (HTTP/1.1 keep-alive, WebSockets, gRPC/HTTP-2 streams, persistent database connections) don't benefit from Service load balancing because the connection is balanced only once, at establishment: one connection goes to one Pod and stays there. Every request multiplexed over that connection hits the same Pod — no per-request load balancing. Use a service mesh (Istio, Linkerd) for request-level load balancing of persistent connections.
Q136:

How does Cilium replace kube-proxy using eBPF?

Expert

Answer

eBPF enables direct routing, load balancing, and policy enforcement without iptables.
Quick Summary: Cilium attaches eBPF programs to network interfaces and socket layers. When a Pod connects to a Service ClusterIP, Cilium's eBPF socket-level program intercepts the connect() syscall and rewrites the destination to a backend Pod's IP — so packets never need iptables DNAT or conntrack translation, and kube-proxy isn't needed at all. L7 policies work similarly.
Q137:

What are API Priority and Fairness queues and why do they matter?

Expert

Answer

They allocate fair request shares, ensuring critical traffic is never starved by noisy clients.
Quick Summary: API Priority and Fairness (APF) classifies incoming API requests into priority levels (leader-election, cluster-critical, system, workload-low). Each level has a queue with concurrency limits. This prevents a burst of low-priority requests (like a misconfigured controller) from blocking critical operations (kubelet heartbeats, leader election). Every request gets fair access within its priority class.
Q138:

How does Kubernetes ensure strict ordering of writes to a single object under high concurrency?

Expert

Answer

Compare-and-swap with resourceVersion ensures linearizable writes.
Quick Summary: All writes to a single etcd key are serialized by etcd's Raft protocol. The API server includes the resourceVersion in writes — etcd rejects writes where the version doesn't match current (optimistic locking). The Raft leader serializes competing writes to the same key — only one succeeds per revision. This provides single-object write linearizability.
Q139:

Why do multi-cluster architectures require federation or service meshes?

Expert

Answer

Cross-cluster discovery, routing, failover, and policy require abstractions beyond core Kubernetes.
Quick Summary: Each cluster is an isolated control plane — Pods in cluster A can't directly discover or communicate with Services in cluster B. Multi-cluster networking (Submariner, Cilium Cluster Mesh) creates cross-cluster service discovery and connectivity. Service meshes (Istio multi-cluster) extend traffic management across clusters. Federation (KubeFed) synchronizes resources across clusters.
Q140:

How does kube-controller-manager prevent infinite reconciliation loops?

Expert

Answer

Rate-limited work queues with exponential backoff break infinite retry loops.
Quick Summary: Controllers use the "level-triggered" model — they don't remember the sequence of events, just the current desired and actual state. After any reconciliation attempt, they re-read the current state from the API server and compare it to the desired state. If already converged, they do nothing. Combined with rate-limited workqueues and exponential backoff on failures, this idempotent design prevents infinite loops — reconciliation converges instead of spinning.
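A minimal level-triggered loop with exponential backoff might look like this; all names and the failure simulation are illustrative, not controller-runtime APIs:

```python
import time

def reconcile(desired, actual):
    """Idempotent: compare states and return the action needed, if any."""
    if actual.get("replicas") == desired["replicas"]:
        return None                      # already converged -> do nothing
    return ("scale", desired["replicas"])

def run_until_converged(desired, actual, apply, max_attempts=8):
    delay = 0.001                        # base backoff (kept tiny for the demo)
    for _ in range(max_attempts):
        action = reconcile(desired, actual)
        if action is None:
            return True                  # converged: the loop terminates
        try:
            apply(action, actual)
        except RuntimeError:
            time.sleep(delay)            # transient failure: back off and retry
            delay = min(delay * 2, 1.0)  # exponential, capped
    return False

attempts = {"n": 0}
def flaky_apply(action, actual):
    # Simulates an API that fails twice before succeeding.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient API error")
    actual["replicas"] = action[1]

actual = {"replicas": 1}
assert run_until_converged({"replicas": 5}, actual, flaky_apply)
assert actual["replicas"] == 5
```

The key property is that `reconcile` returns `None` once the states match — the loop terminates on convergence rather than re-processing a remembered event stream.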
Q141:

Why do CSI drivers require both node and controller plugins?

Expert

Answer

Controller provisions/attaches; node plugin mounts/unmounts and performs filesystem ops.
Quick Summary: The controller plugin handles cluster-level operations (creating, deleting, snapshotting cloud volumes) — it runs as a Pod, often on control plane nodes. The node plugin handles node-level operations (attaching the volume to the node, mounting it into the container) — it runs as a DaemonSet on all nodes. Both are required for full persistent volume lifecycle management.
Q142:

How does Kubernetes maintain consistency for ConfigMaps and Secrets projected into Pods?

Expert

Answer

Kubelet updates atomic symlink-based volumes in tmpfs when API server changes.
Quick Summary: kubelet watches the API server for ConfigMap and Secret changes. When a change is detected, kubelet atomically swaps the projected files in the Pod's volume via a symlink flip (propagation can take up to the kubelet sync period, roughly a minute, plus any cache TTL). Environment variables injected from ConfigMaps/Secrets never update — the Pod must restart. Volume mounts update dynamically; env var injections do not.
Q143:

Why is vertical scaling of etcd limited even with powerful hardware?

Expert

Answer

Consensus latency, fsync overhead, and majority replication limit scaling.
Quick Summary: etcd uses Raft — all writes go through one leader. Even with more powerful hardware, a single leader serializes all writes — adding more CPU or RAM doesn't add more write throughput. etcd's bottleneck is disk fsync latency (every commit requires fsync). Use fast NVMe SSDs, dedicated etcd nodes, and limit etcd object count rather than scaling hardware vertically.
Q144:

How does Kubernetes handle stale endpoints when Pods die abruptly?

Expert

Answer

EndpointSlice controller removes endpoints after Pod deletion or NotReady signals.
Quick Summary: When a Pod dies abruptly (OOM kill, node failure), its endpoint isn't removed from the Service endpoint list until the endpoint controller learns of the Pod's deletion via the API server. This propagation takes seconds. During that window, traffic still routes to the dead Pod's IP. kube-proxy retries help; readiness probes and preStop hooks minimize the window.
Q145:

How does the scheduler prevent double-binding a Pod to multiple nodes?

Expert

Answer

Posting a Binding object updates Pod status; other schedulers skip already-bound Pods.
Quick Summary: The scheduler uses optimistic binding — it chooses a node and then writes a Binding object to the API server, which validates it against current cluster state. Two schedulers could both choose the same node for the same Pod, but only the first binding write succeeds (the second fails due to the Pod's resourceVersion having changed). The losing scheduler must reschedule that Pod.
Q146:

Why does IPVS mode scale better under tens of thousands of Services?

Expert

Answer

IPVS performs constant-time routing; iptables rule chains grow linearly.
Quick Summary: iptables stores rules in kernel memory as linked lists — lookup is O(n) where n is the number of rules. With 10,000 Services, each packet traverses thousands of rules. IPVS uses kernel hash tables — lookup is O(1) regardless of Service count. IPVS also supports more load balancing algorithms (round-robin, least-connection, shortest-expected-delay) natively.
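The complexity difference is easy to demonstrate with a linear scan versus a hash table — a schematic comparison, not kube-proxy's actual data structures:

```python
import timeit

n_services = 10_000
rules = [("10.96.%d.%d" % (i // 256, i % 256), "backend-%d" % i)
         for i in range(n_services)]
table = dict(rules)                      # IPVS-style hash table

target = rules[-1][0]                    # worst case for the linear scan

def iptables_style(dst):
    for vip, backend in rules:           # O(n): walk every rule in order
        if vip == dst:
            return backend

def ipvs_style(dst):
    return table.get(dst)                # O(1): single hash lookup

assert iptables_style(target) == ipvs_style(target)

linear = timeit.timeit(lambda: iptables_style(target), number=100)
hashed = timeit.timeit(lambda: ipvs_style(target), number=100)
assert hashed < linear                   # hash lookup wins decisively
```

Doubling the number of Services doubles the linear scan's cost but leaves the hash lookup unchanged — the same asymmetry that makes IPVS mode scale to tens of thousands of Services.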
Q147:

Why must RBAC roles be tightly scoped in multi-team clusters?

Expert

Answer

Poor scoping allows privilege escalation through rolebinding or secret edits.
Quick Summary: Broad RBAC roles (like cluster-admin) given to developer ServiceAccounts allow any Pod they deploy to read all secrets, modify any Deployment, or delete any resource cluster-wide. A compromised Pod becomes a cluster takeover. Tight scoping (namespace-scoped roles, specific verbs on specific resources) limits the blast radius of any compromised workload.
Q148:

How do admission webhooks affect cluster latency?

Expert

Answer

All writes pass through webhooks; slow webhooks stall API server requests.
Quick Summary: Admission webhooks are synchronous HTTP calls — the API server must wait for the webhook response before proceeding. A slow or unresponsive webhook blocks all object creation/modification of that type across the cluster. Misconfigured webhook failurePolicy can either block all operations (Fail) or silently skip validation (Ignore). Keep webhooks fast, stateless, and HA-deployed.
Q149:

Why is kubelet’s node lease critical for large clusters?

Expert

Answer

NodeLease reduces API load by sending lightweight heartbeats instead of full Node updates.
Quick Summary: In large clusters (1000+ nodes), updating full Node status objects (large JSON) every 10 seconds from every node would flood the API server. Node leases use tiny Lease objects (just a timestamp) for heartbeats — 90% less API server load per node. Full Node status updates still happen, but only when the node's status actually changes — far less frequently.
Q150:

How does Kubernetes avoid excessive DNS traffic due to frequent Pod restarts?

Expert

Answer

CoreDNS caches, EndpointSlices reduce entries, and stable ClusterIP reduces re-resolution.
Quick Summary: Every DNS query from every Pod hits CoreDNS. In clusters with frequent Pod restarts, new Pods look up Service names and ExternalName records. Under heavy DNS load, CoreDNS becomes a bottleneck. NodeLocal DNSCache runs a DNS cache DaemonSet on each node — Pods hit the local cache first, dramatically reducing CoreDNS load and DNS query latency.
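The caching idea can be sketched as a tiny TTL cache in front of a fake resolver; nothing below touches real DNS, and all names are illustrative:

```python
import time

class TTLCache:
    def __init__(self, resolver, ttl=30.0):
        self.resolver = resolver
        self.ttl = ttl
        self.store = {}              # name -> (expiry_timestamp, ip)
        self.upstream_queries = 0    # how often we hit "CoreDNS"

    def lookup(self, name, now=None):
        now = time.monotonic() if now is None else now
        hit = self.store.get(name)
        if hit and hit[0] > now:
            return hit[1]                # served from the node-local cache
        self.upstream_queries += 1       # miss/expiry: ask cluster DNS
        ip = self.resolver(name)
        self.store[name] = (now + self.ttl, ip)
        return ip

fake_coredns = {"web.default.svc.cluster.local": "10.96.0.42"}
cache = TTLCache(fake_coredns.__getitem__, ttl=30.0)

# 100 lookups of the same name: only the first reaches "CoreDNS".
for _ in range(100):
    assert cache.lookup("web.default.svc.cluster.local") == "10.96.0.42"
assert cache.upstream_queries == 1

# After the TTL expires, the next lookup goes upstream again.
assert cache.lookup("web.default.svc.cluster.local",
                    now=time.monotonic() + 31) == "10.96.0.42"
assert cache.upstream_queries == 2
```

This is the load-shedding shape NodeLocal DNSCache provides: repeated lookups stay on the node, and the central CoreDNS only sees misses and TTL expiries.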
Q151:

Why does Pod Disruption Budget not protect against node failures?

Expert

Answer

PDB applies only to voluntary disruptions; node crashes bypass it.
Quick Summary: PDB is enforced only during voluntary disruptions processed by the Eviction API — drain operations, cluster upgrades. Node failures are involuntary — the node goes down hard, and Kubernetes terminates all its Pods immediately without consulting PDB. Use Pod Anti-Affinity across nodes/zones to ensure replicas are physically separated — that's your protection against node failures.
Q152:

Why can aggressive API Server audit logging degrade performance?

Expert

Answer

Audit logs are synchronous; heavy logging slows API request handling.
Quick Summary: Every API request is logged with its user, action, resource, timestamp, and response — including the full request/response body if configured. At high request rates (busy clusters), audit logging writes megabytes per second to disk, synchronously, blocking API server request processing. Use structured logging, filter out high-volume read requests, and write to a fast storage backend.
Q153:

How does pod-level Seccomp differ from AppArmor or SELinux?

Expert

Answer

Seccomp filters syscalls; AppArmor/SELinux enforce filesystem and process restrictions.
Quick Summary: Seccomp filters system calls at the kernel level — it blocks specific syscalls entirely (like ptrace, keyctl) before they reach the kernel. AppArmor restricts what filesystem paths and network operations a process can perform. SELinux uses mandatory type enforcement on all resources. They're complementary — seccomp limits syscalls, AppArmor/SELinux limit resource access.
Q154:

Why do Node Local DNS caches drastically improve performance in large clusters?

Expert

Answer

Local caches reduce CoreDNS load and latency for frequent lookups.
Quick Summary: NodeLocal DNSCache deploys a CoreDNS instance as a DaemonSet on each node, listening on a link-local IP. Pod DNS resolvers are configured to query this local cache first. Cache hits are answered in microseconds without leaving the node. Only cache misses (first lookup of a new name) go to the cluster CoreDNS. Reduces DNS latency from milliseconds to microseconds for cached lookups.
Q155:

What architectural principles make Kubernetes eventually consistent?

Expert

Answer

Components use cached informers and async reconciliation; etcd is strongly consistent but cluster converges eventually.
Quick Summary: Kubernetes is eventually consistent because: controllers reconcile asynchronously (there's always a lag between desired state change and actual state), etcd watch propagation has latency, kube-proxy endpoint updates have a delay, and DNS TTL means service discovery lags. The system converges to the desired state, but there's always a window where actual and desired state differ.
