
Senior Kubernetes Interview Questions

Forty curated senior-level Kubernetes interview questions and answers for developers targeting senior roles.

Kubernetes Interview Questions & Answers

Welcome to our comprehensive collection of Kubernetes interview questions and answers. This page contains expertly curated interview questions covering all aspects of Kubernetes, from fundamental concepts to advanced topics. Whether you're preparing for an entry-level position or a senior role, you'll find questions tailored to your experience level.

Our Kubernetes interview questions are designed to help you:

  • Understand core concepts and best practices in Kubernetes
  • Prepare for technical interviews at all experience levels
  • Master both theoretical knowledge and practical application
  • Build confidence for your next Kubernetes interview

Each question includes detailed answers and explanations to help you understand not just what the answer is, but why it's correct. We cover topics ranging from basic Kubernetes concepts to advanced scenarios that you might encounter in senior-level interviews.

Use the filters below to find questions by difficulty level (Entry, Junior, Mid, Senior, Expert) or focus specifically on code challenges. Each question is carefully crafted to reflect real-world interview scenarios you'll encounter at top tech companies, startups, and MNCs.

Questions

Q1:

How does the Kubernetes API Server enforce optimistic concurrency using resourceVersion?

Senior

Answer

Every object has a resourceVersion. Outdated writes are rejected with 409 Conflict, enforcing optimistic locking.
Quick Summary: Every Kubernetes object has a resourceVersion — a string that changes on every write. When a controller reads an object and wants to update it, it must include the same resourceVersion. If another writer already changed it, the resourceVersion won't match and the API server returns 409 Conflict. The controller must re-read and retry. This prevents lost updates without distributed locks.
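The read, compare, write cycle described above can be sketched with a toy in-memory store. This is an illustrative model only: real resourceVersions are opaque strings set by etcd, not the integers used here, and `Conflict` stands in for an HTTP 409 response.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 Conflict response."""

class Store:
    """Toy model of the API server's optimistic-concurrency check."""
    def __init__(self):
        self.objects = {}  # name -> (resource_version, data)

    def get(self, name):
        return self.objects[name]

    def update(self, name, expected_version, data):
        current_version, _ = self.objects[name]
        if current_version != expected_version:
            raise Conflict(f"expected {expected_version}, found {current_version}")
        self.objects[name] = (current_version + 1, data)

store = Store()
store.objects["pod-a"] = (1, {"replicas": 1})

# Controllers A and B both read resourceVersion 1.
va, _ = store.get("pod-a")
vb, _ = store.get("pod-a")

store.update("pod-a", va, {"replicas": 2})      # A wins
try:
    store.update("pod-a", vb, {"replicas": 3})  # B loses: stale version
except Conflict:
    v, _ = store.get("pod-a")                   # B re-reads and retries
    store.update("pod-a", v, {"replicas": 3})

print(store.get("pod-a"))  # (3, {'replicas': 3})
```

The losing writer never silently overwrites the winner's change; it always retries from the latest state.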
Q2:

Why does etcd store all Kubernetes objects as flat key-value pairs instead of hierarchical tree structures?

Senior

Answer

Flat key-space enables fast prefix scans, simple watches, and predictable performance.
Quick Summary: Raft requires that all state machine updates go through a leader and be replicated to a majority. A hierarchical tree structure with parent-child relationships would require multi-key transactions for atomic updates — complex and slow in Raft. A flat key-value model means each object is one key — single-key reads and writes are atomic and simple to replicate.
Q3:

How do Kubernetes Watches avoid missing updates during network interruptions?

Senior

Answer

Clients reconnect using last resourceVersion; API sends all missed events.
Quick Summary: Kubernetes Watches use etcd's watch mechanism with a bookmark system. After reconnection, clients send their last received resourceVersion — etcd replays all events since that version. If the version is too old (compacted away), the API server returns a "too old resource version" error, triggering a full re-list. Etcd compaction is configured with a retention window to balance history vs storage.
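The resume-or-relist behavior can be modeled with a toy event window. The revision numbers and compaction floor below are made up for illustration; the real protocol works on etcd revisions carried in each watch event.

```python
# Server keeps only a window of recent events; older history is compacted.
COMPACTED_BEFORE = 100
events = {rv: f"event-{rv}" for rv in range(100, 106)}

def watch_from(resource_version):
    """Replay everything after the client's last seen revision."""
    if resource_version < COMPACTED_BEFORE:
        raise LookupError("too old resource version")  # forces a full re-list
    return [events[rv] for rv in sorted(events) if rv > resource_version]

print(watch_from(102))  # replays event-103, event-104, event-105
try:
    watch_from(50)      # revision compacted away
except LookupError:
    print("client must re-list, then watch from the new revision")
```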
Q4:

Why can excessive CRDs degrade API Server performance?

Senior

Answer

Each CRD adds endpoints, conversion, validation, and storage overhead on API server and etcd.
Quick Summary: Each CRD registers new API endpoints and type schemas with the API server. The API server must maintain OpenAPI schemas for all CRDs, validate all objects against them, and serve discovery documents listing all available types. Hundreds of CRDs from many operators add significant memory overhead to the API server and slow down discovery endpoint responses.
Q5:

How does the scheduler use informer caches instead of querying the API Server directly?

Senior

Answer

Scheduler listens to watch events and maintains local cache for fast decision-making.
Quick Summary: The scheduler uses local informer caches (SharedInformer) that maintain a local copy of all nodes and Pods, updated via watches from the API server. Filtering and scoring run against this in-memory cache — no API calls per scheduling cycle. This makes scheduling fast and prevents scheduler load from overwhelming the API server in large clusters.
Q6:

Why is kube-apiserver stateless even though it controls critical operations?

Senior

Answer

All state lives in etcd; API servers scale horizontally without syncing state.
Quick Summary: kube-apiserver stores no persistent state itself — all state is in etcd. This makes API server stateless and horizontally scalable. Run 3 or more API server replicas behind a load balancer and the cluster handles API server failures transparently. Losing one API server instance doesn't lose any data or interrupt running workloads.
Q7:

How does etcd use Raft to guarantee strong consistency across master nodes?

Senior

Answer

Raft elects a leader; writes go to leader, replicated to followers, committed on majority ACK.
Quick Summary: Raft elects a leader — all writes go through the leader. The leader replicates the entry to followers. Once a majority acknowledges, the leader commits. Reads can also be served from the leader (linearizable reads) or with lease-based reads from followers. If the leader dies, a new election ensures only one leader exists at a time — no split-brain.
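The quorum arithmetic behind "committed on majority ACK" is simple to state as code. This is just the math, not a Raft implementation:

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a majority."""
    return cluster_size // 2 + 1

def committed(acks: int, cluster_size: int) -> bool:
    return acks >= quorum(cluster_size)

# A 3-member etcd cluster commits with 2 ACKs (leader + 1 follower)
# and tolerates 1 failure; a 5-member cluster tolerates 2.
print(quorum(3), committed(2, 3))  # 2 True
print(quorum(5), committed(2, 5))  # 3 False
```

This is also why etcd clusters use odd member counts: a 4-member cluster needs 3 ACKs and still only tolerates one failure.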
Q8:

What scenarios cause Kubernetes components to enter thundering herd behavior?

Senior

Answer

Many controllers react to same event simultaneously, flooding API server.
Quick Summary: Thundering herd happens when many components simultaneously reconnect to the API server after a network partition or API server restart — all sending list requests at once. This floods the API server with expensive list operations. Kubernetes uses exponential backoff with jitter in client reconnection logic to spread reconnections over time.
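Exponential backoff with full jitter, the pattern described above, can be sketched as follows. The base and cap values are illustrative, not the actual client-go defaults:

```python
import random

def backoff_with_jitter(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before reconnect attempt `attempt` (0-indexed), in seconds."""
    exp = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, exp] so clients that failed at
    # the same instant spread their reconnects over the whole window.
    return random.uniform(0, exp)

delays = [round(backoff_with_jitter(a), 2) for a in range(5)]
print(delays)  # e.g. [0.41, 1.73, 0.09, 6.2, 11.85] -- random each run
```

Without the jitter, every client would retry at exactly 1s, 2s, 4s, ... and the herd would simply arrive in synchronized waves.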
Q9:

Why do CNI plugins often require a dedicated MTU configuration?

Senior

Answer

Encapsulation reduces payload size; wrong MTU causes fragmentation or packet drops.
Quick Summary: VXLAN encapsulation adds overhead bytes to each packet. If the CNI plugin doesn't set the MTU correctly (accounting for the VXLAN header), packets get fragmented — split into multiple smaller packets. Fragmentation significantly reduces network throughput and increases latency. Each CNI plugin must be configured with the correct MTU for the underlying network.
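The arithmetic is straightforward: VXLAN on IPv4 adds roughly 50 bytes of headers (outer Ethernet 14 + outer IP 20 + UDP 8 + VXLAN 8), so the Pod interface MTU must be lowered by that amount. These are the common textbook figures; exact overhead varies with encapsulation mode and IP version.

```python
VXLAN_OVERHEAD = 50  # outer Ethernet 14 + IP 20 + UDP 8 + VXLAN 8 bytes

def pod_mtu(underlay_mtu: int) -> int:
    """MTU the CNI should set on Pod interfaces over a VXLAN underlay."""
    return underlay_mtu - VXLAN_OVERHEAD

print(pod_mtu(1500))  # 1450: the familiar default on standard Ethernet
print(pod_mtu(9000))  # 8950 on jumbo-frame networks
```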
Q10:

What is the difference between kube-proxy iptables and IPVS modes under high traffic?

Senior

Answer

iptables does linear rule matching; IPVS uses kernel hash tables for large-scale performance.
Quick Summary: iptables mode programs a chain of rules per Service. With thousands of Services, each packet traverses potentially thousands of iptables rules sequentially — O(n) lookup. IPVS uses kernel-level hash tables for Service lookup — O(1) regardless of Service count. At 10,000+ Services, iptables mode causes measurable packet processing latency; IPVS does not.
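The data-structure difference can be felt with a rough model: iptables-style matching is a linear scan over rules, IPVS-style lookup is a hash. This is a sketch of the complexity gap, not a benchmark of the real kernel datapaths:

```python
import timeit

# 10,000 Service VIPs, as in a large cluster.
services = [f"10.96.{i // 256}.{i % 256}" for i in range(10_000)]
rule_list = list(services)                              # iptables: ordered rules
rule_table = {ip: i for i, ip in enumerate(services)}   # IPVS: hash table

target = services[-1]  # worst case for the linear scan
linear = timeit.timeit(lambda: target in rule_list, number=200)
hashed = timeit.timeit(lambda: target in rule_table, number=200)
print(f"linear scan: {linear:.4f}s, hash lookup: {hashed:.6f}s")
```

The list scan degrades with every Service added; the hash lookup does not, which is exactly the O(n) vs O(1) distinction in the summary.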
Q11:

Why can misconfigured readiness probes cripple an entire microservice chain?

Senior

Answer

Pods oscillate Ready/NotReady causing upstream retries and cascading failures.
Quick Summary: If a readiness probe is too strict (fails during startup, uses wrong endpoint, wrong port), Pods never become ready — traffic never reaches them. The Service endpoint list empties. Worse, if it flaps intermittently, the Service constantly adds and removes endpoints, causing connection errors. Always test readiness probes in isolation before deploying.
Q12:

What is the purpose of the garbage collector controller in Kubernetes?

Senior

Answer

It cleans dependents using ownerReferences when parent objects are deleted.
Quick Summary: The garbage collector controller watches all Kubernetes objects and deletes orphaned ones. When a Deployment is deleted, the garbage collector also deletes its ReplicaSets and then their Pods, using owner references. Without it, deleting a Deployment would leave behind orphaned ReplicaSets and Pods consuming resources indefinitely.
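The Deployment → ReplicaSet → Pods cascade can be modeled in a few lines. This is a toy with made-up object names; the real garbage collector resolves `metadata.ownerReferences` UIDs and handles foreground/background policies:

```python
objects = {
    "deploy/web":     {"owner": None},
    "rs/web-abc":     {"owner": "deploy/web"},
    "pod/web-abc-1":  {"owner": "rs/web-abc"},
    "pod/web-abc-2":  {"owner": "rs/web-abc"},
    "pod/standalone": {"owner": None},       # no owner: never collected
}

def cascade_delete(name):
    """Delete an object and, recursively, everything that owns it as parent."""
    dependents = [n for n, o in objects.items() if o["owner"] == name]
    for dep in dependents:
        cascade_delete(dep)
    del objects[name]

cascade_delete("deploy/web")
print(sorted(objects))  # ['pod/standalone'] -- the whole owned tree is gone
```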
Q13:

How do finalizers prevent premature resource deletion?

Senior

Answer

Finalizers block deletion until cleanup is complete; controllers remove finalizers afterward.
Quick Summary: Finalizers are strings added to an object's metadata.finalizers list. A delete request sets deletionTimestamp but doesn't actually delete the object — the object lingers until all finalizers are removed. Controllers (often operators) watch for this state, perform cleanup tasks (like deleting external cloud resources), then remove their finalizer, allowing the object to be fully deleted.
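The two-phase deletion described above can be modeled directly: a delete only stamps the object, and removal from storage happens once the finalizer list is empty. Toy model only; names and structure are simplified:

```python
store = {"db-1": {"finalizers": ["example.com/cleanup"], "deletionTimestamp": None}}

def request_delete(name):
    """A DELETE call: marks the object, does not remove it."""
    store[name]["deletionTimestamp"] = "now"
    maybe_remove(name)

def maybe_remove(name):
    o = store.get(name)
    if o and o["deletionTimestamp"] and not o["finalizers"]:
        del store[name]  # only now is the object really gone

request_delete("db-1")
print("db-1" in store)  # True: finalizer still blocks deletion

# An operator finishes external cleanup, then removes its finalizer.
store["db-1"]["finalizers"].remove("example.com/cleanup")
maybe_remove("db-1")
print("db-1" in store)  # False: object fully deleted
```

This is also why objects sometimes hang in `Terminating`: a controller that should remove its finalizer is down or stuck.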
Q14:

Why should autoscaler and HPA cooldown windows be tuned together?

Senior

Answer

Mismatched timings cause scaling oscillations and thrashing.
Quick Summary: HPA scales up immediately when metrics cross a threshold, but waits out a stabilization window before scaling down. If the Cluster Autoscaler removes nodes aggressively right after an HPA scale-down, the next HPA scale-up finds no spare capacity and Pods sit Pending until new nodes are provisioned. Tune the HPA scale-down stabilization window (default 5 minutes) and the Cluster Autoscaler's scale-down delay together to match your app's traffic patterns.
Q15:

Why is API aggregation essential for large enterprises?

Senior

Answer

It enables custom APIs behind API server without modifying Kubernetes core.
Quick Summary: API aggregation lets custom API servers (like metrics-server, service-catalog, or CRD-based operators) register as extensions of the main Kubernetes API server. Clients see them as part of the standard API — no separate endpoint to target. Enterprises use this to extend Kubernetes with domain-specific APIs without forking the core API server.
Q16:

How does kubelet perform node-level Pod lifecycle management?

Senior

Answer

Kubelet ensures Pod state, manages cgroups, probes, logs, volumes, and interacts with runtime.
Quick Summary: kubelet watches Pod assignments from the API server, calls the container runtime (via CRI) to create containers, monitors container health via probes, manages volume mounts and secrets injection, reports Pod status back to the API server, and evicts Pods when the node is under resource pressure. It's the full Pod lifecycle manager at the node level.
Q17:

Why does kubelet sometimes refuse to start new Pods even when resources appear available?

Senior

Answer

Kubelet reserves system resources (OS, eviction thresholds, kernel memory).
Quick Summary: Kubelet enforces node-level resource reservations (kube-reserved, system-reserved). Allocatable capacity is total capacity minus these reservations, so a node can look like it has headroom while a Pod's requests still exceed what remains allocatable, and kubelet rejects the Pod. Kubelet also refuses Pods when the node hits its max-pods limit, the image can't be pulled, or required volumes aren't available.
Q18:

How do StatefulSets maintain the order of Pod creation and termination?

Senior

Answer

They enforce strict ordinal rules; Pod N+1 waits for Pod N to become Ready.
Quick Summary: StatefulSets create Pods in order (app-0, app-1, ...) and each must be Running and Ready before the next is created. They terminate Pods in reverse order (app-2, app-1, app-0). This ordered lifecycle is critical for distributed systems that elect leaders or require peers to join one at a time. The Pod Management Policy can be set to Parallel for workloads that don't need ordering.
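The gating behavior of the default OrderedReady policy can be sketched as a rollout that stalls at the first Pod that never becomes Ready. The `is_ready` callback is a stand-in for readiness-probe results:

```python
def rollout(replicas, is_ready):
    """Create Pods one ordinal at a time, OrderedReady style: Pod N+1
    is never created until Pod N reports Ready."""
    created = []
    for i in range(replicas):
        name = f"app-{i}"
        created.append(name)
        if not is_ready(name):
            break  # rollout stalls here; later ordinals never appear
    return created

# app-1's readiness probe fails: app-2 is never created.
print(rollout(3, lambda name: name != "app-1"))  # ['app-0', 'app-1']
print(rollout(3, lambda name: True))             # ['app-0', 'app-1', 'app-2']
```

Termination simply walks the ordinals in reverse (app-2, app-1, app-0).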
Q19:

Why is multi-zone cluster topology crucial for HA?

Senior

Answer

Distribution across zones prevents outages from zone-wide failures.
Quick Summary: Multi-zone topology spreads Pods across availability zones — a cloud zone failure (power outage, network partition) only takes down Pods in one zone. With Pods in 3 zones, you survive any single zone failure with 2/3 capacity. Without zone spreading, all Pods could land in one zone — a single failure brings down everything.
Q20:

Why do some workloads require pod-level anti-affinity instead of PDB?

Senior

Answer

Anti-affinity prevents co-location to reduce correlated failures; PDB does not control placement.
Quick Summary: PDB protects against voluntary disruptions (drain, eviction). During node failures (involuntary), PDB doesn't apply — Kubernetes terminates the node's Pods regardless. Pod anti-affinity ensures replicas land on different nodes, so a node failure only kills one replica. Anti-affinity is the HA strategy; PDB is the controlled-maintenance strategy.
Q21:

What causes slow pod scheduling when thousands of Pods are deployed?

Senior

Answer

Large node count, expensive scoring, cache churn, and many unschedulable Pods cause delays.
Quick Summary: Scheduling each Pod requires the scheduler to filter and score all eligible nodes. With thousands of Pods, the node list is long and the work per scheduling cycle is high. Without percentage-of-nodes-to-score optimization, every Pod evaluates every node. Large clusters should tune percentageOfNodesToScore to sample instead of evaluating all nodes for non-critical scheduling decisions.
Q22:

Why is etcd compaction required?

Senior

Answer

Old revisions slow performance; compaction removes stale history.
Quick Summary: etcd stores all historical revisions of objects using MVCC. Over time, millions of old revisions accumulate, growing the etcd database size indefinitely. Compaction discards old revisions, keeping only recent history (defragmentation then reclaims the disk space). Without regular compaction, the etcd database grows until it hits the backend size quota (2 GiB by default; 8 GiB is the recommended maximum), after which etcd raises a no-space alarm and refuses all writes, a cluster-wide outage.
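A toy MVCC store makes the mechanism concrete: every write appends a new revision, and compaction drops everything older than a chosen revision. This is a simplification; real etcd keys each revision per key and tracks a global revision counter:

```python
revisions = {}   # revision -> (key, value); full write history
rev = 0

def put(key, value):
    """Every write creates a new revision instead of overwriting."""
    global rev
    rev += 1
    revisions[rev] = (key, value)

def compact(oldest_kept):
    """Discard every revision older than `oldest_kept`."""
    for r in [r for r in revisions if r < oldest_kept]:
        del revisions[r]

for i in range(5):
    put("config", f"v{i}")
print(len(revisions))        # 5: full history retained so far

compact(oldest_kept=4)
print(sorted(revisions))     # [4, 5]: only recent history survives
```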
Q23:

How does Kubernetes guarantee ordering of updates for the same resource?

Senior

Answer

etcd linearizable reads + monotonically increasing resourceVersion ensure ordered updates.
Quick Summary: The API server processes updates to the same object sequentially because etcd is the serialization point — only one write can succeed per resourceVersion. All updates go through etcd's Raft leader, which applies them in order. Watch events are delivered in revision order to all watchers, ensuring controllers always see a consistent, ordered history of changes.
Q24:

Why should operators avoid running user workloads on control-plane nodes?

Senior

Answer

User Pods consume resources needed by API server and controllers, destabilizing cluster.
Quick Summary: Control plane nodes run etcd, API server, and controller manager — resource-intensive processes with strict requirements. User workloads sharing these nodes can starve control plane components of CPU and memory, causing cluster instability. In managed clusters (EKS, GKE), control plane nodes are hidden; in self-managed, taint control plane nodes with NoSchedule.
Q25:

Why does Kubernetes require strict version skew policies?

Senior

Answer

Incompatible versions cause unpredictable behavior; skew limits ensure safe upgrades.
Quick Summary: Kubernetes API and component behavior can change between minor versions. If kubelet is two minor versions behind the API server, some new features or changed behaviors won't work correctly. The skew policy ensures the control plane upgrades first (API server, then controller manager, then scheduler), followed by nodes — each step within supported skew.
Q26:

How does Persistent Volume expansion work internally?

Senior

Answer

PVC resize updates PV; kubelet expands filesystem with CSI driver support.
Quick Summary: PV expansion requires the storage backend to support it (not all do). The PVC is updated with a larger size request. The CSI driver's controller plugin expands the cloud volume (AWS EBS, GCE PD). Then the node plugin extends the filesystem inside the container. The Pod may need to be restarted for the filesystem expansion to take effect, depending on the driver.
Q27:

Why do distributed databases use Pod Anti-Affinity besides StatefulSets?

Senior

Answer

StatefulSets manage identity; anti-affinity spreads replicas to avoid node failure impact.
Quick Summary: StatefulSets guarantee separate PVCs per Pod but don't control which nodes they land on. Two database replicas might still end up on the same physical node — risky if that node fails. Pod Anti-Affinity explicitly prevents two replicas from co-locating on the same node or zone, providing actual physical separation beyond just separate storage identities.
Q28:

What is the internal difference between static Pods and normal Pods?

Senior

Answer

Static Pods come from node filesystem and are managed only by kubelet; not stored in etcd.
Quick Summary: Static Pods are managed by kubelet directly — defined as YAML files in a host directory (/etc/kubernetes/manifests). kubelet creates and monitors them without the API server — they survive even if etcd is down. Control plane components (kube-apiserver, etcd, controller-manager, scheduler) are typically static Pods. Normal Pods require the API server and are scheduler-managed.
Q29:

Why does Kubernetes use cAdvisor for container metrics?

Senior

Answer

cAdvisor provides CPU, memory, network, and filesystem stats for autoscaling and monitoring.
Quick Summary: cAdvisor (Container Advisor) is embedded in kubelet and collects container-level resource metrics — CPU, memory, network, filesystem per container. It exposes these metrics that the metrics-server scrapes. Kubernetes uses cAdvisor data for HPA scaling decisions, resource limit enforcement visibility, and kubectl top pod output.
Q30:

How does VPA differ from HPA?

Senior

Answer

HPA scales replicas; VPA adjusts resource requests and may restart Pods.
Quick Summary: HPA scales horizontally — adds or removes Pods based on current resource usage. VPA (Vertical Pod Autoscaler) scales vertically — adjusts the CPU and memory requests/limits of individual Pods based on actual usage. VPA is for right-sizing Pod resource requests over time. They address different problems and can conflict — typically don't use both for the same Deployment.
Q31:

How does kube-scheduler prevent starvation of low-priority Pods?

Senior

Answer

Fair scheduling and backoff ensure low-priority Pods eventually run.
Quick Summary: The scheduler uses priority-based preemption — high-priority Pods preempt lower-priority ones to get scheduled. For lower-priority work that can't preempt anything, the scheduler queues it and retries periodically. With proper Priority Classes and a mix of preemptible and non-preemptible workloads, high-priority work always gets scheduled while low-priority work waits rather than starving.
Q32:

Why avoid DaemonSet updates during node pressure?

Senior

Answer

DaemonSet updates create churn across all nodes, worsening pressure.
Quick Summary: DaemonSet updates perform a rolling update across nodes. If a node is already under memory or disk pressure, updating its DaemonSet Pod adds more resource pressure — potentially triggering evictions of other Pods or causing the node to go NotReady. Pause DaemonSet updates during node pressure incidents; resolve the pressure first.
Q33:

Why can CRD conversion webhooks become a bottleneck?

Senior

Answer

Conversion runs on every read/write across versions, overwhelming webhook servers.
Quick Summary: CRD conversion webhooks translate CRD objects between API versions (v1alpha1 → v1beta1 → v1). Every read or write of a CRD object in a different version triggers a webhook call. In clusters with many CRD instances or high API throughput, these synchronous webhook calls add latency to every API operation. Keep webhook response times under 10ms.
Q34:

Why does Node Ready status depend on kubelet heartbeats?

Senior

Answer

NodeLease heartbeats from kubelet determine node health; missing heartbeats mark node Unready.
Quick Summary: kubelet sends node heartbeats by updating a Lease object in the kube-node-lease namespace every 10 seconds. The node controller monitors these leases. If a lease isn't updated within the lease duration (default 40s), the node controller marks the node NotReady. In large clusters, using Lease objects (small, cheap) instead of full Node status updates drastically reduces API server load.
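The health decision reduces to simple timestamp arithmetic on the Lease's last renewal. The 10s/40s figures are the documented defaults; the sketch below is the check, not the node controller itself:

```python
RENEW_INTERVAL = 10   # kubelet renews its Lease every 10 seconds (default)
LEASE_DURATION = 40   # controller tolerates up to 40 seconds of silence

def node_ready(last_renew_time: float, now: float) -> bool:
    """NotReady once the Lease has gone unrenewed past its duration."""
    return (now - last_renew_time) <= LEASE_DURATION

print(node_ready(last_renew_time=100, now=130))  # True: renewed 30s ago
print(node_ready(last_renew_time=100, now=145))  # False: 45s of silence
```

The 4x margin between renewal interval and lease duration means a few dropped heartbeats do not immediately flip the node to NotReady.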
Q35:

Why is ephemeral storage a common cause of node instability?

Senior

Answer

Logs, layers, and temp files fill disk, triggering evictions.
Quick Summary: Containers write logs (stdout/stderr), and the container runtime stores them as files on the node. Each container restart adds a new log file. Containers that restart frequently (CrashLoopBackOff) create many log files. Large or unbounded log files fill the node's ephemeral storage — triggering kubelet's hard eviction and removing Pods from the node.
Q36:

How does CNI plugin selection affect control-plane scalability?

Senior

Answer

Different plugins vary in route programming and IP management; poor choice bottlenecks scaling.
Quick Summary: CNI plugins handle networking for every Pod on every node. Some CNI plugins (Flannel with VXLAN, Weave) use userspace components that add CPU overhead per packet. At large scale, the cumulative networking overhead affects control-plane node performance too. eBPF-based CNIs (Cilium) bypass iptables and userspace entirely, scaling better for large clusters.
Q37:

Why avoid large numbers of Services with external load balancers?

Senior

Answer

Cloud LBs are costly, slow to provision, and overload API server.
Quick Summary: Each external LoadBalancer Service provisions a cloud load balancer (AWS ELB, GCP LB). Cloud providers charge per load balancer and have limits (typically 200-1000 per account). Large numbers of Services with external LBs can exhaust quotas, incur significant cost, and slow down Service creation as cloud provisioning takes minutes per LB.
Q38:

How does Kubernetes prevent two controllers from updating the same object at the same time?

Senior

Answer

Controllers use resourceVersion with optimistic concurrency and caching.
Quick Summary: Optimistic concurrency with resourceVersion handles it: both controllers read the object, both see the same resourceVersion, both try to update — only the first write succeeds. The second gets a 409 Conflict. The losing controller re-reads the latest version and retries from the new state. No locking needed — just retry on conflict.
Q39:

What is the difference between eviction and graceful termination at node level?

Senior

Answer

Eviction is due to pressure; graceful termination follows delete requests.
Quick Summary: Graceful termination: kubelet sends SIGTERM to the container, waits terminationGracePeriodSeconds (default 30s), then SIGKILL. The app should drain connections during this window. Node-level eviction (node pressure): kubelet evicts Pods based on priority, starting with BestEffort, then Burstable, then Guaranteed. It's Pod-level termination triggered by kubelet, not by the API server.
Q40:

Why is encryption-at-rest critical for Kubernetes Secrets?

Senior

Answer

Secrets stored in etcd are plain base64; without encryption attackers can read credentials.
Quick Summary: Kubernetes Secrets are only base64-encoded in etcd by default — not encrypted. Anyone with etcd access reads secrets in plaintext. Enabling encryption-at-rest uses a provider (AES-CBC, AES-GCM, or KMS) to encrypt Secret data before writing to etcd. Essential for compliance (PCI, HIPAA) and defense against etcd backup exposure.
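The "base64 is not encryption" point takes one function call to demonstrate. The secret value below is a made-up example:

```python
import base64

# What etcd stores by default for a Secret value: just an encoding.
stored = base64.b64encode(b"s3cr3t-db-password")
print(stored)                    # b'czNjcjN0LWRiLXBhc3N3b3Jk'

# Anyone who can read etcd (or an etcd backup) recovers the plaintext.
print(base64.b64decode(stored))  # b's3cr3t-db-password'
```

Encryption-at-rest closes this gap by having the API server encrypt Secret values before they ever reach etcd.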
