Kubernetes Interview Questions for 2026: 40+ Q's Across Architecture, Networking, Troubleshooting, and the Scenarios Interviewers Throw at New Grads
Kubernetes interview questions in 2026 split between book-knowledge (what's a Pod, a Deployment, a Service) and incident-grade scenarios (your pod is CrashLoopBackOff'ing, what do you check first). The book questions you can grind. The scenarios separate the candidate who watched a YouTube series from the engineer who's actually shipped to a cluster. This guide covers 40+ questions across architecture, networking, troubleshooting, security, and the kubectl commands every new grad fumbles, plus a 3-week prep plan that doesn't ask you to lie about production experience you don't have.
By Alex Chen, Founder, InterviewChamp.AI · Last updated
What kubernetes interview questions actually test in 2026
Kubernetes interview questions in 2026 test two things. Whether you understand the architecture at a vocabulary level (what's a Pod, what's a Deployment, what's a Service), and whether you've ever actually used a cluster (your pod is CrashLoopBackOff'ing, what's the first command you run). The first half you grind from the docs. The second half is where most CS new grads bomb.
Here's how it usually goes. A new grad takes a Docker class, watches a YouTube series, reads the official tutorial, puts "Kubernetes" on the resume. Then a DevOps interviewer asks what they'd do if a pod was stuck in Pending state. 30 seconds of silence. "I'd, uh, check the logs?" The docs teach you the nouns. The interview tests the verbs.
In the 2025-2026 hiring cycle the distribution of kubernetes-related interview questions for new-grad-DevOps, SRE-junior, Platform-Engineering-junior, and SWE-with-K8s-tag roles looks like this, based on r/devops and r/cscareerquestions threads through Q1 2026:
- 35% architecture questions (Pod, Deployment, Service, ConfigMap, Secret, namespaces)
- 25% troubleshooting scenarios (your pod is X, walk me through it)
- 15% networking questions (CNI, Services, NetworkPolicy, DNS)
- 10% security and RBAC questions
- 10% kubectl command fluency
- 5% adjacent topics (Helm, Docker, CI/CD integration, operators)
The 25% troubleshooting slice is the one that disproportionately decides the outcome. You can stumble on a NetworkPolicy question and still pass. You cannot freeze on "your pod is in CrashLoopBackOff" and pass.
The 5 buckets of kubernetes interview questions
This guide is organized into five buckets that mirror what shows up in real interviews. Architecture (the nouns), networking (the wires), troubleshooting (the verbs), security and RBAC (the rules), kubectl (the muscle memory). Plus a scenario bucket on top, because that's the format most likely to decide your round.
Five roles ask these questions at the new-grad level: DevOps engineer (heaviest on full surface area), SRE-junior (heavier on troubleshooting and observability), platform-engineer-junior (heaviest on architecture and YAML), SWE-with-K8s-tag (lightest depth, deployment-focused), and cloud-engineer-junior (mid-depth, EKS/GKE/AKS-flavored). If you're applying broadly, prep for DevOps depth. That covers the other four.
Skim the questions you already know. Drill the ones you don't. Each carries a sample-answer outline, not a full canned response. Adapt the language to your own voice.
Kubernetes architecture interview questions (10 Q)
The architecture bucket is the foundation. Every other bucket assumes you have this vocabulary down. If you can't define a Pod, a Deployment, a Service, and a ConfigMap in one sentence each, start here.
Q1. What is a Pod?
The smallest deployable unit in kubernetes. One or more containers sharing a network namespace (one IP, one port space, so they reach each other on localhost) and storage (shared volumes), running together on a single node. The most common case is one container per Pod. The multi-container pattern shows up for sidecars (logging agent, service mesh proxy, config reloader) running alongside the main container.
Q2. What is a Deployment?
A higher-level controller that manages a set of identical Pods through a ReplicaSet. Handles three things: maintaining the desired replica count, rolling updates when the Pod template changes, and rollbacks when something goes wrong. You almost never create raw Pods in production. You create a Deployment, the Deployment creates a ReplicaSet, the ReplicaSet creates Pods. The chain matters because kubectl rollout undo operates on the Deployment, not the Pods.
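To make the chain concrete, here is a minimal sketch that builds a Deployment manifest as a plain Python dict and prints it as JSON (kubectl accepts JSON manifests as well as YAML). The name `web`, the image tag, and the helper names are placeholders for illustration, not anything a real cluster requires.

```python
import json


def make_deployment(name, image, replicas=3):
    """Build a minimal apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector tells the ReplicaSet which Pods it owns.
            "selector": {"matchLabels": labels},
            "template": {
                # The Pod template labels must satisfy the selector above.
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def selector_matches_template(deployment):
    """Return True if every selector label is present on the Pod template."""
    wanted = deployment["spec"]["selector"]["matchLabels"]
    actual = deployment["spec"]["template"]["metadata"]["labels"]
    return all(actual.get(key) == value for key, value in wanted.items())


# Placeholder name and image, chosen only for the example.
manifest = make_deployment("web", "nginx:1.27")
print(json.dumps(manifest, indent=2))
```

Piping the printed JSON into `kubectl apply -f -` on a minikube or kind cluster, then running kubectl get deploy,rs,pods, should show all three layers of the chain.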
Q3. What's a ReplicaSet and why does it exist?
A ReplicaSet maintains a stable set of replica Pods running at any given time. Deployments use ReplicaSets under the hood to handle the rollout: during a rolling update, the Deployment creates a new ReplicaSet, scales it up, and scales the old one down. You rarely interact with ReplicaSets directly. The interview signal is just knowing the chain (Deployment to ReplicaSet to Pod) so you understand what kubectl get rs is showing you.
Q4. What's a StatefulSet and when would you use one?
A StatefulSet is the workload type for applications that need stable network identities, ordered startup and shutdown, and persistent per-pod storage. Pods get predictable names (web-0, web-1, web-2), and each pod gets its own PersistentVolumeClaim that survives rescheduling. Use it for databases (Postgres, MongoDB), clustered systems whose members need a stable identity (Kafka, etcd, Cassandra), and anything where web-1 must always be web-1 with the same hostname and the same disk. Use a Deployment for stateless workloads. The criterion is stable identity plus persistent storage, not just "has state."
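The naming scheme is mechanical enough to sketch. The function below derives the pod name, per-pod DNS name, and PersistentVolumeClaim name for each replica, assuming the default cluster domain `cluster.local` and a headless Service. The StatefulSet, Service, and claim-template names are made up for the example.

```python
def statefulset_identities(name, service, namespace, claim_template, replicas):
    """List the stable identity each StatefulSet replica receives.

    Assumes the default cluster domain (cluster.local) and a headless
    Service called `service` governing the StatefulSet.
    """
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        identities.append({
            "pod": pod,
            # Per-pod DNS record published through the headless Service.
            "dns": f"{pod}.{service}.{namespace}.svc.cluster.local",
            # PVC name: <volumeClaimTemplate name>-<pod name>.
            "pvc": f"{claim_template}-{pod}",
        })
    return identities


# Hypothetical names: StatefulSet "web", headless Service "web-svc".
for identity in statefulset_identities("web", "web-svc", "prod", "data", 3):
    print(identity)
```

The point to take into the interview: if web-1 is rescheduled to another node, all three values stay the same, which is exactly what a Deployment does not promise.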
Q5. What's a DaemonSet and when would you use one?
A DaemonSet ensures one Pod runs on every node (or every node matching a selector). Use case: per-node infrastructure like log collectors, metrics agents, network plugins, storage drivers. New nodes get a pod automatically; leaving nodes lose theirs.
Q6. What's a Job and a CronJob?
A Job runs a Pod to completion once (or N times in parallel). Useful for migrations, backups, one-shot data processing. A CronJob is a scheduled Job, the cluster equivalent of cron. Spec includes a schedule (standard cron syntax) and a Job template. The new-grad trap: forgetting concurrencyPolicy and history limits, resulting in zombie jobs piling up over months.
Q7. What's a Service and what types are there?
A Service is a stable virtual IP and DNS name that fronts a set of dynamic Pod IPs. The Pods come and go (rescheduling, scaling), the Service IP stays stable. Three main types. ClusterIP (default, internal-only, used for service-to-service in the cluster). NodePort (exposes the service on a static port on every node IP, useful for development or bare-metal). LoadBalancer (provisions an external cloud load balancer, the standard way to give a single Service an external entry point on a cloud provider). There's also ExternalName, which is a CNAME alias to an external DNS name, used for routing to services outside the cluster.
Q8. What's an Ingress and how is it different from a LoadBalancer Service?
An Ingress is a higher-level routing rule that lets you route many hostnames or paths through a single entry point. A LoadBalancer service provisions one cloud load balancer per service. With 50 services, that's 50 LoadBalancers, which is expensive. With Ingress, you run one ingress controller (nginx, traefik, haproxy) behind a single LoadBalancer, and the controller routes by hostname or path to the right backend Service. The Ingress resource describes the routing rules. The Ingress controller implements them. Production clusters almost always use Ingress for HTTP traffic.
Q9. What's a ConfigMap and how is it different from a Secret?
Both are key-value stores that pods can consume as environment variables or mounted files. ConfigMaps hold non-sensitive configuration: feature flags, URLs, log levels, application configs. Secrets hold sensitive data: API keys, database passwords, TLS certificates. The honest version of this question: Secrets are base64-encoded by default, which is encoding, not encryption. Anyone with read access to Secrets in the namespace can decode one. Production clusters enable encryption-at-rest for Secrets in etcd, and most layer in an external secret manager (sealed-secrets, Vault, cloud KMS) for the actual sensitive material. The base64 thing trips up new grads who think Secrets are inherently secure.
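If the encoding-versus-encryption point feels abstract, a few lines of Python show it. This sketch only uses the standard library; the password is an obvious placeholder.

```python
import base64


def encode_secret_value(plaintext):
    """Encode a value the way it appears under `data:` in a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded):
    """Reverse the encoding. No key is required, which is the whole point."""
    return base64.b64decode(encoded).decode("utf-8")


stored = encode_secret_value("hunter2")  # placeholder password
print(stored)
print(decode_secret_value(stored))
```

Nothing here needed a key or a passphrase. Whatever protects the value has to come from RBAC, encryption-at-rest, or an external secret manager.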
Q10. What's a namespace and when would you use multiple?
A namespace is a logical partition inside a cluster, scoping names, resource quotas, RBAC rules, and network policies. You can have a Service called "web" in both dev and prod namespaces. Use namespaces to separate environments, teams, or applications. The new-grad mistake: putting everything in the default namespace, which makes cleanup, RBAC, and resource limits painful later.
Kubernetes networking interview questions (8 Q)
Networking is where most candidates get stuck. The abstractions stack deep (CNI, Services, Endpoints, kube-proxy, DNS, Ingress), and you can't see most of it without kubectl get endpoints and a careful read of describe output. Drill these eight.
Q11. How does pod-to-pod networking work in kubernetes?
Three rules. Every pod gets its own IP from a flat cluster network, no NAT. Pods on the same node and pods on different nodes communicate the same way. The Service abstraction provides stable virtual IPs in front of dynamic pod IPs. The implementation lives in the CNI plugin (Calico, Cilium, Flannel), installed at cluster bootstrap.
Q12. What's a CNI plugin?
Container Network Interface. The plugin model kubernetes uses to delegate pod networking to a third-party implementation. The CNI plugin assigns pod IPs, configures routes between nodes, and enforces NetworkPolicy if supported. Popular CNIs in 2026: Calico (broad use, strong NetworkPolicy support), Cilium (eBPF-based, fastest growing in 2025), Flannel (simple, a common first choice in kubeadm tutorials; kubeadm itself installs no CNI by default). The interview value: CNI is a separate concern from kubernetes core, and the choice affects NetworkPolicy and observability.
Q13. How does a Service route traffic to Pods?
The Service has a selector (label matching) that picks Pods. The control plane maintains an Endpoints (or EndpointSlice) object listing the IPs and ports of the matching Pods that are ready. kube-proxy runs on every node and watches those objects, programming iptables rules (or IPVS, if the cluster runs kube-proxy in that mode) that route Service-IP-and-port to a backend Pod IP, picked at random in iptables mode. When a Pod dies or is rescheduled, the Endpoints object updates, kube-proxy reprograms the rules, traffic shifts. The new-grad mistake is thinking Services are a load balancer in the cloud-LB sense. They're an iptables (or IPVS) routing layer on every node.
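The selector-to-endpoints step is the part that breaks most often, so here is a simplified model of it. This is not how the control plane is implemented; it just mirrors the matching rule (every selector label must be present on the Pod, and the Pod must be ready). The pod data is invented for the example.

```python
def select_endpoints(selector, pods, target_port):
    """Return "ip:port" for each ready Pod whose labels satisfy the selector."""
    endpoints = []
    for pod in pods:
        labels = pod["labels"]
        matches = all(labels.get(k) == v for k, v in selector.items())
        if matches and pod["ready"]:
            endpoints.append(f"{pod['ip']}:{target_port}")
    return endpoints


# Invented pods for illustration.
pods = [
    {"ip": "10.244.1.5", "labels": {"app": "web", "tier": "api"}, "ready": True},
    {"ip": "10.244.2.9", "labels": {"app": "web", "tier": "api"}, "ready": False},
    {"ip": "10.244.3.2", "labels": {"app": "db"}, "ready": True},
]

print(select_endpoints({"app": "web"}, pods, 8080))  # one ready match
print(select_endpoints({"app": "wbe"}, pods, 8080))  # label typo: empty list
```

The second call is the classic failure: one transposed letter in the selector and the endpoint list is empty, even though every Pod is healthy.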
Q14. What is kube-proxy?
A network proxy running on every node, programming iptables or IPVS rules to implement the Service abstraction. It watches the kubernetes API for Service and Endpoints changes and updates routing accordingly. Usually a DaemonSet under kube-system.
Q15. How does DNS work in kubernetes?
Every Service gets a DNS record automatically: service-name.namespace.svc.cluster.local. Pods get a resolv.conf pointing at the cluster DNS service (CoreDNS in modern clusters). When a pod looks up "web", the resolv.conf search path expands it to "web.current-namespace.svc.cluster.local". DNS failures (CoreDNS pod down, NetworkPolicy blocking DNS) are one of the most common cluster-wide issues in production, which is why this question shows up so often.
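The search-path expansion can be sketched in a few lines. This models the usual pod resolv.conf (three search domains plus `options ndots:5`) and the standard resolver rule that names with fewer dots than ndots are tried against the search list first. Real pods may carry extra search domains inherited from the node, so treat the output as the typical case, not a guarantee.

```python
def candidate_names(name, namespace, ndots=5, cluster_domain="cluster.local"):
    """Return the fully qualified names a pod's resolver tries, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as already fully qualified.
        return [name]
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") < ndots:
        # Few dots: walk the search list first, then try the name as given.
        return expanded + [absolute]
    return [absolute] + expanded


for candidate in candidate_names("web", "dev"):
    print(candidate)
```

This also explains a common follow-up: looking up an external name like api.example.com from a pod generates several failed cluster-local queries before the real one, because it has fewer than five dots.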
Q16. What's a NetworkPolicy?
A kubernetes resource that restricts which pods can talk to which other pods at the network layer. Default kubernetes networking is permissive: all pods can reach all other pods. A NetworkPolicy lets you write rules like 'pods in the frontend namespace can reach pods in the backend namespace on port 8080, nothing else.' NetworkPolicies require a CNI plugin that supports them (Calico, Cilium yes; Flannel out-of-the-box no). The interview signal is knowing NetworkPolicy is opt-in and CNI-dependent, not a built-in like Services.
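Here is one way the frontend-to-backend rule might look as a manifest, built as a Python dict and printed as JSON. It assumes the namespaces are literally named `frontend` and `backend` and relies on the `kubernetes.io/metadata.name` label that recent kubernetes versions set on every namespace. The policy name is a placeholder.

```python
import json


def allow_namespace_ingress(target_ns, source_ns, port):
    """Allow ingress to every pod in target_ns from pods in source_ns on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{source_ns}-{port}", "namespace": target_ns},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{
                    "namespaceSelector": {
                        "matchLabels": {"kubernetes.io/metadata.name": source_ns}
                    }
                }],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


policy = allow_namespace_ingress("backend", "frontend", 8080)
print(json.dumps(policy, indent=2))
```

Once any policy selects a pod for ingress, everything not explicitly allowed is dropped for that pod, which is how the "nothing else" half of the rule is achieved. On a CNI without NetworkPolicy support the object is accepted by the API and silently does nothing.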
Q17. What's the difference between Ingress and a LoadBalancer Service?
A LoadBalancer Service provisions one cloud load balancer per service. Expensive at scale. Ingress runs one ingress controller behind a single LoadBalancer, and the controller routes by hostname or path. With 50 microservices, Ingress costs one LoadBalancer. LoadBalancer Services cost 50. Production clusters use Ingress for HTTP traffic and LoadBalancer Services only for non-HTTP protocols (gRPC sometimes, raw TCP for databases, UDP for game servers).
Q18. How would you debug a Service that's not routing traffic?
A four-step workflow. First, kubectl get endpoints (service-name) and check whether the Endpoints list is empty. If empty, the Service selector doesn't match any Pods, fix the selector or the Pod labels. Second, kubectl describe pod on one of the backend Pods, confirm the container is Running and Ready and that the containerPort in the Ports section matches the Service's targetPort (describe shows the declared port, not whether the process is actually listening on it). Third, kubectl run a debug pod (busybox, netshoot) and try to curl the Service ClusterIP directly. If that fails, you're looking at a CNI or NetworkPolicy issue. Fourth, check kube-proxy logs on the node where the debug pod ran. Most Service issues are step one (empty endpoints from a label typo).
Kubernetes troubleshooting interview questions (10 Q)
This is the bucket that decides most interviews. Scenario format. Your pod is X, what do you check first. The pattern that works: state the most likely cause, name the kubectl command you'd run to confirm, state what output would point at the actual cause. Drill this section hardest.
Q19. Your pod is in CrashLoopBackOff. Walk me through your debug.
CrashLoopBackOff means the container started, crashed, restarted, crashed again, and the kubelet is now backing off the restart attempts (10s, 20s, 40s, doubling up to 5 minutes). Workflow: kubectl describe pod first, read the Events section for OOMKilled, failed probes, or image errors. Then kubectl logs --previous (the current container hasn't started, so plain kubectl logs returns nothing). Common causes: application bug on startup, missing env var or ConfigMap reference, liveness probe firing too early during slow boot, OOMKill from too-low memory limits, wrong command or args.
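The backoff arithmetic is worth seeing once. This sketch reproduces the schedule described above (10 seconds, doubling, capped at 5 minutes); the function name is just for illustration.

```python
def crashloop_delays(restarts, base_seconds=10, cap_seconds=300):
    """Delay before each restart attempt: doubles from base, capped at cap."""
    return [min(base_seconds * 2 ** attempt, cap_seconds) for attempt in range(restarts)]


delays = crashloop_delays(7)
print(delays)       # [10, 20, 40, 80, 160, 300, 300]
print(sum(delays))  # total seconds spent waiting across seven restarts
```

After the fifth crash the pod only retries every five minutes, which is why a fixed pod can still look broken for a while. Deleting the pod so its controller recreates it resets the timer.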
Q20. Your pod is in ImagePullBackOff. What's wrong?
The kubelet can't pull the container image. Three common causes. The image name is wrong (typo, wrong registry, wrong tag). The image is in a private registry and the Pod has no imagePullSecret configured. The registry is rate-limiting or down. kubectl describe pod shows the exact pull error in Events. The fix depends on the cause, but the first move is always describe.
Q21. Your pod is in Pending state. Why?
Three buckets. Resources: no node has enough unreserved CPU or memory to fit the pod's requests. Fix is smaller requests, more nodes, or cluster autoscaler. Scheduling: the pod has affinity, anti-affinity, or nodeSelector rules that no node satisfies, or every candidate node carries a taint the pod doesn't tolerate. Fix is matching scheduling rules and tolerations to actual node labels and taints. PVC: the pod is waiting on a PersistentVolumeClaim that hasn't bound, usually because no StorageClass is set or no PersistentVolume is available. Fix is kubectl get pvc, then kubectl describe pvc to read its Events. kubectl describe pod shows the scheduler's reasoning in the Events at the bottom.
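A toy version of the scheduler's filter step makes the resource and scheduling buckets easier to reason about. This is a deliberately simplified model (taints are reduced to bare keys with a NoSchedule effect, affinity and PVC binding are left out), and the class and function names are invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int      # unreserved CPU in millicores
    free_mem_mi: int     # unreserved memory in MiB
    labels: dict = field(default_factory=dict)
    taints: set = field(default_factory=set)       # NoSchedule taint keys


@dataclass
class Pod:
    cpu_m: int           # CPU request in millicores
    mem_mi: int          # memory request in MiB
    node_selector: dict = field(default_factory=dict)
    tolerations: set = field(default_factory=set)  # tolerated taint keys


def why_unschedulable(pod, node):
    """Return the reasons this node is filtered out; empty list means it fits."""
    reasons = []
    if pod.cpu_m > node.free_cpu_m:
        reasons.append("Insufficient cpu")
    if pod.mem_mi > node.free_mem_mi:
        reasons.append("Insufficient memory")
    if any(node.labels.get(k) != v for k, v in pod.node_selector.items()):
        reasons.append("node selector mismatch")
    if node.taints - pod.tolerations:
        reasons.append("untolerated taint")
    return reasons


small = Node("node-a", free_cpu_m=500, free_mem_mi=1024)
tainted = Node("node-b", free_cpu_m=4000, free_mem_mi=8192, taints={"dedicated"})
pod = Pod(cpu_m=1000, mem_mi=512)

for node in (small, tainted):
    print(node.name, why_unschedulable(pod, node))
```

A pod stays Pending when every node returns a non-empty list. Note the direction of the taint check: a taint on the node blocks pods that lack the toleration, while an extra toleration on the pod never blocks anything.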
Q22. Your pod is OOMKilled. What does that mean and how do you fix it?
OOMKilled means the container exceeded its memory limit and the kernel killed it. Two fixes. Raise the memory limit if the workload legitimately needs more. Or find and fix the memory leak in the application. kubectl describe pod shows OOMKilled in Last State under Containers; kubectl top pod (if metrics-server is installed) shows current usage. The senior follow-up: whether you'd set memory limits at all (some teams don't, preferring node-level pressure) and how you'd pick the value.
Q23. Your pod won't start because of a CreateContainerConfigError. What's wrong?
The Pod spec references something that doesn't exist or is malformed. Most common cause: a referenced ConfigMap or Secret doesn't exist or doesn't have the expected key. kubectl describe pod shows the exact error in Events ("couldn't find key X in ConfigMap Y"). Fix is creating the missing resource or fixing the reference in the Pod spec.
Q24. Your service is up but no traffic is reaching the pods. Walk through your debug.
kubectl get endpoints first. If empty, the Service selector doesn't match any Pods, almost always a label typo on the Pod or the Service. If endpoints exist but traffic still doesn't reach pods, kubectl describe pod on one Pod and confirm the container is Running and listening on the right port. Then exec into a debug pod and curl the Service ClusterIP. If that fails, check the CNI plugin and any NetworkPolicy. If kube-proxy is down on the node, all Services on that node break, kubectl get pods -n kube-system to check.
Q25. DNS isn't resolving inside the cluster. What's the first thing you check?
kubectl get pods -n kube-system | grep coredns. If CoreDNS pods are not Running, that's the entire cluster's DNS broken. kubectl describe and kubectl logs on the CoreDNS pods to find the root cause (configuration error, OOMKill, node failure). If CoreDNS is healthy, exec into a problem pod and try nslookup against the cluster DNS service IP (usually 10.96.0.10 or whatever your CoreDNS service IP is). If that fails, you're looking at a NetworkPolicy blocking DNS or a CNI issue.
Q26. Your Deployment rolled out and now the app is broken. How do you roll back?
kubectl rollout undo deployment/(name). That rolls the Deployment back to the previous revision's Pod template, scaling the old ReplicaSet back up through the normal rolling-update path. kubectl rollout history deployment/(name) shows the revisions, and adding --to-revision=(n) to rollout undo goes back further than one. The interview value is knowing rollback is a first-class kubernetes operation that doesn't require redeploying old YAML.
Q27. Your pod's liveness probe is failing during startup. How do you fix it?
Two options. Loosen the liveness probe so it doesn't fire until the app has bootstrapped (raise initialDelaySeconds, increase failureThreshold). Or, better, add a startup probe (alpha in kubernetes 1.16, stable since 1.20) that holds off the liveness and readiness probes until the startup probe has succeeded. The startup probe runs until it succeeds or exhausts its budget of failureThreshold times periodSeconds, at which point the container is restarted; the liveness probe only starts after the startup probe succeeds. This is the cleanest fix for slow-starting applications.
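The numbers behind a startup probe are simple multiplication. The sketch below builds the probe section of a container spec as a Python dict and computes how long the app gets to boot. The `/healthz` path and port 8080 are assumptions for the example; use whatever health endpoint your app actually exposes.

```python
import json


def startup_budget_seconds(period_seconds, failure_threshold):
    """Longest time the app may take to start before the container is restarted."""
    return period_seconds * failure_threshold


def probe_fragment(path="/healthz", port=8080, period=10, startup_failures=30):
    """Probe fields for a container spec. Path and port are hypothetical."""
    check = {"httpGet": {"path": path, "port": port}}
    return {
        # Liveness is held off until this probe succeeds once.
        "startupProbe": {**check, "periodSeconds": period,
                         "failureThreshold": startup_failures},
        # After startup, three consecutive failures trigger a restart.
        "livenessProbe": {**check, "periodSeconds": period,
                          "failureThreshold": 3},
    }


fragment = probe_fragment()
print(json.dumps(fragment, indent=2))
print(startup_budget_seconds(10, 30))  # 300 seconds to finish booting
```

The design choice: the startup probe gets a generous budget for the slow boot, while the liveness probe stays tight so a hung app is still restarted quickly once it is up.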
Q28. Your node is in NotReady state. What now?
NotReady means the kubelet on that node is not reporting healthy to the control plane. kubectl describe node shows the Conditions and the last heartbeat. Common causes: kubelet crashed, node out of disk space, network partition, container runtime crashed. SSH to the node and check kubelet logs and disk usage. Pods on the node are marked for eviction and enter Terminating once the tolerationSeconds window on the not-ready and unreachable taints expires (default 5 minutes).
Kubernetes security plus RBAC interview questions (5 Q)
Security depth is shallow at the new-grad level. The interviewer wants to see you know what RBAC is and that you wouldn't expose the dashboard publicly without auth.
Q29. What's RBAC in kubernetes?
Role-Based Access Control. The mechanism kubernetes uses to decide who can do what against the API. Four resources matter. A Role grants permissions within a single namespace ('can read pods in the dev namespace'). A ClusterRole grants permissions cluster-wide ('can read pods in any namespace'). A RoleBinding attaches a Role to a user, group, or ServiceAccount. A ClusterRoleBinding does the same for a ClusterRole. The new-grad mistake is conflating users (human, external) with ServiceAccounts (in-cluster, pod-bound).
Q30. What's a ServiceAccount?
A kubernetes-native identity for pods, used when a pod itself needs to talk to the kubernetes API. Every pod runs as some ServiceAccount (default if not specified). When the pod calls the API, kubernetes authenticates the call using a token mounted into the pod from the ServiceAccount. The interview-relevant case: a CI/CD agent pod that needs to deploy other workloads runs as a ServiceAccount with a RoleBinding granting it the right permissions. The new-grad trap is granting that ServiceAccount cluster-admin (too broad). Always grant the minimum permissions needed.
Q31. How would you give a pod read-only access to ConfigMaps in its namespace?
Three resources. A Role in that namespace with rules: get, list, watch on configmaps. A ServiceAccount in that namespace. A RoleBinding in that namespace binding the Role to the ServiceAccount. Then set the Pod spec's serviceAccountName to the new ServiceAccount. The Pod can now read ConfigMaps via the kubernetes API. Knowing the three-resource pattern (Role plus ServiceAccount plus RoleBinding) is the credibility signal.
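The three-resource pattern is easier to remember once you have seen it written out. This sketch builds all three objects as Python dicts and wraps them in a `List` so they can be applied in one go. The names `configmap-reader` and `app-sa` and the `dev` namespace are placeholders.

```python
import json


def configmap_reader_bundle(namespace, role="configmap-reader", account="app-sa"):
    """Role + ServiceAccount + RoleBinding granting read-only ConfigMap access."""
    meta = lambda name: {"name": name, "namespace": namespace}
    role_obj = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": meta(role),
        "rules": [{
            "apiGroups": [""],  # "" is the core API group, where ConfigMaps live
            "resources": ["configmaps"],
            "verbs": ["get", "list", "watch"],
        }],
    }
    account_obj = {"apiVersion": "v1", "kind": "ServiceAccount", "metadata": meta(account)}
    binding_obj = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": meta(f"{account}-{role}"),
        "subjects": [{"kind": "ServiceAccount", "name": account, "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": role},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role_obj, account_obj, binding_obj]}


bundle = configmap_reader_bundle("dev")
print(json.dumps(bundle, indent=2))
```

After applying it, `kubectl auth can-i list configmaps --as=system:serviceaccount:dev:app-sa -n dev` is a quick way to confirm the binding works, and the same command with `delete` should answer no.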
Q32. What's a Pod Security Standard?
A built-in policy that restricts what Pods can do (run as root, mount the host filesystem, use host networking, run privileged containers). Three levels: privileged (no restrictions), baseline (block known-bad), restricted (hardened). Applied at the namespace level via labels. Replaces the older PodSecurityPolicy (deprecated in 1.21, removed in 1.25). The interview value is just knowing the levels exist and what they do.
Q33. How would you store a database password securely in kubernetes?
Three options ordered by hardening. (1) A Secret resource, base64-encoded, with encryption-at-rest enabled on the cluster's etcd. Acceptable for low-stakes environments. (2) Sealed-secrets, which encrypts the actual secret with a key only the in-cluster controller holds, so the encrypted blob is safe to commit to git. Better. (3) An external secret manager (Vault, AWS Secrets Manager, GCP Secret Manager) as the source of truth. external-secrets-operator can sync values from it into ordinary Secrets, or a CSI driver can inject them at pod startup so they're never stored in etcd at all. Best for production. The new-grad bar is just knowing Secrets-base64 is encoding, not encryption.
kubectl interview questions (5 Q)
Less about explaining the commands, more about typing them correctly under pressure. The interviewer asks scenarios; you respond with the right kubectl invocation.
Q34. What's the difference between kubectl apply and kubectl create?
kubectl create is imperative: create a new resource, fail if it already exists. kubectl apply is declarative: ensure the cluster matches this manifest, create if missing, update if different, merge with existing changes via a three-way diff. Production GitOps workflows use apply almost universally. The new-grad mistake is using kubectl create in a CI pipeline that runs twice.
Q35. How do you see the logs of a previously-crashed container?
kubectl logs (pod) --previous. The current container hasn't started (it's in CrashLoopBackOff), so plain kubectl logs returns nothing useful. The --previous flag shows the output of the last container that ran. If the pod has multiple containers, add -c (container-name).
Q36. How do you exec into a running container?
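kubectl exec -it (pod) -- /bin/sh. The double-dash separates kubectl flags from the command being run inside the container. Use /bin/sh for minimal images (alpine, busybox) and /bin/bash for fuller images. Distroless images ship no shell at all, so exec has nothing to run there; attach an ephemeral debug container instead with kubectl debug -it (pod) --image=busybox --target=(container-name). Add -c (container-name) if the pod has multiple containers. The -it flags make it interactive with a TTY.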
Q37. How do you watch events in a cluster sorted by time?
kubectl get events --sort-by=.lastTimestamp. Events are how kubernetes surfaces what just happened. Pod scheduled, image pulled, container crashed, probe failed. Without --sort-by, events are not returned in chronological order, which is unreadable for debugging. Add --watch (or -w) to keep streaming new events as they arrive. The --sort-by=.lastTimestamp variant is one of the most useful commands in the entire kubectl toolbox.
Q38. How do you scale a Deployment?
Two ways. kubectl scale deployment (name) --replicas=5 for an imperative one-shot. Or edit the Deployment YAML and re-apply, which is the GitOps-friendly way. Production teams almost always use the YAML edit path. The imperative kubectl scale is fine for quick ops during an incident but diverges from the state declared in git, so the next GitOps reconciliation will revert it.
Kubernetes scenario-based interview questions (5 Q)
Architectural design at small-to-medium scale. The interviewer asks you to design something, you walk through the kubernetes resources you'd combine and the tradeoffs you'd make. New-grad versions stay small. Senior versions push into multi-cluster and multi-region.
Q39. Design a small e-commerce backend on kubernetes.
Three Deployments fronted by Services: a stateless API gateway, a stateless product-catalog service, a stateless order-processing worker. One StatefulSet for Postgres with a PersistentVolumeClaim. A ConfigMap for non-secret config (Postgres host, log level), a Secret for the Postgres password and any third-party API keys. An Ingress routing api.example.com to the gateway Service. A HorizontalPodAutoscaler on the gateway and order-processing Deployments scaling on CPU. NetworkPolicies restricting database traffic to only the order-processing pods. One stated tradeoff: Postgres in StatefulSet for simplicity vs managed Postgres outside the cluster for HA and easier ops. Most production setups choose the latter at scale.
Q40. Design a daily batch job that processes 100GB of logs into a database.
A CronJob scheduled at "0 2 * * *". The Pod streams logs from object storage in chunks (avoid loading 100GB into memory), writes aggregates to the database via a Secret-supplied connection string. Realistic resource requests (4 CPU, 8GB memory), an activeDeadlineSeconds to prevent runaway jobs, a history limit so old Job records don't pile up. One stated tradeoff: serial single-Pod processing vs Job with parallelism>1 sharding the work. Sharding is faster but adds coordination overhead; only worth it for jobs taking more than 30 minutes serially.
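The design above might translate into a manifest along these lines. It is a sketch, built as a Python dict and printed as JSON. The image, the Secret name `log-db`, its key `url`, the four-hour deadline, and the history limits are all assumptions picked for the example, and `8Gi` stands in for the 8GB request.

```python
import json


def make_log_cronjob(name="log-aggregator",
                     image="registry.example.com/log-aggregator:1.0"):
    """Nightly batch CronJob. Image and Secret names are hypothetical."""
    container = {
        "name": name,
        "image": image,
        "resources": {"requests": {"cpu": "4", "memory": "8Gi"}},
        # Connection string comes from a Secret, never from the manifest.
        "env": [{
            "name": "DATABASE_URL",
            "valueFrom": {"secretKeyRef": {"name": "log-db", "key": "url"}},
        }],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": "0 2 * * *",           # 02:00 every day
            "concurrencyPolicy": "Forbid",     # skip a run if the last one is still going
            "successfulJobsHistoryLimit": 3,   # keep old Job records from piling up
            "failedJobsHistoryLimit": 3,
            "jobTemplate": {"spec": {
                "activeDeadlineSeconds": 4 * 3600,  # kill runaway jobs after 4 hours
                "backoffLimit": 2,
                "template": {"spec": {
                    "restartPolicy": "Never",
                    "containers": [container],
                }},
            }},
        },
    }


cronjob = make_log_cronjob()
print(json.dumps(cronjob, indent=2))
```

For the sharded variant, `parallelism` and `completions` go in the same `jobTemplate.spec` block, and each Pod then needs a way to pick its slice of the input.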
Q41. Design a real-time chat backend on kubernetes.
A Deployment for the stateless WebSocket gateway (3+ replicas, Service with sessionAffinity: ClientIP so a user's WebSocket sticks to one pod). A StatefulSet for Redis as pub-sub. A Deployment for the message persistence worker. Managed database or Postgres StatefulSet for durable message history. Ingress with WebSocket support. HorizontalPodAutoscaler on the gateway scaling on connection count rather than CPU. One tradeoff: in-cluster Redis vs managed Redis. Managed wins on ops, in-cluster wins on cost and latency.
Q42. Design a CI/CD pipeline running on kubernetes.
A Deployment for the CI/CD controller. A ServiceAccount with a RoleBinding granting permissions to deploy workloads. Build agents as Jobs spun up per-build, each in a dedicated namespace with resource quotas. Secrets for git, registry, and deployment-target credentials. A NetworkPolicy restricting build agents from talking to anything outside the registry, the git host, and the controller. Tradeoff: per-build Jobs (clean, slow) vs persistent build-agent Pods (fast, cross-build contamination risk). Most teams choose per-build Jobs.
Q43. Design a feature-flag service that needs sub-100ms reads.
A Deployment for the read service with 5+ replicas behind a ClusterIP Service. Local in-memory cache in each replica refreshed every 10s from a backing store. A separate StatefulSet for the backing store (Redis). An admin Deployment for writes, behind a different Service. NetworkPolicy isolating reads from write traffic. Tradeoff: cache freshness vs read latency. 10s refresh gives eventual consistency on flag changes, fine for most flags. For instant propagation use a pub-sub channel.
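The per-replica cache is the piece that delivers the latency target, so here is a minimal sketch of it. It refreshes lazily on the first read after the TTL expires, not on a background timer, and `fetch_all` is a stand-in for whatever call reads every flag from the backing store. The class and flag names are invented.

```python
import time


class FlagCache:
    """In-memory flag cache that reloads from the backing store after a TTL."""

    def __init__(self, fetch_all, ttl_seconds=10.0, clock=time.monotonic):
        self._fetch_all = fetch_all   # callable returning a dict of all flags
        self._ttl = ttl_seconds
        self._clock = clock           # injectable so tests can control time
        self._flags = None
        self._loaded_at = 0.0

    def get(self, key, default=False):
        now = self._clock()
        if self._flags is None or now - self._loaded_at >= self._ttl:
            # Stale or empty: one round trip to the store, then serve from memory.
            self._flags = dict(self._fetch_all())
            self._loaded_at = now
        return self._flags.get(key, default)


# Stand-in for the backing store; a real service would query Redis here.
backing_store = {"dark_mode": True, "new_checkout": False}
cache = FlagCache(lambda: backing_store)
print(cache.get("dark_mode"))
print(cache.get("missing_flag"))
```

Reads inside the TTL window never leave the process, which is what keeps them well under the latency budget. The cost is that a flag change can take up to one TTL to reach every replica.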
How to prepare for kubernetes interview questions as a CS new grad
The HowTo schema attached to this article has the full week-by-week plan. The short version with the rationale that actually matters:
Hands-on beats reading. Install minikube or kind on your laptop. Deploy a two-tier app (Deployment plus Postgres StatefulSet plus Service plus ConfigMap plus Secret). Write the YAML by hand, not from a tutorial. That single weekend teaches more than three chapters of docs because you hit the moment where you realize you don't know whether the selector goes inside spec or inside spec.template, and that moment is the prep.
Break it on purpose. Once the app works, mistype a Service selector and watch the Endpoints disappear. Set a memory limit so low the pod gets OOMKilled. Reference a ConfigMap that doesn't exist. Each failure mode you debug on your laptop is a scenario question you will pass cold in an interview. This is the single most underrated prep activity. Most new grads study by reading. The ones who pass scenario questions study by breaking.
Memorize the 15 commands. kubectl get pods, describe pod, logs (with --previous and -c), exec, port-forward, apply, delete, edit, scale, rollout (status, undo, history), top, get events --sort-by=.lastTimestamp. These are the commands you will type during a live troubleshooting question. The interviewer is grading muscle memory.
Read three postmortems. Search 'kubernetes postmortem' on Hacker News or large infrastructure-company engineering blogs. Read three end-to-end. The senior signal interviewers grade is whether you can talk about a failure mode with the vocabulary of someone who has read the incident reports. You have not run a production cluster. That is fine. You have read about people who did, and that is the move.
Run three timed mocks. One architecture mock, one scenario-only mock, one mixed mock that mirrors the target company's format. AI mock interview tools catch vocabulary slips in real time. Human peer mocks deliver the pressure of being watched. Use both.
Kubernetes interview format by role
Same kubernetes content gets tested differently across the five roles asking these questions in 2026.
| Role | Architecture depth | Networking depth | Troubleshooting depth | Security / RBAC depth | YAML writing | Adjacent skills |
|---|---|---|---|---|---|---|
| DevOps engineer (junior) | High | Medium-High | High (scenario-heavy) | Medium | High | CI/CD, Helm, Docker |
| SRE (junior) | Medium-High | Medium | Very High (incident focus) | Medium | Medium | Observability, on-call, SLOs |
| Platform engineer (junior) | Very High | High | Medium | Medium-High | Very High | Helm, operators, CRDs |
| Software engineer (K8s tag) | Medium | Low | Low-Medium | Low | Medium | Application-level concerns |
| Cloud engineer (junior) | Medium | Medium | Medium | Medium | Medium | EKS / GKE / AKS specifics |
Two patterns. Troubleshooting depth tracks production responsibility (SRE-junior gets the heaviest scenario questions, SWE-with-K8s-tag the lightest). YAML-writing depth tracks how much the role builds the platform vs uses it. If you don't know which role you're calibrating to, prep for DevOps depth. That covers the others.
Common kubernetes interview mistakes for CS new grads
The seven most-reported mistakes from new-grad kubernetes interviews in the 2025-2026 hiring cycle, in roughly the order of frequency.
1. Naming concepts without showing they've used them. A candidate who can recite 'a Service abstracts a set of pods behind a stable virtual IP' but who would not know how to debug a Service that's not routing traffic reads as someone who watched a video, not someone who ran kubectl. The fix is hands-on. Install minikube, write a Deployment YAML by hand, break a Service by mistyping a selector label, fix it by running kubectl get endpoints and noticing the empty list. That single experience teaches more than three chapters of reading.
2. Confusing Deployments and StatefulSets. New grads often default to Deployment because it's what every tutorial uses. The interview probe: 'would you use a Deployment or a StatefulSet for Postgres'. The candidate who says Deployment without flinching has not thought about stable identity or persistent storage. The right answer is StatefulSet with reasoning attached.
3. Treating Secrets as actually secret. Base64 is encoding. The new-grad assumption that Secrets are encrypted-by-default falls apart the moment the interviewer asks 'what protects this Secret from a developer with cluster read access'. Real answer: encryption-at-rest on etcd, external secret managers for high-stakes material.
4. Skipping --previous in kubectl logs. A pod is in CrashLoopBackOff. The candidate runs kubectl logs, gets nothing, freezes. The --previous flag shows the logs of the crashed container. Knowing this one flag passes a question that otherwise stalls the whole interview.
5. Naming kube-proxy as if it's a separate cluster component you'd never touch. kube-proxy programs the iptables or IPVS rules that make Services work. If kube-proxy is down on a node, the rules on that node stop tracking changes, so Service traffic from that node goes to stale pod IPs and new Services never become reachable there. The candidate who treats kube-proxy as a black box can't debug the resulting outage.
6. Not knowing where to look for events. Most kubernetes failures (image pulls, scheduling, probes, OOMKills) surface in the Events section of kubectl describe pod, or in kubectl get events sorted by .lastTimestamp. Candidates who don't know to look there waste 5 minutes naming commands that don't help.
7. Skipping the YAML. Reading about Deployments without writing one means you don't know whether spec.selector.matchLabels matches spec.template.metadata.labels by convention or by enforcement (it's by enforcement, as of kubernetes 1.16+). The candidate who has written 10 Deployment YAMLs knows. The one who has only read about them doesn't.
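Mistake 3 is easy to check for yourself. Here is a minimal Python sketch of what a Secret's data field holds: the value is base64 text, so anyone who can read the object can reverse it without a key. The password string is a made-up example, not taken from any real cluster.

```python
import base64

# Hypothetical secret value, used only for illustration.
plaintext = "s3cr3t-db-password"

# What a Secret's `data` field stores: base64 text, not ciphertext.
encoded = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

# Anyone with read access to the Secret can decode it; no key is involved.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == plaintext)
```

The round trip needs no password or key, which is the whole point: the protection has to come from access control and encryption around the Secret, not from the encoding.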
One thing I'd add from watching new grads work through this: don't try to memorize all seven. Pick the two that match your weakest area (most likely #1 and #4) and build a weekend exercise around fixing them. Spin up minikube on a Saturday morning, deploy a broken app on purpose, debug it. The rest take care of themselves once the muscle memory is in place.
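If you want to see why a mistyped selector leaves the endpoints list empty before you break it on minikube, the matching logic can be sketched in a few lines of Python. This is a simplified model, not the real controller: it handles equality-based selectors only, and the pod names and labels are hypothetical.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, pod_labels: Labels) -> bool:
    """A pod matches when it carries every key/value pair in the selector."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def endpoints_for(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Names of the pods a Service with this selector would route to.

    Simplification: a Service with no selector is out of scope here,
    so an empty selector returns no endpoints.
    """
    if not selector:
        return []
    return sorted(name for name, labels in pods.items()
                  if selector_matches(selector, labels))


# Hypothetical pods created by a Deployment labelled app=web.
pods = {
    "web-7d4f9-abcde": {"app": "web", "tier": "frontend"},
    "web-7d4f9-fghij": {"app": "web", "tier": "frontend"},
}

good = endpoints_for({"app": "web"}, pods)   # both pods
typo = endpoints_for({"app": "wbe"}, pods)   # mistyped label value

print(good)
print(typo)
```

The typo case returns an empty list, which is the same signal you are looking for when kubectl get endpoints shows no addresses behind a Service.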
Key terms
- Pod / Deployment / ReplicaSet / StatefulSet / DaemonSet
- Pod is the smallest deployable unit, one or more containers on a single node. Deployment is the higher-level controller for stateless workloads, managing rolling updates via a ReplicaSet (which maintains the replica count). StatefulSet is for workloads needing stable identities and persistent per-pod storage (databases, leader-election systems). DaemonSet ensures one Pod runs on every matching node (log collectors, metrics agents).
- Service / Ingress / NetworkPolicy
- Service is a stable virtual IP and DNS name fronting a dynamic set of Pod IPs, implemented by kube-proxy via iptables or IPVS rules. Ingress is a higher-level HTTP routing layer behind a single LoadBalancer, with the actual rules implemented by an ingress controller (nginx, traefik). NetworkPolicy restricts pod-to-pod traffic at the network layer, opt-in and CNI-dependent.
- ConfigMap / Secret
- Key-value stores mounted into pods as env vars or files. ConfigMaps for non-sensitive config (URLs, feature flags). Secrets for sensitive data (API keys, passwords), base64-encoded by default (encoding, not encryption). Production clusters add encryption-at-rest for Secrets in etcd and often use an external secret manager for the actual sensitive material.
- CrashLoopBackOff / ImagePullBackOff / OOMKilled / Pending
- The four pod-status values you'll see most in debug scenarios. CrashLoopBackOff: container starts, crashes, restarts in an increasing backoff. ImagePullBackOff: kubelet can't pull the image (typo, missing imagePullSecret, registry down). OOMKilled: container exceeded its memory limit, kernel killed it. Pending: pod can't be scheduled (no node has resources, scheduling rules don't match, or PVC unbound).
- RBAC: Role / RoleBinding / ClusterRole / ClusterRoleBinding / ServiceAccount
- Role-Based Access Control. Role grants permissions in a single namespace. ClusterRole grants permissions cluster-wide. RoleBinding attaches a Role to a user, group, or ServiceAccount in a namespace. ClusterRoleBinding does the same cluster-wide. ServiceAccount is the kubernetes-native identity for pods themselves (used when a pod calls the kubernetes API).
- CNI / CRI / kubelet / kube-proxy / etcd
- The control-and-data plane vocabulary. CNI (Container Network Interface) is the plugin model for pod networking (Calico, Cilium, Flannel). CRI (Container Runtime Interface) is the plugin model for container runtimes (containerd, CRI-O). kubelet is the agent running on every node that watches the API server and starts the right pods. kube-proxy is the per-node component that programs iptables or IPVS rules to make Services work. etcd is the distributed key-value store holding the entire cluster state.
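One way to drill the four pod-status values is to pair each with the first commands you would type. The Python sketch below is a study aid built only from commands this guide already names. The ordering is a suggestion rather than an official runbook, and `<pod>` is a placeholder for a real pod name.

```python
from typing import Dict, List

# Study-aid mapping: pod status -> first commands to run.
# `<pod>` is a placeholder; the ordering is an assumption, not a standard.
FIRST_CHECKS: Dict[str, List[str]] = {
    "CrashLoopBackOff": [
        "kubectl describe pod <pod>",
        "kubectl logs <pod> --previous",
    ],
    "ImagePullBackOff": ["kubectl describe pod <pod>"],
    "OOMKilled": ["kubectl describe pod <pod>"],
    "Pending": [
        "kubectl describe pod <pod>",
        "kubectl get pvc",
    ],
}

# Fallback for any status not listed above: read the Events.
DEFAULT_CHECKS: List[str] = [
    "kubectl describe pod <pod>",
    "kubectl get events --sort-by=.lastTimestamp",
]


def first_checks(status: str) -> List[str]:
    """Return the commands to try first for a given pod status."""
    return FIRST_CHECKS.get(status, DEFAULT_CHECKS)


for status in ("CrashLoopBackOff", "Pending", "SomethingElse"):
    print(status, "->", first_checks(status))
```

Every path starts with kubectl describe pod, because the Events list is where most failures surface.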
Related guides
The kubernetes interview is one of several technical interviews a CS new grad targeting DevOps or platform roles will face in 2026. The following guides close adjacent gaps.
- System design basics for new grads: the system-design round often pairs with kubernetes questions for platform-engineer and SRE roles. The framework transfers.
- Python interview questions: most DevOps and SRE roles also test Python at a script-and-tooling level. Light depth, broad surface.
- Data engineer interview questions: the data engineering round shares the YAML-and-pipeline vocabulary with kubernetes interviews. Skim if you're considering both pivots.
- Technical phone screen tactics: the phone-screen format is similar across SWE, DevOps, and SRE pipelines. Content differs (more kubectl, less LeetCode), structure is the same.
- Mock interview practice methodology: the four-mode approach (solo, peer, paid, AI) applies to kubernetes prep especially well because scenario questions benefit from live narration practice.
Pick the gap, jump to the matching cornerstone, close it, return to kubernetes prep. That's the loop.
The kubernetes interview in 2026 favors candidates who have actually used a cluster over candidates who have only read about one. Two weekends with minikube close most of the gap. Three weeks of focused prep (hands-on, docs, scenario drills, postmortems, mocks) makes you interview-ready for DevOps-junior, SRE-junior, and platform-engineer-junior pipelines. Expect less LeetCode pressure than a pure SWE loop, and more entry-level openings on the platform side as companies grow their infrastructure teams.
InterviewChamp.AI runs realistic kubernetes mocks that show up on every interview surface: the live troubleshooting whiteboard, the architecture-design conversation, the kubectl scenario round. One install, every surface. Start a practice session, narrate your debug as you work through the scenario, and walk into Monday's phone screen ready.
About the author: Alex Chen is the founder of InterviewChamp.AI, building AI interview prep for the new-grad CS market and writing about the modern interview gauntlet from the inside.
Frequently asked questions
- What kubernetes interview questions should I prepare for in 2026?
- Prepare across five buckets: architecture (Pod, Deployment, StatefulSet, DaemonSet, Service, Ingress, ConfigMap, Secret), networking (CNI, Service types, NetworkPolicy, kube-proxy, DNS, Ingress vs LoadBalancer), troubleshooting (CrashLoopBackOff, Pending, ImagePullBackOff, OOMKilled, DNS resolution failures, kubectl describe workflow), security plus RBAC (Roles, RoleBindings, ServiceAccounts, Pod Security), and kubectl command fluency (the 10-15 commands you'll actually type during a live troubleshooting question). Scenario-format questions (your pod is X, what do you check first) carry the most weight at the new-grad bar because they screen for whether you've actually used kubectl on a real cluster or just read about it.
- What's the difference between a Pod and a Deployment in kubernetes?
- A Pod is the smallest deployable unit in kubernetes, one or more containers sharing network and storage, running on a single node. A Deployment is a higher-level controller that manages a set of identical Pods through a ReplicaSet, handling rolling updates, rollbacks, and the desired-state-vs-actual-state reconciliation loop. You almost never create raw Pods in production. You create a Deployment, the Deployment creates a ReplicaSet, the ReplicaSet creates Pods. The exception is one-shot Jobs or system-level workloads where the lifecycle is intentional.
- What is a StatefulSet and when should I use one over a Deployment?
- A StatefulSet is the workload type for applications that need stable network identities, ordered startup and shutdown, and persistent per-pod storage. Examples: databases (Postgres, MongoDB), distributed systems with leader election (Kafka, etcd), and anything where pod-1 must always be pod-1 with the same hostname and the same volume. Use a Deployment when your pods are interchangeable (stateless web servers, API workers). Use a StatefulSet when each pod has its own identity. The interview-relevant signal is whether you state the criterion (stable identity plus persistent storage) rather than parroting 'StatefulSet for stateful apps.'
- What is CrashLoopBackOff and how do you debug it?
- CrashLoopBackOff is the status kubernetes assigns when a container starts, crashes, restarts, crashes again, and the kubelet starts backing off the restart attempts (10s, 20s, 40s, doubling up to 5 minutes). The fix workflow: run kubectl describe pod to see the recent events, then kubectl logs --previous to see the output of the crashed container (the current container hasn't started, so kubectl logs alone returns nothing useful). Common causes: application bug crashing on startup, missing environment variable or ConfigMap, failed liveness probe, OOM kill (check the Events for OOMKilled), wrong command or args. The interview value is showing the workflow, not just naming a cause.
- What is the difference between a Service ClusterIP, NodePort, and LoadBalancer?
- ClusterIP is the default, an internal-only virtual IP reachable from inside the cluster, used for service-to-service communication. NodePort exposes the service on a static port on every node's IP, useful for development and bare-metal clusters without a cloud LoadBalancer. LoadBalancer provisions an external cloud load balancer (ELB on AWS, an external IP on GCP) that fronts the service, the standard production pattern for exposing HTTP services to the internet. The follow-up question is usually Ingress, a higher-level construct that lets you route many hostnames or paths through a single LoadBalancer using an ingress controller (nginx, traefik).
- What is the difference between kubectl apply and kubectl create?
- kubectl create is imperative: create a new resource, fail if it already exists. kubectl apply is declarative: ensure the cluster matches this manifest, create if missing, update if different, merge with existing changes via a three-way diff (last-applied annotation, current cluster state, new manifest). Production GitOps workflows are almost always declarative (kubectl apply, or higher-level tools like Argo CD or Flux). The new-grad mistake is using kubectl create in a CI/CD pipeline, which fails on the second run.
- What is a ConfigMap and how is it different from a Secret?
- Both are key-value stores that pods mount as environment variables or files. ConfigMaps hold non-sensitive configuration (feature flags, URLs, log levels). Secrets hold sensitive data (API keys, database passwords, TLS certs) and are stored base64-encoded by default. Base64 is encoding, not encryption, so anyone with cluster read access can decode a Secret. Production clusters enable encryption-at-rest for Secrets and often layer in an external secret manager (sealed-secrets, Vault, cloud KMS) for the actual sensitive material.
- How does kubernetes networking work?
- Three rules. First, every pod gets its own IP address from a flat cluster network, no NAT required for pod-to-pod traffic. Second, pods on the same node and pods on different nodes communicate the same way (the network is flat from the pod's perspective). Third, the Service abstraction (ClusterIP, NodePort, LoadBalancer) provides a stable virtual IP and DNS name in front of a dynamic set of pod IPs. The actual implementation lives in the CNI plugin (Calico, Cilium, Flannel, Weave) which is installed at cluster bootstrap and handles the routing. kube-proxy programs iptables or IPVS rules to route Service IPs to pod IPs.
- What is a NetworkPolicy in kubernetes?
- A NetworkPolicy is a kubernetes resource that restricts which pods can talk to which other pods at the network layer. Without policies, all pods in a cluster can reach all other pods by default (a flat network is permissive). A NetworkPolicy lets you write rules like 'pods in the frontend namespace can reach pods in the backend namespace on port 8080, nothing else.' NetworkPolicies require a CNI plugin that supports them (Calico, Cilium yes; Flannel out-of-the-box no). The interview signal is knowing that NetworkPolicy is opt-in and CNI-dependent, not a built-in kubernetes feature in the way Services are.
- How do I prepare for a kubernetes interview as a CS new grad?
- Three weeks of focused work. Week 1: install minikube or kind locally, deploy a simple two-tier web app (frontend Deployment plus Postgres StatefulSet plus a Service plus a ConfigMap plus a Secret), break it deliberately and fix it. Week 2: read the kubernetes documentation cover-to-cover for Pods, Deployments, Services, ConfigMaps, Secrets, RBAC. Memorize the 15 kubectl commands you'll use in a live troubleshooting question. Week 3: drill scenario questions out loud (your pod is CrashLoopBackOff'ing, what do you check), read 3-5 public postmortems from companies that had production kubernetes incidents, run timed mock interviews. The single biggest accelerant is hands-on practice. Knowing what kubectl describe pod outputs in a real failure is worth ten hours of reading.
- What is RBAC in kubernetes?
- Role-Based Access Control. The mechanism kubernetes uses to decide who can do what against the API. Four resources matter. A Role grants permissions within a single namespace ('can read pods in the dev namespace'). A ClusterRole grants permissions cluster-wide ('can read pods in any namespace'). A RoleBinding attaches a Role to a user, group, or ServiceAccount. A ClusterRoleBinding does the same for a ClusterRole. ServiceAccounts are the kubernetes-native identity for pods. The new-grad mistake is conflating users (human, external) with ServiceAccounts (in-cluster, pod-bound).
- What is etcd and why does it matter?
- etcd is the distributed key-value store that holds the entire kubernetes cluster state, every Pod spec, every Service, every Secret, every Node registration. It runs as a separate process (or cluster of processes for HA) on the control plane nodes. The kubernetes API server talks to etcd; nothing else does. The interview-relevant facts: etcd uses the Raft consensus algorithm for HA, quorum is a majority of members (2 of 3, 3 of 5), losing quorum means etcd can no longer accept writes so the cluster API effectively stops working, and etcd backups are how you recover from a control-plane disaster. The new-grad version of this question is usually just 'what's etcd'; the senior version probes the consensus and HA details.
- What does kubectl describe pod actually tell you?
- Six things in roughly this order. First, the pod's name, namespace, labels, and annotations (useful for confirming you're looking at the right pod). Second, the node it's scheduled on (shown as <none> while the pod is still Pending). Third, the pod's status and IP address (useful for networking debugging). Fourth, each container's State and Last State (Running, Waiting, Terminated), with the reason, exit code, and restart count. Fifth, volumes mounted into the pod. Sixth, the Events list at the bottom, which is the single most useful debugging signal in kubernetes (image pull failures, scheduling failures, probe failures, OOMKills, all surface here). When someone says 'check kubectl describe pod first', they mean read the Events list at the bottom.
- What is the difference between liveness and readiness probes?
- Both are health checks kubernetes runs against your container. A liveness probe answers 'is this container still alive'; if it fails repeatedly the kubelet kills and restarts the container. A readiness probe answers 'is this container ready to serve traffic'; if it fails the pod is removed from Service endpoints but the container is not killed. The classic mistake is conflating them. Setting a liveness probe that fails during slow startup causes endless restart loops; the fix is a startup probe (introduced in kubernetes 1.16, stable since 1.20) that holds off the liveness probe until the application has bootstrapped.
- Why does my pod stay in Pending state?
- Three buckets of causes. Resources: no node has enough CPU or memory to fit the pod's requests. The fix is either smaller requests, more nodes, or cluster autoscaler. Scheduling rules: the pod has affinity, anti-affinity, or nodeSelector rules that no node satisfies, or every candidate node carries a taint the pod doesn't tolerate. The fix is matching the scheduling rules and tolerations to the actual node labels and taints. PVCs: the pod is waiting on a PersistentVolumeClaim that hasn't bound yet, often because no StorageClass is set or no PersistentVolume is available. The fix is checking kubectl get pvc and looking at the Events. kubectl describe pod shows the scheduler's reasoning at the bottom of Events.
- What's the most common kubernetes interview mistake CS new grads make?
- Naming concepts without showing they've used them. A candidate who can recite 'a Service abstracts a set of pods behind a stable virtual IP' but who would not know how to debug a Service that's not routing traffic reads as someone who watched a video, not someone who ran kubectl. The fix is hands-on: install minikube, write a Deployment YAML by hand, break a Service by mistyping a selector label, fix it by running kubectl get endpoints and noticing the empty list. That single experience teaches more than three chapters of reading.
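The CrashLoopBackOff answer above quotes a restart backoff of 10s, 20s, 40s, doubling up to 5 minutes. A small Python sketch makes that schedule concrete. The function name and defaults are illustrative assumptions based on those numbers, and real kubelet timing can differ by version and configuration.

```python
from typing import List


def crashloop_delays(restarts: int, base: int = 10, cap: int = 300) -> List[int]:
    """Delay in seconds before each restart: doubles from `base`, capped at `cap`."""
    delays: List[int] = []
    delay = base
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


print(crashloop_delays(7))
```

After the fifth restart the delay sits at the 5-minute cap, which is why a pod that has been crash-looping for a while looks idle for minutes at a time between attempts.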