Data Scientist Interview Questions for 2026: 40+ Questions Across SQL, Stats, A/B Testing, Product Sense (DS Pivot Edition)

Data scientist interviews in 2026 test five distinct skill stacks: SQL fluency, Python/pandas, statistics and probability, A/B testing and experimentation, and product sense. The LeetCode bar is lower than a SWE loop, but the surface area is wider. This guide covers 40+ questions across seven categories, the new-grad pivot story for CS students with strong stats coursework, and the format differences between Data Scientist, Data Analyst, and ML Engineer interviews.

By Alex Chen, Founder, InterviewChamp.AI · Last updated

31 min read

What data scientist interview questions test in 2026

Data scientist interviews in 2026 test five stacks: SQL fluency at the window-function level, Python and pandas at a data-manipulation level, statistics and probability at an undergraduate-course-plus-application level, A/B testing and experimentation as the signature topic of consumer-tech DS interviews, and product sense as the structured-reasoning round. Case studies show up at almost every company in some form. The pure ML round has shrunk since 2024 at most companies but still appears at FAANG and ML-product-heavy employers.

The bar is wide and shallow. You need to be functional across SQL, stats, Python, A/B testing, and product reasoning. You don't need to be a researcher in any of them. That makes the DS loop friendlier than a SWE loop for a CS new grad with stats coursework and a Kaggle competition under their belt, and harder than a data analyst loop because the experimentation and ML surface is wider.

Distribution of question types most new-grad DS candidates report seeing in their 2026 loops:

  • 25-30% SQL (always at least one whiteboard or shared-screen round)
  • 20-25% statistics and probability
  • 20% A/B testing and experimentation (higher at consumer-tech)
  • 15% product sense and case studies
  • 10% Python and pandas
  • 5-10% machine learning fundamentals
  • Rarely more than one LeetCode-style algorithm round outside FAANG

The interview is shorter than a FAANG SWE loop. Most onsite loops run 4-5 hours with one virtual round before. The format is meaningfully more conversational. The interviewer wants to see whether you can think structurally about ambiguous problems out loud, not whether you can produce optimal code under a clock.

Honest call: if your CS degree included one stats course and one ML course that you understood, plus one Kaggle competition where you finished above the median, you're closer to interview-ready than you think. The DS pivot from CS is the shortest of the data-career pivots because the math foundations transfer directly.

Should a CS new grad pivot from SWE to data scientist roles?

For most CS new grads applying since spring 2025 with no SWE offers, the honest answer is: if you have strong stats coursework and at least one project that signals data-driven thinking, the DS pivot is worth a serious look. The pipeline is meaningfully less saturated than pure SWE at entry level. Class-of-2025 grads pivoting into DS in late 2025 reported faster time-to-offer than the same cohort's SWE search, according to r/datascience and r/cscareerquestions Q1 2026 megathreads.

But "less saturated" doesn't mean "easier." It means "different." Three honest reads before committing to the pivot:

Read 1: The math bar is real. If you took one stats course and forgot most of it, the pivot takes longer. Two-to-three weeks just to refresh the foundations. If you took stats, probability, linear algebra, and ML and remember them well, the pivot takes 3-4 weeks. Don't pivot into DS assuming you can fake the stats. The interview will catch you within one phone screen.

Read 2: A/B testing is the unique surface. SWE interviews don't test A/B testing. DS interviews at consumer-tech companies test it heavily. Sample-size calculation, multiple-comparisons, sequential-testing pitfalls. None of this shows up in a SWE loop. You'll spend a full week of prep just on experimentation if you're targeting consumer-tech roles.

Read 3: The pivot is honest if your portfolio shows it. What's not honest: rewriting a SWE-only resume to claim DS experience you don't have. What is honest: positioning yourself as a CS grad with strong stats coursework and a Kaggle project who has been studying A/B testing and product analytics to build the DS foundation. Recruiters detect the dishonest framing within one phone screen. The honest framing works when the portfolio backs it.

A specific case I'd flag from the 2025-2026 cycle: a CS grad with 487 applications, 14 interviews, and zero SWE offers across 11 months. He took two stats courses in undergrad and finished above the median in one Kaggle competition. Three weeks of structured DS prep (one week SQL, one week stats refresh, one week A/B testing) plus a small portfolio repo, and he landed the first DS phone screen of his pivot. The pivot works when the underlying math is real. It doesn't when the candidate is hoping the interviewer won't ask about confidence intervals.

If those three reads land favorably, the pivot is worth 30 days of focused prep. The rest of this guide walks through the questions you'll face, by category, plus the week-by-week study plan.

The 40+ data scientist interview questions you should rehearse

What follows is the structured rehearsal set covering the seven categories that show up most. Each question has a sample answer outline. Not a full canned response, but the bones of what a passing answer at the new-grad bar covers. Adapt the language to your own voice. The structure is the load-bearing part.

SQL interview questions for data scientists (8 Q)

SQL is graded harder than most candidates expect. The window-function bar is universal at the entry level. Cohort and retention queries are the DS signature.

Q1. Find the 7-day retention rate by signup cohort.

Sample answer: Two-CTE query. First CTE: extract users with their signup date. Second CTE: extract user activity dates. Join: for each user, check whether they had activity on days 7-13 after signup (one common definition of the 7-day retention window; confirm the definition with the interviewer, since some teams count only day 7 itself). Aggregate: group by signup_date, count distinct users who retained, divide by total cohort size. The harder follow-up: do the same for 1-day, 7-day, 30-day retention in a single query using a CASE WHEN per window.
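
The two-CTE shape can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not a production query: the users and events tables, their columns, and the rows are hypothetical, and it assumes the days 7-13 definition of the retention window.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, signup_date TEXT);
CREATE TABLE events (user_id INTEGER, event_date TEXT);
INSERT INTO users VALUES
  (1,'2026-01-01'),(2,'2026-01-01'),(3,'2026-01-02'),(4,'2026-01-02');
INSERT INTO events VALUES
  (1,'2026-01-08'),   -- day 7 after signup: retained
  (1,'2026-01-09'),   -- second hit in the window, must not double count
  (2,'2026-01-03'),   -- day 2: outside the window
  (3,'2026-01-15'),   -- day 13: retained
  (4,'2026-01-16');   -- day 14: outside the window
""")

RETENTION_SQL = """
WITH signups AS (
  SELECT user_id, signup_date FROM users
),
activity AS (
  SELECT DISTINCT user_id, event_date FROM events
)
SELECT s.signup_date,
       COUNT(DISTINCT s.user_id) AS cohort_size,
       COUNT(DISTINCT CASE
         WHEN julianday(a.event_date) - julianday(s.signup_date) BETWEEN 7 AND 13
         THEN s.user_id END) AS retained
FROM signups s
LEFT JOIN activity a ON a.user_id = s.user_id   -- LEFT JOIN keeps users with no events
GROUP BY s.signup_date
ORDER BY s.signup_date;
"""

retention = conn.execute(RETENTION_SQL).fetchall()
for signup_date, cohort_size, retained in retention:
    print(signup_date, cohort_size, retained, retained / cohort_size)
```

The LEFT JOIN matters: an inner join would silently drop users with no activity and inflate the rate. Extending to 1-day and 30-day retention means adding one more CASE WHEN column per window.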

Q2. Compute weekly active users (WAU) over the last 12 weeks.

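Sample answer: SELECT DATE_TRUNC('week', event_date) AS week, COUNT(DISTINCT user_id) AS wau FROM events WHERE event_date >= CURRENT_DATE - INTERVAL '12 weeks' GROUP BY 1 ORDER BY 1. The follow-up: rolling 7-day active users, which requires a self-join or a correlated subquery against a list of dates. The tempting shortcut, COUNT(DISTINCT user_id) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), fails on two counts: PostgreSQL rejects DISTINCT inside a window function (support in other engines is limited), and a ROWS frame counts event rows rather than calendar days. Join each date to the events from the trailing 7 days and take COUNT(DISTINCT user_id) per date instead.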
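
A self-join version of the rolling 7-day count, sketched with sqlite3. The events table and its rows are hypothetical, and the list of days is taken from the events themselves for brevity; a real query would normally join against a calendar table so that days with no events still appear.

```python
import sqlite3

# Hypothetical events, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, event_date TEXT);
INSERT INTO events VALUES
  (1,'2026-01-01'),(2,'2026-01-01'),
  (1,'2026-01-05'),
  (1,'2026-01-08'),(3,'2026-01-08'),(4,'2026-01-08');
""")

ROLLING_SQL = """
WITH days AS (
  SELECT DISTINCT event_date AS day FROM events
)
SELECT d.day, COUNT(DISTINCT e.user_id) AS rolling_7d_users
FROM days d
JOIN events e
  ON e.event_date BETWEEN date(d.day, '-6 days') AND d.day  -- trailing 7 calendar days
GROUP BY d.day
ORDER BY d.day;
"""

rolling_active = conn.execute(ROLLING_SQL).fetchall()
for day, users in rolling_active:
    print(day, users)
```

User 2 drops out of the last window because their only event is more than 6 days before 2026-01-08, which is the behavior a row-count frame would not give you.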

Q3. Find the top 3 products by revenue in each region.

Sample answer: Use DENSE_RANK() OVER (PARTITION BY region ORDER BY revenue DESC) in a CTE, filter to rank <= 3. The trap is reaching for a GROUP BY region, MAX(revenue) pattern. That works for top 1 but breaks for top N. The window-function pattern generalizes.
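
A minimal sketch of the window-function pattern using sqlite3, assuming a hypothetical sales table that already holds one row per product per region. The tie in the EU rows is deliberate: DENSE_RANK keeps tied products, so "top 3" can return more than three rows.

```python
import sqlite3

# Hypothetical data; revenue is already aggregated per product and region.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, product TEXT, revenue REAL);
INSERT INTO sales VALUES
  ('EU','a',100),('EU','b',90),('EU','c',90),('EU','d',80),('EU','e',10),
  ('US','x',50),('US','y',40);
""")

TOP_N_SQL = """
WITH ranked AS (
  SELECT region, product, revenue,
         DENSE_RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rnk
  FROM sales
)
SELECT region, product, revenue, rnk
FROM ranked
WHERE rnk <= 3
ORDER BY region, rnk, product;
"""

top_products = conn.execute(TOP_N_SQL).fetchall()
for row in top_products:
    print(row)
```

If the interviewer wants exactly three rows per region regardless of ties, ROW_NUMBER() with a tie-breaking column in the ORDER BY is the usual swap. Say which one you picked and why.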

Q4. Compute the funnel conversion rate from signup to first purchase.

Sample answer: Multi-CTE. First CTE: signup events with user_id and signup_date. Second CTE: first purchase per user with purchase_date. Left join the second onto the first, count signups, count converted, divide. The follow-up: same conversion but bucketed by acquisition channel. Add a GROUP BY channel.

Q5. Find users who placed orders in 3 consecutive months.

Sample answer: Use LAG() and LAG(..., 2) over a partition by user_id, ordered by month. Filter where current month minus lag-1 equals 1 month and lag-1 minus lag-2 equals 1 month. The window-function version is the cleaner pattern; a self-join version also works but reads more verbose.
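
The LAG pattern, sketched in sqlite3 under stated assumptions: the orders table and rows are hypothetical, and months are converted to an integer index (year * 12 + month) so that "one month apart" is a simple subtraction that also works across a year boundary.

```python
import sqlite3

# Hypothetical orders, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id INTEGER, order_date TEXT);
INSERT INTO orders VALUES
  (1,'2025-11-03'),(1,'2025-12-10'),(1,'2026-01-05'),   -- Nov, Dec, Jan: qualifies
  (2,'2026-01-02'),(2,'2026-01-20'),(2,'2026-03-01'),   -- Jan, Jan, Mar: does not
  (3,'2026-02-01'),(3,'2026-03-01');                    -- only two months
""")

CONSECUTIVE_SQL = """
WITH months AS (
  -- one row per user per calendar month, as an integer month index
  SELECT DISTINCT user_id,
         CAST(strftime('%Y', order_date) AS INTEGER) * 12
           + CAST(strftime('%m', order_date) AS INTEGER) AS month_idx
  FROM orders
),
lagged AS (
  SELECT user_id, month_idx,
         LAG(month_idx, 1) OVER (PARTITION BY user_id ORDER BY month_idx) AS prev_1,
         LAG(month_idx, 2) OVER (PARTITION BY user_id ORDER BY month_idx) AS prev_2
  FROM months
)
SELECT DISTINCT user_id
FROM lagged
WHERE month_idx - prev_1 = 1 AND prev_1 - prev_2 = 1
ORDER BY user_id;
"""

streak_users = [row[0] for row in conn.execute(CONSECUTIVE_SQL)]
print(streak_users)
```

The DISTINCT in the first CTE is the step candidates skip: without it, two orders in the same month break the lag arithmetic.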

Q6. Compute the rolling 30-day revenue for each customer.

Sample answer: SUM(revenue) OVER (PARTITION BY customer_id ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW). Know the difference between ROWS and RANGE. ROWS operates on row positions; RANGE operates on the values of the ORDER BY column. For daily revenue with exactly one row per day and no gaps, they behave the same. For sparse data, RANGE BETWEEN INTERVAL '29 days' PRECEDING AND CURRENT ROW is the correct frame (29 days back plus the current day makes 30).
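
The ROWS-versus-RANGE difference on sparse data can be seen side by side. This sketch uses sqlite3 with a hypothetical revenue table; SQLite has no INTERVAL type, so the frame is expressed as a numeric offset over julianday(day), which assumes SQLite 3.28 or newer.

```python
import sqlite3

# Hypothetical sparse revenue rows for one customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revenue (customer_id INTEGER, day TEXT, revenue REAL);
INSERT INTO revenue VALUES
  (1,'2026-01-01',10),
  (1,'2026-01-02',20),
  (1,'2026-03-01',5);   -- 58 days after the previous row
""")

FRAME_SQL = """
SELECT day,
  SUM(revenue) OVER (PARTITION BY customer_id ORDER BY julianday(day)
                     ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS by_rows,
  SUM(revenue) OVER (PARTITION BY customer_id ORDER BY julianday(day)
                     RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS by_range
FROM revenue
ORDER BY day;
"""

rolling = conn.execute(FRAME_SQL).fetchall()
for day, by_rows, by_range in rolling:
    print(day, by_rows, by_range)
```

On the last row the ROWS frame still sums all three rows, because it only counts positions, while the RANGE frame correctly sees just the one row inside the trailing 30 calendar days.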

Q7. Read this slow query and name three speedups.

Sample answer: Look for full table scans on filtered columns (add an index), correlated subqueries (rewrite as joins), expensive sorts (sort after filtering so fewer rows are sorted, or add an indexed sort column), unnecessary joins (drop the table if its columns aren't used and the join doesn't filter or duplicate rows), and partition pruning misses (filter on the partition key first). Any three of these five answers the question. The interviewer wants to see you've read execution plans before, not just written SQL.

Q8. Compute the percentage of users who hit feature X within their first session.

Sample answer: One CTE to get each user's first session_id. Another CTE to mark whether feature X was used in that session. Aggregate to get the share. The follow-up: do the same for "within first 24 hours" instead of "first session," which requires a time-window join rather than a session-id match.

Python and pandas interview questions (5 Q)

Python is tested at a data-manipulation level. List comprehensions, the pandas groupby pattern, and the merge family come up in nearly every loop.

Q9. Given a pandas DataFrame of user events, compute the 7-day active users for each day.

Sample answer: Pivot the events to a per-user-per-day boolean. Apply a rolling 7-day window with min_periods=1 and take the max, so each cell flags whether the user was active on any of the trailing 7 days. Sum across users for each day. Or: use groupby(['user_id', pd.Grouper(key='date', freq='D')]) and chain a rolling window. There are 4-5 ways to do this. Knowing two and being able to talk about the time/memory tradeoff is the signal.

Q10. What's the difference between merge and join in pandas?

Sample answer: merge joins on column values (SQL-style). join joins on the index by default. merge is more flexible: specify left/right/inner/outer, multiple keys, suffix handling. join is convenient when you've already set indexes. In the wild, merge is more common.

Q11. How do you handle missing values in a pandas DataFrame?

Sample answer: Three strategies. Drop rows or columns with missing values (dropna). Fill with a summary statistic (fillna(median)). Impute with a model (KNN, regression). The interview-relevant point is to talk about the tradeoff: dropping loses information, imputation introduces bias if the missingness is correlated with the outcome. Ask the interviewer about the underlying use case before picking.

Q12. Pivot a long-format DataFrame to wide format.

Sample answer: df.pivot(index='date', columns='metric', values='value'). The reverse is df.melt(id_vars=['date'], value_vars=['metric_a', 'metric_b']). Know both. The follow-up: when do you reach for pivot_table instead of pivot? When the index-column combination isn't unique and you need aggregation.

Q13. Vectorize a slow pandas operation.

Sample answer: Replace df.apply(lambda x: x['a'] + x['b'], axis=1) with df['a'] + df['b']. Replace a Python loop with a vectorized NumPy or pandas operation. The 10-100x speedup is typical for any operation that can be expressed as element-wise math. The interview signal: when did you last optimize a slow pandas operation in real code?

Statistics and probability interview questions (8 Q)

The stats bar is undergraduate-course depth plus the ability to apply each concept to a product scenario. Memorizing formulas without understanding them is the canonical failure mode.

Q14. What is a p-value and what does it mean?

Sample answer: A p-value is the probability of observing data at least as extreme as what you saw, under the assumption that the null hypothesis is true. It is NOT the probability that the null is true. It is NOT the probability that the observed effect is real. The interview wants the precise statement. A p-value of 0.03 means: "If there were no real effect, the chance of seeing this data or more extreme would be 3%." That's it.

Q15. What's the difference between Type I and Type II error?

Sample answer: Type I error (alpha): rejecting the null when the null is true. False positive. Type II error (beta): failing to reject the null when the null is false. False negative. Power = 1 - beta = probability of detecting an effect when one exists. The interview-relevant follow-up: if I want to lower alpha (be more conservative), what happens to beta? It goes up. Lowering one raises the other unless you increase the sample size.

Q16. What does a 95% confidence interval mean?

Sample answer: Precise statement: if you repeated the experiment many times and computed a 95% CI each time, 95% of those intervals would contain the true parameter. It is NOT "there's a 95% chance the true parameter is in this specific interval" (that's a Bayesian credible interval, which is a different object). The interview is checking whether you understand the difference between the frequentist and Bayesian interpretations.
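
The repeated-experiment reading of a confidence interval can be checked with a small simulation. This is a sketch under stated assumptions: normally distributed data, a normal-approximation interval (z rather than t), and arbitrary illustrative values for the true mean, standard deviation, and sample size.

```python
import random
from statistics import NormalDist, mean, stdev

def ci_coverage(true_mean=10.0, sd=2.0, n=50, trials=2000, level=0.95, seed=0):
    """Fraction of simulated experiments whose CI contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        centre = mean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / trials

coverage = ci_coverage()
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95. Each individual interval either contains the true mean or it doesn't; the 95% describes the procedure across repetitions.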

Q17. Explain the central limit theorem.

Sample answer: The sampling distribution of the mean of a sample drawn from any population approaches a normal distribution as the sample size increases, regardless of the population's underlying distribution, as long as the population has finite variance. The CLT is why we can use normal-distribution-based hypothesis tests even when the underlying data isn't normal. The interview-relevant nuance: CLT requires finite variance. Heavy-tailed distributions (Cauchy, some power laws) don't satisfy this.

Q18. What is Bayes' theorem and when would you use it?

Sample answer: P(A|B) = P(B|A) * P(A) / P(B). Use when you need to flip a conditional probability. Classic example: a test for a disease is 99% accurate (positive 99% of the time when sick, negative 99% of the time when healthy). Disease prevalence is 1%. If you test positive, what's the probability you're sick? The naive answer is 99%. The Bayes answer is 50% with these numbers: 0.99 * 0.01 / (0.99 * 0.01 + 0.01 * 0.99). The true positives from the 1% who are sick are matched one-for-one by false positives from the 99% who are healthy, because the prior is so small. The interview is checking whether you understand base rates.
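
The disease-test arithmetic as a few lines of Python. The function name and parameter names are illustrative; "99% accurate" is read here as 99% sensitivity and 99% specificity.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(sick | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prevalence                # P(positive and sick)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

p_sick = posterior_positive(prevalence=0.01, sensitivity=0.99, specificity=0.99)
p_sick_common = posterior_positive(prevalence=0.20, sensitivity=0.99, specificity=0.99)
print(round(p_sick, 3), round(p_sick_common, 3))
```

Re-running with a higher prevalence shows the base-rate effect directly: the same test becomes far more convincing when the condition is common.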

Q19. What's the difference between correlation and causation?

Sample answer: Correlation is a statistical relationship between two variables. Causation is that one variable causes a change in the other. Correlation does not imply causation because of confounding variables (a third variable affects both), reverse causation (B causes A, not A causes B), and selection bias (the sample is non-random in a way that creates the apparent relationship). The way to establish causation: randomized controlled experiment. A/B test is the product-data version.

Q20. Explain linear regression and its assumptions.

Sample answer: Linear regression models the relationship between a continuous outcome Y and one or more predictors X as a linear combination. Five assumptions: linearity (the relationship is in fact linear), independence (observations are independent), homoscedasticity (residuals have constant variance), normality of residuals (for inference, not for prediction), and no multicollinearity (predictors aren't highly correlated with each other). Violations don't always break the model but change the interpretation.

Q21. What is R-squared and what does it not tell you?

Sample answer: R-squared is the proportion of variance in the outcome explained by the model. For ordinary least squares with an intercept it ranges 0-1 on the training data (higher is better fit); on held-out data it can go negative. What it doesn't tell you: whether the model is causally correct, whether the predictors are statistically significant, whether the model overfits, or whether the relationship is the right functional form. A high R-squared with a bad model is common. Adjusted R-squared penalizes the addition of useless predictors. The interview-relevant point: R-squared alone is not enough.

A/B testing and experimentation interview questions (8 Q)

A/B testing is the single most-asked topic in 2026 DS loops at consumer-tech companies. The depth target is "could design an end-to-end experiment in 5 minutes."

Q22. Walk me through how you'd design an A/B test for a new checkout flow.

Sample answer: Six-part structure. Define the primary metric (conversion rate, AOV, or revenue per user; pick one and defend). State the null and alternative hypotheses. Calculate sample size from baseline conversion, minimum detectable effect, alpha, and beta. Identify guardrails (don't degrade retention, page-load time, or refund rate). Run for at least one full business cycle (usually one week to handle day-of-week effects). State the stopping rule before starting. No peeking.

Q23. What sample size do you need for an A/B test with baseline 5%, MDE 1%, alpha 0.05, beta 0.2?

Sample answer: Use the proportion-test sample-size formula. Approximate answer: about 8,200 per arm, reading the 1% MDE as an absolute lift (5% to 6%) with a two-sided alpha. The exact answer comes from n = (Z_alpha + Z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p1 - p2)^2, with Z_alpha ≈ 1.96 and Z_beta ≈ 0.84: 7.85 * 0.1039 / 0.0001 ≈ 8,155. The interview-relevant point is to understand the levers: smaller MDE means much larger sample size (halving the MDE roughly quadruples n), smaller alpha or larger power means larger sample, baseline conversion affects sample less but still meaningfully.
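
The formula above translates directly into a short function. This sketch assumes a two-sided alpha and treats the MDE as an absolute difference between the two rates; the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p1, p2, alpha=0.05, power=0.80):
    """n per arm from (Z_alpha + Z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)

n_absolute = sample_size_per_arm(0.05, 0.06)    # baseline 5%, MDE 1 point
n_half_mde = sample_size_per_arm(0.05, 0.055)   # same baseline, half the MDE
print(n_absolute, n_half_mde, round(n_half_mde / n_absolute, 2))
```

Comparing the two calls makes the main lever visible: halving the MDE multiplies the required sample by close to four.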

Q24. What is the multiple-comparisons problem and how do you handle it?

Sample answer: If you run 20 A/B tests at alpha=0.05, you expect 1 false positive even when no effect is real. The probability of at least one false positive across 20 tests is 1 - 0.95^20 ≈ 64%. Fixes: Bonferroni correction (divide alpha by the number of tests), Holm-Bonferroni (less conservative), or pre-registering which metrics are primary versus secondary (only primary metrics carry the alpha; secondaries are exploratory). Some companies use Benjamini-Hochberg (control false discovery rate, not familywise error rate) for high-throughput experimentation.
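
The three corrections can be compared on one set of p-values. This is a plain-Python sketch of the standard Bonferroni, Holm-Bonferroni, and Benjamini-Hochberg procedures; the five p-values are made up for illustration.

```python
def familywise_error(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def holm(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down: stop at the first p-value that fails."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

def benjamini_hochberg(pvalues, alpha=0.05):
    """BH step-up: reject everything up to the largest rank that passes."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject

pvals = [0.001, 0.012, 0.02, 0.035, 0.30]   # hypothetical test results
fwer_20 = familywise_error(0.05, 20)
bonferroni = [p <= 0.05 / len(pvals) for p in pvals]
holm_reject = holm(pvals)
bh_reject = benjamini_hochberg(pvals)
print(round(fwer_20, 3), sum(bonferroni), sum(holm_reject), sum(bh_reject))
```

On these inputs the number of rejections grows from Bonferroni to Holm to Benjamini-Hochberg, which is the conservative-to-permissive ordering the answer describes.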

Q25. Why is peeking at A/B test data bad?

Sample answer: Peeking inflates the false-positive rate. Each time you look at the data and decide whether to stop, you're effectively running another hypothesis test. The cumulative alpha balloons. If you stop the test the moment p < 0.05, your actual false-positive rate is much higher than 5%. The fix: either commit to a sample size and don't stop until you hit it, or use a sequential testing framework (e.g., always-valid p-values, group sequential designs) that accounts for the peeking.
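
The inflation is easy to demonstrate with an A/A simulation, where both arms come from the same distribution and every "significant" result is a false positive. A minimal sketch, assuming unit-variance normal outcomes, a known-variance z-test, and ten evenly spaced looks; all sizes are arbitrary.

```python
import random
from statistics import NormalDist

def peeking_false_positive_rates(trials=1000, n_per_arm=500, looks=10,
                                 alpha=0.05, seed=0):
    """Return (fixed-horizon FPR, stop-at-first-significant-look FPR)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    step = n_per_arm // looks
    fixed_hits = peek_hits = 0
    for _ in range(trials):
        sum_a = sum_b = 0.0
        n = 0
        crossed = False
        significant = False
        for _ in range(looks):
            for _ in range(step):
                sum_a += rng.gauss(0, 1)   # both arms share the same distribution
                sum_b += rng.gauss(0, 1)
            n += step
            z = (sum_a / n - sum_b / n) / (2 / n) ** 0.5
            significant = abs(z) > z_crit
            crossed = crossed or significant
        fixed_hits += significant   # only the final look counts
        peek_hits += crossed        # any look crossing the threshold counts
    return fixed_hits / trials, peek_hits / trials

fixed_rate, peek_rate = peeking_false_positive_rates()
print(f"fixed horizon: {fixed_rate:.3f}, with peeking: {peek_rate:.3f}")
```

The fixed-horizon rate stays near the nominal 5%, while the stop-on-any-look rate comes out several times higher.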

Q26. What's a novelty effect and how do you account for it?

Sample answer: A novelty effect is the temporary lift (or drop) caused by users reacting to a change because it's new, not because it's better. Common with UI changes. Users explore the new button, lift looks great in week 1, then decays in week 2 once novelty wears off. Account for it by running the experiment long enough to see post-novelty steady state (usually 2 weeks for a UI change), or by analyzing the lift in users' Nth session rather than their first.

Q27. How would you test a feature that requires network effects to work?

Sample answer: Standard user-level A/B testing doesn't work because the treatment effect on one user spills over to the control users they interact with. Use cluster randomization: randomize at the level of the network unit (city, friend group, market) rather than the individual. Or use switchback testing: alternate between treatment and control over time within the same unit. State the tradeoff: cluster randomization reduces statistical power because the effective sample size is the number of clusters, not the number of users.

Q28. What's CUPED and why does it matter?

Sample answer: CUPED (Controlled-experiment Using Pre-Experiment Data) is a variance-reduction technique. It uses each user's pre-experiment value of the outcome as a covariate, removing the user-level variance from the analysis. The result: tighter confidence intervals and smaller required sample size for the same statistical power. CUPED can reduce required sample size by 30-50% for outcomes with strong pre-experiment correlation. The 2026 DS bar at experimentation-heavy companies is awareness of CUPED, not necessarily implementation.
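
The adjustment itself is one line: subtract theta times the centered pre-experiment value, with theta = cov(Y, X) / var(X). The sketch below applies it to synthetic data; the distributions and the 0.8 coefficient are arbitrary assumptions, and in a real experiment theta would be estimated on the pooled treatment and control data.

```python
import random
from statistics import mean, variance

def cuped_adjust(y, x):
    """Return y adjusted by the pre-experiment covariate x (CUPED)."""
    mean_x, mean_y = mean(x), mean(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]

rng = random.Random(0)
pre = [rng.gauss(100, 20) for _ in range(5000)]    # synthetic pre-period metric
post = [0.8 * p + rng.gauss(0, 10) for p in pre]   # synthetic correlated outcome

adjusted = cuped_adjust(post, pre)
reduction = 1 - variance(adjusted) / variance(post)
print(f"mean before {mean(post):.2f}, after {mean(adjusted):.2f}, "
      f"variance reduction {reduction:.0%}")
```

The adjusted metric keeps the same mean, so the treatment-effect estimate is unchanged in expectation, while the variance falls by roughly the squared correlation between the pre-period and in-experiment values. The synthetic correlation here is deliberately strong; real metrics will show smaller reductions.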

Q29. The A/B test result is statistically significant but the lift is 0.2%. What do you do?

Sample answer: Statistical significance is not the same as practical significance. Ask: what does 0.2% mean in business terms? If it's 0.2% on a multi-billion-dollar revenue line, ship it. If it's 0.2% on a metric with high operational cost, hold. State the question to the interviewer: "What's the minimum business-meaningful lift?" The decision is a business call, not a statistics call. The DS role is to surface the tradeoff, not to make the decision unilaterally.

Product sense and case study interview questions (5 Q)

Product sense is the structured-reasoning round. Most case studies don't have a single right answer. The interviewer is grading the diagnostic framework, not the conclusion.

Q30. DAU dropped 5% week-over-week. Diagnose it.

Sample answer: Structured framework. First, confirm it's real (check the data pipeline, check for instrumentation changes, compare against same-week-last-year). Second, isolate the segment (which platform, which geo, which user cohort dropped). Third, isolate the time window (when in the week did the drop happen, was it gradual or step-function). Fourth, propose hypotheses (product change, external event, seasonal, competitor launch). Fifth, prioritize diagnostic experiments by likelihood and cost. State the answer as a triage tree, not a single root cause.

Q31. How would you measure the success of a new feature launch?

Sample answer: Three layers of metrics. Adoption (what fraction of eligible users tried the feature in the first N days). Engagement (frequency, depth, retention of the users who tried it). Business impact (revenue, conversion, retention of the overall user base). Pick one primary metric, two guardrails, and a handful of diagnostic metrics. State that adoption alone is a vanity metric. Without engagement and business impact, a feature can have great adoption and zero value.

Q32. Conversion rate dropped 1% this month. How do you investigate?

Sample answer: Same framework as Q30. Confirm, isolate, hypothesize, prioritize. The conversion-specific lens: split conversion into steps (signup completion, email confirmation, first action). The drop almost always concentrates at one step. Find the step, then ask what changed near that step (product, traffic mix, external event). State that a 1% drop is small in absolute terms but can be a leading indicator if it's part of a multi-week trend.

Q33. The team is debating launching feature X. Should we?

Sample answer: This is the product-strategy variant of the case study. Structure: what's the user need being addressed (pull from the user, not pushed by the team). What's the expected value (estimated impact times probability of working). What's the opportunity cost (what else could the team build with the same effort). What's the risk (could it cannibalize an existing metric). Recommend a small-scale test before full launch. The interviewer is grading whether you have a framework, not whether you've memorized one.

Q34. Design the metrics for a new ride-sharing product.

Sample answer: Four metric layers. Acquisition (signups, app installs by source). Activation (first ride completed within N days of signup). Engagement (rides per active user per week). Retention (Nth-week retention curves). And the supply side: driver supply, driver utilization, surge frequency. Then guardrails: cancellation rate, complaint rate, time-to-pickup. Pick a North Star (probably rides per active user per week or net new active riders). The interviewer is grading metric structure, not the specific names.

Machine learning fundamentals interview questions (5 Q)

The pure-ML bar at entry-level DS is roughly an undergraduate ML course depth plus product application. Most loops include one ML round, sometimes two at ML-product-heavy companies.

Q35. Explain the bias-variance tradeoff.

Sample answer: Total error = bias^2 + variance + irreducible noise. High bias = underfitting (model too simple). High variance = overfitting (model too sensitive to training data). The tradeoff: reducing bias usually increases variance and vice versa. Manage with regularization, cross-validation, ensemble methods, and more training data. The interview-relevant point is that "more complex model = better" is wrong; the right answer is "right-complexity model for the data."

Q36. What's the difference between bagging and boosting?

Sample answer: Both are ensemble methods. Bagging (bootstrap aggregating) trains many models on bootstrapped samples of the training data and averages their predictions. Reduces variance. Random forests are the canonical bagging method. Boosting trains models sequentially, each one focused on the errors of the previous ones. Reduces bias. Gradient boosting (XGBoost, LightGBM, CatBoost) is the canonical boosting method. In practice, boosting usually outperforms bagging on tabular data but is harder to tune.

Q37. How do you evaluate a classification model?

Sample answer: Confusion matrix first (true positives, false positives, true negatives, false negatives). Derived metrics: accuracy ((TP+TN) / total), precision (TP / (TP+FP), "of what I predicted positive, how many were"), recall (TP / (TP+FN), "of what was positive in reality, how many did I catch"), F1 (harmonic mean of precision and recall), AUC-ROC (area under the curve, threshold-independent). Pick by use case: precision matters when false positives are expensive (spam filter on important email). Recall matters when false negatives are expensive (fraud detection). Accuracy alone is misleading on imbalanced data.
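
The derived metrics in a few lines, with two hypothetical confusion matrices on a 1%-positive dataset to show why accuracy alone misleads. The counts are invented for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# A model that predicts "negative" for everyone on 1,000 rows with 10 positives.
always_negative = classification_metrics(tp=0, fp=0, tn=990, fn=10)
# A model that catches 8 of the 10 positives at the cost of 20 false alarms.
useful = classification_metrics(tp=8, fp=20, tn=970, fn=2)

for name, metrics in [("always negative", always_negative), ("useful", useful)]:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The do-nothing model wins on accuracy and scores zero on recall, which is the imbalanced-data trap in one comparison.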

Q38. How would you handle a class imbalance problem (e.g., 99% no-fraud, 1% fraud)?

Sample answer: Three families. Resampling (oversample the minority class with SMOTE, undersample the majority class, or both). Cost-sensitive learning (assign higher weight to the minority class in the loss function). Choose-the-right-metric (don't optimize accuracy; use recall, F1, or AUC). State that "always resample" is wrong; sometimes the imbalance is informative and the cost-sensitive approach preserves it.

Q39. Explain regularization and when you'd use L1 vs L2.

Sample answer: Regularization adds a penalty term to the loss function to discourage large coefficients. L1 (Lasso) adds the sum of absolute values; tends to drive some coefficients to exactly zero, effectively performing feature selection. L2 (Ridge) adds the sum of squared values; shrinks all coefficients toward zero but rarely to zero. Use L1 when you want a sparse model with clear feature selection. Use L2 when you want a stable model that uses all features. ElasticNet combines both.

Case study interview questions (5 Q)

Case studies are full conversations, not single-answer questions. The interviewer wants to see structured reasoning under live observation. Each case below is the prompt; the answer is the 10-15 minute framework you'd walk through out loud.

Q40. A subscription company wants to reduce churn. Where do you start?

Sample answer (framework): Define churn precisely (paid churn, voluntary vs involuntary, what's the time window). Segment churners (heavy users, light users, by acquisition channel, by tenure). Find the leading indicators (which behaviors predict churn 30 days ahead). Run interventions on the highest-leverage segment (email, in-app, product change). Measure with an A/B test. The interview is graded on whether the framework is structured, not on whether the interventions are right.

Q41. The recommendation system is showing irrelevant items to users. Diagnose.

Sample answer (framework): Confirm with data (CTR drop on recommended items, dwell-time drop, downstream conversion drop). Hypothesize sources (training data drift, model staleness, cold-start users, popularity bias). Diagnostic experiments (compare model output to ground truth for a held-out set, look at the input feature distribution, check whether the issue concentrates in one user segment). Propose fixes (retrain, add diversity constraints, hybrid model with editorial overrides). State that "the model is broken" is not a diagnosis; it's a starting point.

Q42. You're a DS at a fintech. What metrics would you track for a new debit-card product?

Sample answer (framework): Four buckets. Acquisition (applications, approvals, activations). Engagement (active card-holders, transactions per active, payment volume). Risk (chargeback rate, fraud rate, dispute rate). Unit economics (ARPU, revenue per transaction, customer acquisition cost). Pick a North Star (probably active card-holders or transaction volume per active). State guardrails: fraud rate, support ticket volume, app crash rate.

Q43. Sign-ups are flat. What experiments would you propose?

Sample answer (framework): Two layers. Top of funnel (paid ads, organic search, referral). Conversion (landing page, sign-up form, email confirmation). Propose 4-5 experiments across both layers. Prioritize by expected impact times probability of working, divided by cost. State one explicitly cheap, fast experiment (e.g., remove a field from the sign-up form) and one explicitly expensive, slow one (e.g., redesign the landing page). The interviewer is grading whether you can balance breadth (cover the funnel) with depth (specific testable hypotheses).

Q44. Estimate the daily search volume for a new feature.

Sample answer (framework): Bottom-up estimation. Active users × probability the user is in the feature's eligible segment × average sessions per user per day × probability they use the feature in a session. State each multiplier explicitly. Defend each estimate with one sentence of reasoning. State that the answer is order-of-magnitude only; that's the point of the estimation. The interviewer wants structured estimation, not a precise number.

How to prepare for a data scientist interview (7 steps)

A focused 30-day prep plan, scaled for a CS new grad with stats coursework, one Kaggle project, and zero production DS experience. Adjust if your starting point differs. Steps 1-4 each map to one week; steps 5-7 run alongside them.

  1. Week 1: SQL deep dive. 4 hours per day on window functions, multi-CTE queries, cohort/retention/funnel queries. Use a free practice platform with real datasets. End the week able to write a weekly retention query by signup cohort without hesitation.

  2. Week 2: Statistics and probability refresh. Work through 30 problems across the six clusters: probability fundamentals, hypothesis testing, confidence intervals, CLT, regression, distributions. Read one stats chapter per evening. End the week able to explain p-value, confidence interval, and Type I/II error in plain language.

  3. Week 3: A/B testing and experimentation. Read one short overview of online experimentation. Work through 8 canonical experiment-design problems. Drill the 5-minute end-to-end experiment design answer until it's automatic.

  4. Week 4: Product sense and case studies. Read one product analytics resource daily. Work through 6 metric-drop case studies. Run 3 timed mock case-study rounds. The mock discipline closes the gap between knowing-the-material and saying-it-on-camera.

  5. Throughout: build a portfolio. Two GitHub projects with real READMEs. One Kaggle competition or public-dataset analysis. One A/B test simulation. 12-18 hours of work total. The portfolio is the credibility anchor.

  6. Week 3 onward: practice ML fundamentals as needed. If the target role is heavy on ML, add a daily 60-minute slot on the ML round questions (bias-variance, regularization, classification metrics, imbalance handling). If the role is product-DS, deprioritize this step.

  7. Morning of the interview: warm up with a one-page cheat sheet. Top 15 stats and A/B testing concepts written from memory. The act of writing it was the prep. The sheet is the safety net.

Data scientist interview format by role type

Not every "data scientist" role is the same. The breakdown for the four most common DS sub-roles in 2026:

| Role flavor | SQL bar | Stats bar | A/B testing | ML bar | Product sense | LeetCode | Domain extras |
|---|---|---|---|---|---|---|---|
| DS - Product (consumer tech) | High | Medium | Very high | Low-medium | High | Rare or skipped | Funnel analysis, retention, engagement metrics |
| DS - Analytics (enterprise) | Very high | Medium | Medium | Low | Medium-high | Almost never | Dashboards, business intelligence, reporting |
| DS - ML / Applied | Medium | High | Low-medium | High | Medium | One round common | Model deployment basics, feature engineering |
| DS - Research (FAANG, big tech) | Medium | Very high | Medium | Very high | Low | Often one round | Research methodology, paper-implementation |

Two reads from the table:

First, the A/B testing bar is highest at consumer-tech. If you're targeting Meta, DoorDash, Airbnb, Uber, Spotify, or any consumer-tech company, A/B testing is the single most-graded topic. Spend the dedicated week on it. If you're targeting an enterprise analytics role, A/B testing matters less and dashboarding/SQL matters more.

Second, the LeetCode bar at pure-DS roles is low. Below FAANG, algorithm rounds are rare. Don't over-grind LeetCode for a DS pivot. 50-75 problems for pattern recognition is enough; 200+ is wasted prep unless you're targeting FAANG or a research role.

If I'm being honest about where I'd put my 30 days for a CS-new-grad DS pivot targeting consumer-tech: SQL week (cohort queries are graded harder than candidates expect), A/B testing week (this is the round most often blown), case studies (3+ timed mocks is the minimum). Stats refresh in the background. ML round only if the JD calls for it explicitly.

Data Scientist vs Data Analyst vs ML Engineer interviews

The three roles overlap on SQL and Python. Everything else differs meaningfully. The table below maps the surface area so you can calibrate your prep to the role.

| Axis | Data Scientist | Data Analyst | ML Engineer |
|---|---|---|---|
| Primary skill | Stats + A/B testing + product reasoning | SQL + business reasoning + dashboards | Production ML systems + software engineering |
| SQL bar | High (window functions, cohorts) | Very high (analytical SQL, dashboards) | Medium |
| Python bar | Medium (pandas, NumPy, scikit-learn) | Low-medium (pandas, basic scripts) | High (production code, ML libraries) |
| Math / stats bar | High (undergrad stats course) | Medium (interpret stats, not derive) | High (linear algebra, optimization, gradient descent) |
| A/B testing bar | High (signature topic) | Medium (interpret tests, not design) | Low (rarely in MLE loops) |
| ML bar | Medium (entry-level depth) | Low (rarely tested) | High (model architectures, training, serving) |
| Product sense | High (case studies are core) | Medium-high (business questions matter) | Low (focus is on systems, not business) |
| LeetCode bar | Low (one round at FAANG; rare elsewhere) | None | High (similar to SWE bar) |
| Tooling | Jupyter, pandas, SQL, A/B platforms | Tableau/Looker, SQL, sometimes dbt | PyTorch/TensorFlow, MLOps stack, deployment |
| System-design round | Sometimes (ML system design at FAANG DS-Research) | Rare | Yes, ML systems oriented |
| Day-to-day | Analyze experiments, define metrics, build models | Build dashboards, answer business questions | Build training pipelines, deploy models, monitor in prod |
| Entry-level salary (US, 2026) | $95K-$140K | $70K-$105K | $110K-$160K |

The honest pivot recommendation: data scientist if you took stats and ML, enjoyed them, and have strong product thinking; data analyst if you want fast entry at the cost of slower growth and a lower ceiling; ML engineer if you genuinely enjoy coding and infrastructure and want the highest entry-level comp. Most CS grads with stats coursework pivot best into DS because the stats + product reasoning combo is harder to fake than pure SQL.

A note on titles: "data scientist" means very different things at different companies. A "Data Scientist" at Meta does product analytics and A/B testing. A "Data Scientist" at a quant trading firm builds models. A "Data Scientist" at a small startup does everything from SQL to ML to dashboards. Read the JD carefully. The interview format will reflect the actual day-to-day.

Common data scientist interview mistakes for CS new grads

After watching enough new grads run DS loops, the same patterns show up across most failures:

1. Treating it like a SWE interview. Grinding LeetCode for the whole prep window, walking in expecting algorithm rounds, freezing when the interviewer pivots to A/B test design. The DS interview is structured around stats, experimentation, and product reasoning. LeetCode is at most 10% of the loop at entry level outside FAANG. Reset the prep allocation.

2. Memorizing stats formulas without understanding them. The interviewer asks "what is a confidence interval" and the candidate recites the math but can't explain what 95% means in plain language. The fix is to drill the interpretation, not just the formula. Practice explaining each concept to an imaginary product manager in under 60 seconds.

3. Skipping the A/B testing week. A/B testing is the signature DS topic at consumer-tech companies. New grads who skip the dedicated week walk into the experiment-design round and stumble on sample-size calculation, the multiple-comparisons trap, or the peeking question. Don't skip the week.

4. Treating case studies as trivia rounds. The case-study round has no single right answer. The interviewer is grading whether you can structure your thinking out loud. Candidates who jump to a conclusion fail; candidates who walk through a diagnostic framework pass even when the conclusion is wrong. Drill the framework, not the answers.

5. Building a portfolio that signals SWE, not DS. A GitHub portfolio with 15 algorithm projects and zero data analysis reads as "SWE candidate trying DS." Replace 10 of those with one Kaggle notebook and one A/B test simulation repo. The DS-signaling portfolio is short and sharp, not long and miscellaneous.

6. Inflating Kaggle results. Don't claim a top-1% Kaggle finish you don't have. Recruiters cross-check, and being caught lying tanks the round. Honest framing works: "I finished in the top 25% on [Competition X], my notebook walked through feature engineering and a gradient-boosted ensemble, here's what I learned and what I'd do differently."

7. Underestimating SQL. SQL is graded harder than candidates expect. The cohort and retention queries are non-trivial and the entry-level bar is "writes them without hesitation." Don't skip the dedicated week. SQL is the single most-tested skill for the entire interview loop.

One thing I'd add from watching new grads do this: don't try to fix all seven the night before. Pick the two that match your weakest areas (almost always A/B testing and case studies for CS-grad pivoters) and drill those for the final 48 hours. The other five take care of themselves once those two are solid.

Key terms

The vocabulary surface is wider than a SWE interview because the stats and experimentation surface is wider. Get these right and you pass the credibility check. Get them wrong and the interviewer flags you.

A/B test
A randomized controlled experiment where users are randomly assigned to a control or treatment variant, and a primary metric is compared between arms. The gold standard for establishing causation in product data. Most A/B tests in 2026 use frequentist hypothesis testing with a pre-registered sample size and stopping rule.
p-value
The probability of observing data at least as extreme as what you observed, under the assumption that the null hypothesis is true. NOT the probability that the null is true. The interview will check this distinction.
Confidence interval
A range computed from a sample that, if the experiment were repeated many times, would contain the true parameter some specified percentage of the time (usually 95%). A 95% CI is a frequentist object. The Bayesian analog is a credible interval, which has a different interpretation.
Type I vs Type II error
Type I (alpha): rejecting the null when the null is true. False positive. Type II (beta): failing to reject the null when the null is false. False negative. Power = 1 - beta. Reducing one usually increases the other unless you increase the sample size.
Statistical power
The probability of detecting a real effect of a given size, given the sample size and alpha. Standard target is 80% (beta = 0.2). Under-powered experiments fail to detect real effects; over-powered experiments waste sample.
Minimum detectable effect (MDE)
The smallest effect size your experiment is powered to detect. Sample size scales inversely with MDE squared. Halving the MDE requires roughly four times the sample. Setting MDE is a business question, not a statistics question: what's the smallest lift that would change the decision?
CUPED
Controlled-experiment Using Pre-Experiment Data. A variance-reduction technique that uses each user's pre-experiment outcome value as a covariate to remove user-level variance from the analysis. Can reduce required sample size by 30-50% for outcomes with strong pre-experiment correlation.
Multiple-comparisons problem
Running many hypothesis tests inflates the familywise error rate, the chance of at least one false positive. At alpha=0.05 across 20 tests where every null is true, expected false positives ≈ 1, and if the tests are independent the chance of at least one is about 64%. Fixes: Bonferroni correction (divide alpha by number of tests), Holm-Bonferroni (less conservative), Benjamini-Hochberg (controls the false discovery rate instead). Pre-registering primary versus secondary metrics is the production-pragmatic fix.
Central Limit Theorem (CLT)
The sampling distribution of the mean of independent, identically distributed observations approaches a normal distribution as sample size increases, regardless of the population's underlying distribution, provided the variance is finite. It is the reason normal-distribution-based tests work even on non-normal data. Breaks down on heavy-tailed distributions without finite variance.
Bayes' theorem
P(A|B) = P(B|A) * P(A) / P(B). The mechanism for updating beliefs given new evidence. Base rates matter. A 99%-accurate test on a 1%-prevalence condition has a much-lower-than-99% positive predictive value (about 50% if both sensitivity and specificity are 99%), which interviewers love asking about.
Bias-variance tradeoff
Total prediction error decomposes into bias squared (systematic error from too-simple a model), variance (sensitivity to the training sample), and irreducible noise. Reducing bias usually increases variance and vice versa. Regularization, cross-validation, and ensemble methods are the management tools.
Confounding variable
A third variable that affects both the predictor and the outcome, creating an apparent relationship between them that isn't causal. Random assignment in an A/B test breaks confounding because the randomization decouples treatment from any pre-existing variable. Observational analyses always carry confounding risk.
Cohort analysis
Grouping users by a shared attribute (usually signup date or acquisition channel) and tracking their behavior over time within the group. The standard pattern for retention analysis. Reveals product changes that average metrics hide. If overall retention is flat but the most recent cohort's retention dropped, the product has a problem.
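As a rough illustration of the multiple-comparisons entry above, the sketch below computes the familywise error rate and applies the three corrections named there. The p-values are made up, the function names are hypothetical, and the familywise formula assumes independent tests with every null hypothesis true.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent true-null tests."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Step-down Holm-Bonferroni: compare the k-th smallest p to alpha / (m - k)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank starts at 0
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first p-value that fails its threshold
    return reject


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up BH: find the largest rank k with p_(k) <= alpha * k / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Made-up p-values for five metrics in one experiment.
p_values = [0.001, 0.012, 0.02, 0.035, 0.30]

fwer_20 = familywise_error_rate(0.05, 20)
print(f"FWER for 20 tests at alpha=0.05: {fwer_20:.2f}")
print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm(p_values))
print("BH:        ", benjamini_hochberg(p_values))
```

On these five p-values the three methods disagree: Bonferroni rejects one hypothesis, Holm two, and Benjamini-Hochberg four, which is the conservative-to-lenient ordering the entry describes.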

The vocabulary above gets you through the credibility check. Knowing what's behind each term (when each concept applies, what the tradeoffs are, what the common failure mode is) gets you through the technical depth check.
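For a sense of what "behind the term" looks like, here is a minimal sketch of the base-rate question from the Bayes' theorem entry. It assumes "99%-accurate" means 99% sensitivity and 99% specificity, which is one common reading and not the only one; the function name is hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(condition | positive test) via Bayes' theorem."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# Assumption: "99%-accurate" = 99% sensitivity and 99% specificity.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.99, specificity=0.99)
print(f"Positive predictive value: {ppv:.2f}")
```

Under that assumption the true positives (0.99 × 0.01) and the false positives (0.01 × 0.99) are the same size, so a positive result is right only about half the time.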

Related guides

The data scientist interview is one of several interview formats a CS new grad may face in the 2026 cycle. The following guides close adjacent gaps:

  • Data engineer interview questions: the adjacent pivot. Lower stats bar, higher SQL bar, more tool-specific depth. Many DS candidates also explore DE.
  • Python interview questions: pandas and data-manipulation Python are core to the DS loop. The Python deep dive supports the DS prep.
  • System design basics for new grads: the SWE-flavored framework. Useful for the ML system-design round at FAANG DS-Research roles.
  • Technical phone screen tactics: the phone-screen format is similar across SWE and DS. The content differs: more SQL, less LeetCode.
  • Mock interview practice methodology: the four-mode approach (solo / peer / paid / AI) applies as much to DS prep as to SWE prep. A/B testing rounds in particular benefit from AI mocks because the structured-response shape can be drilled without burning peer time.
  • CS new-grad interview loop: the end-to-end onsite map. DS loops are slightly shorter than SWE loops but the structure (recruiter, phone screen, onsite, behavioral) is similar.

Pick the gap, jump to the matching cornerstone, close the gap, then return to DS prep. That is the loop.


The data scientist interview in 2026 is one of the most CS-grad-friendly entry points into tech, provided the stats coursework is real. Lower LeetCode bar than a SWE loop, broader vocabulary surface, more entry-level openings per applicant in the data career space. The pivot is honest if you study the material. 30 days of focused prep on the SQL + stats + A/B testing + case-study stack closes the gap from CS-new-grad-SWE to interview-ready entry-level data scientist.

InterviewChamp.AI runs realistic data scientist mocks that show up on every interview surface: the SQL whiteboard, the A/B test design conversation, the case-study round, the behavioral. One install, every surface. Start a practice session, narrate as you reason through the case, get scored on what the interviewer is grading, and walk into Monday's phone screen ready.


About the author: Alex Chen is the founder of InterviewChamp.AI, building AI interview prep for the new-grad CS market and writing about the modern interview gauntlet from the inside.


Frequently asked questions

What do data scientist interview questions test for in 2026?
Data scientist interviews in 2026 test five stacks: SQL at the window-function and CTE level, Python and pandas at a data-manipulation level (not algorithm-grinding), statistics and probability at an undergraduate-stats-course level, A/B testing design and analysis (the single most-asked topic in 2026 loops at consumer-tech companies), and product sense (given a metric drop, how would you diagnose it). Most loops also include a case-study round where you walk through a business problem end-to-end. The pure ML/algorithm round has shrunk at most companies since the 2023-2024 hiring contraction, but it still appears at FAANG and at companies whose products are heavily ML-driven.
How is a data scientist interview different from a data analyst or ML engineer interview?
Data scientist interviews split the difference: more statistics than a data analyst loop, less production-ML than an ML engineer loop, more product reasoning than either. Data analyst interviews focus on SQL plus business reasoning (given a metric drop, what query would you run, how would you slice it). ML engineer interviews focus on production ML systems (model serving, training pipelines, feature stores, distributed training). Data scientist sits in the middle, with A/B testing and experimentation as the unique signature topic. If a role posts as 'Data Scientist - Product' it leans toward analyst territory; 'Data Scientist - ML' leans toward MLE territory.
Should a CS new grad with stats coursework pivot from SWE to data scientist roles in 2026?
If you took stats and probability courses (and understood them, not just passed), did at least one Kaggle competition or research project, and enjoyed the data-driven decision work over pure algorithm grinding, yes. The DS pipeline is meaningfully friendlier than the saturated SWE pipeline at entry level. Class-of-2025 graduates pivoting into DS reported faster time-to-offer than their SWE-search peers in r/datascience and r/cscareerquestions Q1 2026 megathreads. The pivot is honest if you understand A/B testing, can read a confidence interval, and can talk through one or two real projects. It's not honest if you slap 'data scientist' on a SWE-only resume.
What SQL questions show up in data scientist interviews in 2026?
Window functions (RANK, DENSE_RANK, LAG, LEAD, ROW_NUMBER) appear in every loop. Multi-CTE queries, self-joins, cohort analysis queries (retention curves, funnel analysis), and analytical patterns like running totals, rolling averages, and gap-and-island problems. The DS-specific twist on SQL is that the questions are framed as business problems: 'Find the user retention rate by signup cohort,' not 'Write a query using DENSE_RANK.' You're graded on whether the SQL is correct AND whether the metric you computed answers the business question.
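One way to practice the retention-by-signup-cohort question is sketched below, using Python's built-in sqlite3 module so the SQL runs without a database server. The `signups` and `events` tables, their columns, and the sample rows are all hypothetical, and the date arithmetic uses SQLite's own functions; a warehouse dialect would use its own date-truncation functions instead.

```python
import sqlite3

RETENTION_SQL = """
WITH cohorts AS (
    -- Monday of the signup week (Mon-Sun weeks) is the cohort label.
    SELECT user_id,
           signup_date,
           date(signup_date, 'weekday 0', '-6 days') AS cohort_week
    FROM signups
),
activity AS (
    -- Whole weeks elapsed between signup and each event.
    SELECT c.cohort_week,
           c.user_id,
           CAST((julianday(e.event_date) - julianday(c.signup_date)) / 7
                AS INTEGER) AS week_number
    FROM cohorts c
    JOIN events e ON e.user_id = c.user_id
    WHERE e.event_date >= c.signup_date
),
sizes AS (
    SELECT cohort_week, COUNT(*) AS cohort_size
    FROM cohorts
    GROUP BY cohort_week
)
SELECT a.cohort_week,
       a.week_number,
       COUNT(DISTINCT a.user_id) AS active_users,
       s.cohort_size,
       ROUND(1.0 * COUNT(DISTINCT a.user_id) / s.cohort_size, 3) AS retention
FROM activity a
JOIN sizes s ON s.cohort_week = a.cohort_week
GROUP BY a.cohort_week, a.week_number, s.cohort_size
ORDER BY a.cohort_week, a.week_number;
"""


def build_demo_db():
    """Create an in-memory database with a few hypothetical users and events."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE signups (user_id TEXT, signup_date TEXT);"
        "CREATE TABLE events (user_id TEXT, event_date TEXT);"
    )
    conn.executemany("INSERT INTO signups VALUES (?, ?)", [
        ("u1", "2026-01-05"), ("u2", "2026-01-06"), ("u3", "2026-01-07"),
        ("u4", "2026-01-12"), ("u5", "2026-01-14"),
    ])
    conn.executemany("INSERT INTO events VALUES (?, ?)", [
        ("u1", "2026-01-05"), ("u1", "2026-01-13"),
        ("u2", "2026-01-06"),
        ("u3", "2026-01-08"), ("u3", "2026-01-15"), ("u3", "2026-01-22"),
        ("u4", "2026-01-12"), ("u4", "2026-01-20"),
        ("u5", "2026-01-14"),
    ])
    return conn


def weekly_retention(conn):
    """Return (cohort_week, week_number, active_users, cohort_size, retention) rows."""
    return conn.execute(RETENTION_SQL).fetchall()


conn = build_demo_db()
rows = weekly_retention(conn)
for row in rows:
    print(row)
```

With these sample rows, the first cohort (three users) keeps all three active in week 0, two in week 1, and one in week 2. The business-question check is the denominator: retention divides by the full cohort size, not by the users who happened to be active.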
What statistics questions should I prepare for in a data scientist interview?
Six clusters: probability fundamentals (conditional probability, Bayes' theorem, expected value), hypothesis testing (null and alternative, p-value interpretation, Type I vs Type II error), confidence intervals (what does a 95% CI mean), the central limit theorem (when it applies and when it doesn't), regression fundamentals (linear regression assumptions, why R-squared isn't enough, multicollinearity), and probability distributions (when to reach for normal vs binomial vs Poisson). The 2026 hiring bar for entry-level DS is roughly an undergraduate stats course plus the ability to apply each concept to a product scenario.
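The "what does a 95% CI mean" question can be made concrete with a small simulation. The sketch below repeats a hypothetical experiment many times, builds a normal-approximation interval each time, and counts how often the interval contains the true mean. The population parameters, sample size, and function name are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean, stdev


def ci_coverage(true_mean=10.0, sigma=2.0, n=50, reps=2000, level=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        center = fmean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        if center - half_width <= true_mean <= center + half_width:
            hits += 1
    return hits / reps


coverage = ci_coverage()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95 (slightly under, because the sketch uses a z critical value rather than a t value). That long-run frequency is the plain-language meaning: the procedure captures the true value about 95% of the time, which is different from saying any single interval has a 95% chance of containing it.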
What A/B testing questions do data scientist interviews ask?
A/B testing is the single most-asked topic in 2026 DS loops at consumer-tech companies. Expect: sample-size calculation (given baseline conversion 5%, MDE 1%, alpha 0.05, beta 0.2, how many users per arm?), p-value interpretation, multiple-comparisons problem and the Bonferroni correction, novelty and primacy effects, sequential testing pitfalls (why peeking at the data invalidates your test), variance reduction techniques (CUPED, stratification), and what to do when your test result is statistically significant but practically meaningless. The case-study version: 'Design an experiment to test whether the new checkout flow increases conversion.'
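The sample-size question above can be worked with the standard normal-approximation formula for comparing two proportions. This is a sketch under stated assumptions: the 1% MDE is read as an absolute lift (5% to 6%), the test is two-sided, and the arms are equal in size. The function name is hypothetical, and an interviewer may expect a different variant, such as a pooled-variance formula or a relative MDE.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate users per arm for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline + mde_abs  # assumption: MDE is an absolute lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)  # power = 1 - beta
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance_sum / mde_abs ** 2)


n_full = sample_size_per_arm(baseline=0.05, mde_abs=0.01)
n_half = sample_size_per_arm(baseline=0.05, mde_abs=0.005)
print(f"MDE 1.0 pp: {n_full} users per arm")
print(f"MDE 0.5 pp: {n_half} users per arm ({n_half / n_full:.2f}x)")
```

Under these assumptions the answer is on the order of 8,000 users per arm, and halving the MDE multiplies the requirement by a bit under four. In the interview, stating the absolute-versus-relative assumption out loud matters as much as the number.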
What is product sense and how do you study for it?
Product sense is the ability to look at a metric anomaly, an A/B test result, or a product proposal and reason through it with the same instinct a product manager would use. Study by: reading one product analytics blog per day for two weeks, working through case studies on common product metrics (DAU, retention, conversion, ARPU), drilling the diagnostic framework ('isolate the segment, isolate the time window, isolate the surface'), and practicing the canonical questions out loud. The interviewer grades whether you have a structured diagnostic instinct, not whether you arrive at the 'right' answer. Most case-study rounds don't have a single right answer.
What case study questions show up in data scientist interviews?
Case studies fall into three buckets: metric-drop diagnosis ('DAU dropped 5% this week, diagnose it'), feature-launch design ('design the metrics to evaluate launching this feature'), and experiment design ('design an A/B test for X'). The expected answer shape is structured: state assumptions, propose a diagnostic framework, walk through the framework, name the metric you'd measure, name the threshold for action. Most rounds are 30-45 minutes and expect you to drive the conversation. The interviewer is grading whether you can think structurally about a business problem under live observation.
How do I prepare for a data scientist interview in 30 days as a CS new grad?
Week 1: SQL deep dive, 4 hours per day on window functions, CTEs, cohort queries, and product-flavored questions. Week 2: Statistics and probability refresh, work through 30 problems on hypothesis testing, confidence intervals, Bayes, and the canonical probability questions. Week 3: A/B testing and experimentation. Read one short overview, work through 8 canonical experiment-design problems, drill the sequential-testing and multiple-comparisons traps. Week 4: Product sense and case studies. Read one product analytics resource daily, work through 6 metric-drop cases, run 3 timed mock case-study rounds. Start each week by listing the canonical questions you must be able to answer by Friday; end each week with a timed mock on that week's stack.
What Python and pandas questions appear in data scientist interviews?
Python is tested at a 'data-manipulation' level, not an 'algorithm-grinding' level. Expect: pandas groupby + aggregation, merge and join (left vs inner vs outer, when to use each), pivot and melt (reshape long-to-wide and back), date handling (datetime indexing, time-series resampling), the apply-vs-vectorize tradeoff, NumPy array operations (broadcasting, vectorized math), and the standard data-cleaning patterns (handle missing values, dedupe, type coercion). Most loops include one live coding round where you write pandas against a small dataset. The LeetCode bar for pure-DS roles is typically one easy-to-medium round at FAANG and rare elsewhere.
What's the salary range for entry-level data scientists in 2026?
Entry-level data scientist base salaries in the US run $95K-$140K depending on company tier and location, with mid-market employers concentrated at $100K-$120K. Total comp at large public tech employers can push past $170K with equity and signing bonus. DS pays slightly higher than data engineering and meaningfully higher than data analyst at the entry level. Consumer-tech companies (especially payments and e-commerce) tend to pay more for DS than traditional enterprise. Fintech DS roles often pay the highest at the entry level, partly because the work skews more quant and partly because fintech competes with quant trading firms for the same candidates.
What's the difference between supervised and unsupervised learning?
Supervised learning trains on labeled examples (input + correct output) and predicts the output for new inputs. Examples: classification (spam vs not spam), regression (predict house price). Unsupervised learning works on unlabeled data and finds structure in it. Examples: clustering (group similar users), dimensionality reduction (PCA), anomaly detection. The interview-relevant nuance is that the boundary blurs in practice. Semi-supervised learning uses a small labeled set plus a large unlabeled set, and self-supervised learning generates labels from the data itself (the foundation of how modern large language models are pretrained on raw text).
What's the bias-variance tradeoff and why does it matter?
A model's expected squared prediction error decomposes into squared bias (the model's systematic distance from the true relationship), variance (how much the model's predictions change if you train on a different sample), and irreducible noise. A high-bias model underfits: it is too simple to capture the relationship. A high-variance model overfits: it fits the training data including its noise, so it generalizes poorly. The tradeoff: reducing bias usually increases variance and vice versa. The fix is regularization, cross-validation, more training data, or ensemble methods. Knowing this framework at the entry level is roughly the difference between 'I studied ML' and 'I studied ML and remember the foundations.'
How do you handle missing data in a data scientist interview?
Three strategies, picked by context. (1) Drop rows or columns with missing values: fine if missingness is small and random, dangerous if missingness is correlated with the outcome. (2) Impute with a summary statistic (mean, median, mode): fast and OK for low-stakes models, but it throws away information. (3) Impute with a model (KNN, regression, MICE): slower, but it preserves more signal. The interview-relevant follow-up is whether you ask the right questions first: 'How much is missing? Is the missingness random or correlated with the outcome? What's the cost of a wrong imputation versus dropping the row?' Imputation choice is a tradeoff, not a default.
What's the most common data scientist interview mistake new grads make?
Treating it like a SWE interview. New grads grind LeetCode, walk in expecting algorithm rounds, and freeze when the interviewer pivots to A/B test design or product sense. The DS interview rewards structured business reasoning under live observation. LeetCode is at most 10% of the loop at entry level outside FAANG. Spend the prep time on stats, A/B testing, and case studies instead. The second-most-common mistake: memorizing statistical formulas without understanding them. The interviewer asks 'what's a confidence interval?' and the candidate recites the math but can't explain what 95% means in a sentence a stakeholder would understand.