45 Statistics & A/B Testing Interview Questions (Updated for 2022)


In data science interviews, easy A/B testing interview questions ask for definitions, gauge your understanding of experiment design, and determine how you approach setting up an A/B test. Some of the most common basic A/B testing interview questions include:

1. What types of questions would you ask when designing an A/B test?

This type of question assesses your foundational knowledge of A/B test design. You don’t want to begin without first understanding the problem the test aims to solve. Some questions you might ask include:

  • What does the sample population look like?
  • Have we taken steps to ensure our control and test groups are truly randomized?
  • Is this a multivariate A/B test? If so, how does that affect the significance of our results?

The most important consideration is that running multiple t-tests exponentially increases the probability of false positives (also called Type I errors).

“Exponentially” here is not a placeholder for “a lot.” If each individual test has false-positive probability ψ, the probability of never getting a false positive across n independent tests is (1 − ψ)^n, which decays exponentially and tends to zero as n → ∞. Equivalently, the probability of at least one false positive, 1 − (1 − ψ)^n, climbs toward one as you add tests.

There are two main approaches to consider in this situation:

  1. Use a correction method such as the Bonferroni correction
  2. Use an F-test instead
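
As a rough, hedged illustration of both approaches (the conversion rates and group sizes below are simulated for the sketch, not taken from the article), here is how they might look in Python with scipy and statsmodels:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
# Simulated 0/1 conversion outcomes for a control and two variants (made-up rates).
groups = [rng.binomial(1, p, size=5000) for p in (0.10, 0.10, 0.12)]

# Approach 2: a single F-test (one-way ANOVA) across all groups at once.
f_stat, f_p = stats.f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.2f}, p = {f_p:.4f}")

# Approach 1: pairwise t-tests against the control, then a Bonferroni correction.
raw_p = [stats.ttest_ind(groups[0], g).pvalue for g in groups[1:]]
reject, corrected_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
print("Corrected p-values:", corrected_p, "reject null:", reject)
```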

3. How would you approach designing an A/B test? What factors would you consider?

In general, you should start with understanding what you want to measure. From there, you can begin to design and implement a test. There are four key factors to consider:

  • Setting metrics: A good metric is simple, directly related to the goal at hand, and quantifiable. Every experiment should have one key metric that determines whether the experiment was a success or not.
  • Constructing thresholds: Determine by what degree your key metric must change in order for the experiment to be considered successful.
  • Sample size and experiment length: How large a group are you going to test on, and for how long? (A rough power-calculation sketch follows this list.)
  • Randomization and assignment: Who gets which version of the test and when? You need at least one control group and one variant group. As the number of variants increases, the number of groups needed increases too.
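
For the sample size factor above, a power calculation is the usual starting point. A minimal sketch (the baseline rate and minimum detectable effect are assumed figures, not values from the article):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10   # current conversion rate (assumed)
mde = 0.02        # minimum detectable lift, i.e., the threshold from above (assumed)
effect = proportion_effectsize(baseline + mde, baseline)

# Solve for the sample size per group at alpha = 0.05 and power = 0.8.
n_per_group = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"Roughly {int(round(n_per_group))} users per group")
```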

It’s helpful to first explain the difference between a user-tied test and a user-untied test. In a user-tied test, the experiment buckets users into groups at the user level, so each user is assigned to exactly one variant. In a user-untied test, by contrast, traffic is not bucketed at the user level.

For example, in a user-untied test on a search engine, traffic is split at the search level instead of the user level given that a search engine generally does not need you to sign in to use the product. However, the search engine still needs to A/B test different algorithms to measure which ones are better.

One potential con of a user-untied test is that bias can become a problem: because users aren’t bucketed at the user level, the same user can potentially see both treatments. What other pros and cons can you think of?
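
As a hypothetical illustration of the two assignment schemes (the hashing approach, bucket count, and key names are assumptions for the sketch, not details from the article), assignment is often done by hashing a stable key: a user ID for a user-tied test, or a search/request ID for a user-untied one:

```python
import hashlib

def bucket(key: str, experiment: str, n_buckets: int = 2) -> int:
    """Deterministically map a key to a bucket for a given experiment (illustrative scheme)."""
    digest = hashlib.md5(f"{experiment}:{key}".encode()).hexdigest()
    return int(digest, 16) % n_buckets

# User-tied: bucket by user ID, so the same user always sees the same variant.
variant_for_user = bucket(key="user_42", experiment="new_ranking")

# User-untied (e.g., a search engine): bucket by search ID instead, so one user
# may see different variants across different searches.
variant_for_search = bucket(key="search_98765", experiment="new_ranking")
print(variant_for_user, variant_for_search)
```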

5. What p-value should you target in an A/B test?

Typically, the significance level of an experiment is 0.05 and the power is 0.8, but these values may shift depending on how large a change must be detected before the design change is worth implementing. The size of change that matters can depend on external factors, such as the time needed to implement the change once the decision has been made.

A p-value below 0.05 does not prove that your hypothesis is correct. It means that, if there were truly no difference (the null hypothesis), results at least as extreme as yours would occur less than 5% of the time; it is evidence against the null, not a guarantee that the observed effect is real.

Hint: Is the interviewer leaving out important details? Are there additional assumptions you can make about how the A/B test was set up and measured that could reveal the test is invalid?

Looking at the actual measurement of the p-value: you already know that the industry-standard significance level is 0.05, which means that if there is truly no difference between the populations, you will wrongly conclude there is one about 1 time in 20. However, you have to note a couple of considerations about how the test was measured:

  • The sample size of the test
  • How long it took before the product manager measured the p-value
  • How the product manager measured the p-value and whether they did so by continually monitoring the test

What would you do next to assess the test’s validity?
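
One way to probe the “continually monitoring” concern is a quick simulation: run many A/A experiments (where there is no true difference) and declare a win the first time an interim p-value dips below 0.05. The rates, check counts, and sample sizes below are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_checks, n_per_check = 2000, 10, 200   # assumed simulation settings

false_positives = 0
for _ in range(n_sims):
    # A/A test: both groups drawn from the same conversion rate, so any "win" is spurious.
    a = rng.binomial(1, 0.10, size=n_checks * n_per_check)
    b = rng.binomial(1, 0.10, size=n_checks * n_per_check)
    for k in range(1, n_checks + 1):
        n = k * n_per_check
        _, p = stats.ttest_ind(a[:n], b[:n])
        if p < 0.05:          # "peeking": stop the first time the test looks significant
            false_positives += 1
            break

print(f"False-positive rate with peeking: {false_positives / n_sims:.1%} (vs. the nominal 5%)")
```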

7. What are some common reasons A/B tests fail?

There are numerous scenarios in which bucket testing won’t reach statistical significance or will produce unclear results. Here are some common reasons A/B tests fail:

  • Not enough data: A statistically significant sample size is key for an effective A/B test. If a landing page isn’t receiving enough traffic, you likely won’t have a large enough sample size for an effective test.
  • Your metrics aren’t clearly defined: An A/B test is only as effective as its metrics. If you haven’t clearly defined what you’re measuring or your hypothesis can’t be quantified, your A/B test results will be unclear.
  • Testing too many variables: Trying to test too many variables in a single test can lead to unclear results.

When you’re testing more than one variant, the probability that at least one variant reaches significance purely by chance is high. You can see this by working with the complement: the probability of at least one significant result is one minus the probability of no significant results.

For a single test at the 0.05 significance level: P(at least one significant result) = 1 − P(no significant results) = 1 − (1 − 0.05) = 0.05

So there is a 5% probability of getting a significant result by chance alone, which matches how significance is defined. Now, what happens when you test 20 variants and one of them comes back significant? What’s the likelihood that it occurred by chance?

P(at least one significant result) = 1 − (1 − 0.05)^20 ≈ 0.64. There is now a 64% chance of at least one spurious significant result, otherwise known as a false positive.
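
A quick check of that arithmetic for a few family sizes (this simply evaluates the same formula used above):

```python
alpha = 0.05
for n_tests in (1, 5, 20):
    fwer = 1 - (1 - alpha) ** n_tests   # probability of at least one false positive
    print(f"{n_tests} test(s): P(at least one false positive) = {fwer:.2f}")
# 1 test(s): 0.05, 5 test(s): 0.23, 20 test(s): 0.64
```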

9. How long should an A/B test run?

Experiment length is a function of sample size since you’ll need enough time to run the experiment on X users per day until you reach your total sample size. However, time introduces variance into an A/B test; there may be factors present one week that aren’t present in another, like holidays or weekdays vs. weekends.

The rule of thumb is to run the experiment for about two weeks, provided you can reach your required sample size in that time. Most split tests run for 2-8 weeks. Ultimately, the length of the test depends on many factors, such as traffic volume and the variables that are being tested.
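
As a back-of-the-envelope sketch (the required sample size and daily traffic below are assumed numbers), experiment length is roughly the total sample you need divided by eligible daily traffic, bounded below by the two-week rule of thumb:

```python
import math

n_per_group = 1900           # assumed; e.g., from a power calculation like the earlier sketch
daily_eligible_users = 250   # eligible traffic per day (assumed)
num_groups = 2               # control + one variant

days_for_sample = math.ceil(num_groups * n_per_group / daily_eligible_users)
days_to_run = max(days_for_sample, 14)  # at least ~two weeks to cover weekly seasonality
print(f"Run for roughly {days_to_run} days.")
```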

10. What are some alternatives to A/B testing? When is an alternative the better choice?

If you’re looking for an alternative to A/B testing, there are two common tests that are used to make UI design decisions:

  • A/B/N tests: This type of test compares several different versions at once (the N stands for “number,” i.e., the number of variations being tested) and is best for testing major UI design choices.
  • Multivariate tests: This type of test compares multiple variables at once, i.e., all the possible combinations of the changes being considered. Multivariate testing saves time, as you won’t have to run numerous A/B tests, and is best when weighing several UI design changes together.
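
For an A/B/N test on a rate metric, one common analysis (an assumption for this sketch, not something the article prescribes) is a chi-square test of independence across all variants at once, before any pairwise follow-ups:

```python
import numpy as np
from scipy import stats

# Rows = variants A, B, C; columns = converted vs. did not convert (made-up counts).
table = np.array([
    [500, 9500],
    [540, 9460],
    [610, 9390],
])

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```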

It is important to ensure the users assigned to each group are a representative mix, with a variety of attributes, so that the results of the A/B test are valid; insufficient randomization may introduce confounding variables further down the line.

It also matters when A/B tests are given to users. For instance, is every new user given an A/B test? How will that affect assessment of existing users? Conversely, if A/B tests are assigned to all users, and some of those users signed up for the website this week, and others have been around for much longer, is the ratio of new users to existing users representative of the larger population of the site?

Finally, it is also important to ensure that the control and variant groups are of equal size so that they can be easily (and accurately) compared at the end of the test.
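
One practical sanity check on group sizes is a sample ratio mismatch (SRM) test: compare the observed counts against the intended split with a chi-square goodness-of-fit test. The counts and the 50/50 split below are made up for illustration:

```python
from scipy import stats

observed = [50050, 49950]      # users actually assigned to control / variant (made-up)
expected_ratio = [0.5, 0.5]    # the intended 50/50 split
total = sum(observed)
expected = [r * total for r in expected_ratio]

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
if p < 0.001:
    print("Possible sample ratio mismatch: investigate the assignment logic.")
else:
    print(f"Observed split is consistent with the intended ratio (p = {p:.2f}).")
```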

12. What metrics might you consider in an A/B test?

In general, there are many different metrics you might consider in an A/B test. But some of the most common are:

  • Impression count
  • Conversion rate
  • Click-through rate (CTR)
  • Button hover time
  • Time spent on page
  • Bounce rate

The metric you should use depends on your hypothesis and what you’re testing. If you’re testing a button variation, button hover time or CTR are probably the best choices. But if you’re testing messaging choices on a long-form landing page, time spent on page and bounce rate would likely be the best metrics to consider.
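
For rate metrics such as conversion rate or CTR, the end-of-test comparison is often a two-proportion z-test; a minimal sketch with made-up counts:

```python
from statsmodels.stats.proportion import proportions_ztest

conversions = [520, 580]        # conversions in control and variant (made-up counts)
exposures = [10000, 10000]      # users exposed to each version (made-up counts)

z_stat, p_value = proportions_ztest(count=conversions, nobs=exposures)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```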

13. What are some of the common types of choices you can test with A/B testing?

In general, A/B testing works best at informing UI design changes, as well as with promotional and messaging choices. You might consider an A/B test for:

  • UI design decisions
  • Testing promotions, coupons, or incentives
  • Testing messaging variations (e.g., different headlines or calls-to-action)
  • Funnel optimizations
