The Math Behind A/B Testing with Example Python Code

Animation by Gal Shir

Outline for A/B Tests

1. Set Up The Experiment

Baseline Conversion Rate and Lift

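# code examples presented in Python
import numpy as np
import scipy.stats as scs
import matplotlib.pyplot as plt
bcr = 0.10  # baseline conversion rate
d_hat = 0.02  # expected lift: difference in conversion rate between the groups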

Control Group (A) and Test Group (B)

# A is control; B is test
N_A = 1000
N_B = 1000

2. Run the Test

ab_data = generate_data(N_A, N_B, bcr, d_hat)
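The helper `generate_data` is called above but its body is not shown. A minimal sketch of what such a helper might look like follows, assuming it returns a pandas DataFrame with a `group` column ('A' or 'B') and a 0/1 `converted` column. The fixed group sizes and the `seed` parameter are assumptions made for illustration, not necessarily how the original helper works.

```python
import numpy as np
import pandas as pd

def generate_data(N_A, N_B, p_A, d_hat, seed=None):
    """Hypothetical stand-in: simulate one A/B test as Bernoulli trials."""
    rng = np.random.default_rng(seed)
    p_B = p_A + d_hat  # the test group converts at the baseline rate plus the lift
    group = np.array(['A'] * N_A + ['B'] * N_B)
    converted = np.concatenate([
        rng.binomial(1, p_A, size=N_A),  # control group outcomes (0 or 1)
        rng.binomial(1, p_B, size=N_B),  # test group outcomes (0 or 1)
    ])
    return pd.DataFrame({'group': group, 'converted': converted})

ab_data = generate_data(1000, 1000, 0.10, 0.02, seed=42)
print(ab_data.groupby('group')['converted'].mean())
```

Because the outcomes are random, the observed rates will scatter around 0.10 and 0.12 rather than match them exactly, which is the point of the comparisons that follow.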

ab_summary = ab_data.pivot_table(values='converted', index='group', aggfunc='sum')

# add the group sizes and conversion rates to the pivot table
ab_summary['total'] = ab_data.pivot_table(values='converted', index='group', aggfunc='count')
ab_summary['rate'] = ab_data.pivot_table(values='converted', index='group', aggfunc='mean')
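The later snippets use names such as `A_converted`, `A_total`, `p_A` and `p_B` without showing where they come from. One plausible way to derive them from a summary table is sketched below, assuming the groups are labelled 'A' and 'B'. The tiny hand-made data set is purely illustrative, and the `groupby` call is an alternative to the pivot table above, not the article's own code.

```python
import pandas as pd

# Tiny hand-made data set (illustrative only, not the experiment's data)
ab_data = pd.DataFrame({
    'group':     ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'converted': [0, 1, 0, 0, 1, 1, 0, 0],
})

# One row per group: number converted, group size, conversion rate
ab_summary = ab_data.groupby('group')['converted'].agg(
    converted='sum', total='count', rate='mean')

# Pull out the per-group values used in the comparison plots
A_converted = ab_summary.loc['A', 'converted']
A_total = ab_summary.loc['A', 'total']
p_A = ab_summary.loc['A', 'rate']
B_converted = ab_summary.loc['B', 'converted']
B_total = ab_summary.loc['B', 'total']
p_B = ab_summary.loc['B', 'rate']
print(ab_summary)
```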

3. Compare the Two Groups

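# binomial distribution of conversions expected for the control group
fig, ax = plt.subplots(figsize=(12,6))
x = np.linspace(A_converted-49, A_converted+50, 100)
y = scs.binom(A_total, p_A).pmf(x)
ax.bar(x, y, alpha=0.5)
# dashed line: conversions the control group would need to match the test group's rate
ax.axvline(x=p_B * A_total, c='blue', alpha=0.75, linestyle='--')
plt.xlabel('converted')
plt.ylabel('probability')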

Binomial Distribution

fig, ax = plt.subplots(figsize=(12,6))
# control group (A) in red
xA = np.linspace(A_converted-49, A_converted+50, 100)
yA = scs.binom(A_total, p_A).pmf(xA)
ax.bar(xA, yA, alpha=0.5, color='red')
# test group (B) in blue
xB = np.linspace(B_converted-49, B_converted+50, 100)
yB = scs.binom(B_total, p_B).pmf(xB)
ax.bar(xB, yB, alpha=0.5, color='blue')
plt.xlabel('converted')
plt.ylabel('probability')

Binomial distributions for the control (red) and test (blue) groups

Bernoulli Distribution and the Central Limit Theorem

# standard error of the mean for both groups
SE_A = np.sqrt(p_A * (1-p_A)) / np.sqrt(A_total)
SE_B = np.sqrt(p_B * (1-p_B)) / np.sqrt(B_total)

# plot the sampling distributions of the conversion rate for both groups
fig, ax = plt.subplots(figsize=(12,6))

x = np.linspace(0, .2, 1000)

# both curves are evaluated on the shared grid x, so they must be plotted against x
yA = scs.norm(p_A, SE_A).pdf(x)
ax.plot(x, yA, c='red')
ax.axvline(x=p_A, c='red', alpha=0.5, linestyle='--')

yB = scs.norm(p_B, SE_B).pdf(x)
ax.plot(x, yB, c='blue')
ax.axvline(x=p_B, c='blue', alpha=0.5, linestyle='--')

plt.xlabel('Converted Proportion')
plt.ylabel('PDF')

Control (red) and test (blue) groups as normal distributions for the proportion of successes

Variance of the Sum
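The per-group standard errors computed earlier can be combined into a standard error for the difference between the groups. The derivation below is a sketch that assumes the two groups are independent. The symbols $\hat{d}$, $N_A$ and $N_B$ follow the variable names used in the code, and the pooled variant at the end is a common alternative rather than something this section states explicitly.

```latex
% Variance of a sum (or difference) of independent random variables
\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)

% Applied to the difference in conversion rates, \hat{d} = \hat{p}_B - \hat{p}_A
SE_{\hat{d}} = \sqrt{SE_A^2 + SE_B^2}
             = \sqrt{\frac{p_A (1 - p_A)}{N_A} + \frac{p_B (1 - p_B)}{N_B}}

% Pooled variant, using \hat{p} = (\text{conversions in A} + \text{conversions in B}) / (N_A + N_B)
SE_{\text{pooled}} = \sqrt{\hat{p}\,(1 - \hat{p}) \left( \frac{1}{N_A} + \frac{1}{N_B} \right)}
```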

Compare the Null Hypothesis vs. the Alternative Hypothesis

The null hypothesis

The alternative hypothesis
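Stated in terms of the difference d = p_B - p_A, the null hypothesis is d = 0 and the alternative is d != 0. A minimal sketch of the corresponding two-tailed z-test follows. The input rates are illustrative values chosen to match the planned baseline and lift, not observed results, and the unpooled standard error is one of several reasonable choices.

```python
import numpy as np
import scipy.stats as scs

# Illustrative inputs (assumed, not the experiment's observed values)
N_A, N_B = 1000, 1000
p_A, p_B = 0.10, 0.12

# Observed difference and its standard error (groups assumed independent)
d_hat = p_B - p_A
SE_d = np.sqrt(p_A * (1 - p_A) / N_A + p_B * (1 - p_B) / N_B)

# z-score of the difference under the null hypothesis d = 0
z = d_hat / SE_d

# Two-tailed p-value: probability of a difference at least this extreme under the null
p_value = 2 * scs.norm.sf(abs(z))

alpha = 0.05
reject_null = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

With 1,000 users per group, a two-point lift on a 10% baseline does not reach significance at the 0.05 level, which motivates the discussion of power and sample size.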

# define the parameters for abplot()
# use the actual values from the experiment for bcr and d_hat
# p_A is the conversion rate of the control group
# p_B is the conversion rate of the test group

bcr = p_A
d_hat = p_B - p_A
# pass both group sizes, matching the other abplot() calls in this article
abplot(N_A, N_B, bcr, d_hat)

Null hypothesis (red) vs. alternative hypothesis (blue)

4. Statistical Power and Significance Level

abplot(N_A, N_B, bcr, d_hat, show_power=True)

Statistical power shown in green

abplot(N_A, N_B, bcr, d_hat, show_beta=True)

Beta shown in green

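The green shaded area has an area equal to 0.025, which represents alpha / 2: for a two-tailed test at a significance level of 0.05, alpha is split evenly between the two tails.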

Significance Level (alpha) and Confidence Level
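The quantities in these plots can also be computed numerically. The sketch below assumes a two-tailed test, a normal approximation, and a pooled standard error shared by the null and alternative distributions. Those modelling choices are assumptions for illustration and may differ from what `abplot` does internally.

```python
import numpy as np
import scipy.stats as scs

N_A, N_B = 1000, 1000
bcr, d_hat = 0.10, 0.02
alpha = 0.05  # significance level; confidence level is 1 - alpha = 0.95

# Critical z-score for a two-tailed test (about 1.96 for alpha = 0.05)
z_crit = scs.norm.ppf(1 - alpha / 2)

# Pooled standard error of the difference between the groups
pooled = (bcr + bcr + d_hat) / 2
SE = np.sqrt(pooled * (1 - pooled) * (1 / N_A + 1 / N_B))

null = scs.norm(0, SE)      # distribution of the difference if there is no effect
alt = scs.norm(d_hat, SE)   # distribution of the difference if the true lift is d_hat

# Upper rejection boundary: the area of the null beyond it is alpha / 2
upper = null.ppf(1 - alpha / 2)

power = alt.sf(upper)   # area of the alternative beyond the boundary
beta = alt.cdf(upper)   # area of the alternative short of the boundary (Type II error)
print(f"z_crit = {z_crit:.3f}, power = {power:.3f}, beta = {beta:.3f}")
```

The lower rejection tail is ignored in the power calculation because its contribution is negligible when the lift is positive.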

5. Sample Size

abplot(2000, 2000, bcr, d_hat, show_power=True)

Equation for minimum sample size
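The equation itself is not reproduced in the text. A commonly used form that is consistent with the z-scores and the sample size quoted in this section is sketched below. The symbols $\bar{p}$ and $\delta$ are notation introduced here for the pooled rate and the minimum detectable effect.

```latex
n = \frac{2\,\bar{p}\,(1 - \bar{p})\,\left( Z_{\beta} + Z_{\alpha/2} \right)^2}{\delta^2},
\qquad
\bar{p} = \frac{p_A + p_B}{2},
\qquad
\delta = p_B - p_A

% n is the minimum sample size per group.
% Z_{\alpha/2} = 1.96 for a significance level of 0.05 (two-tailed),
% Z_{\beta} = 0.842 for a power level of 0.80.
```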

Plot for typical significance level of 0.05 or confidence level of 0.95 (z = 1.96)

Typical z-score for power level of 0.80 (z = 0.842)

min_sample_size(bcr=0.10, mde=0.02)
Out: 3842.026
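The body of `min_sample_size` is not shown. A minimal sketch that comes out close to the value above is given below, assuming a two-tailed test, equal group sizes, and the pooled-rate formula for the variance. The default values for `power` and `sig_level` are assumptions taken from the typical levels quoted in this section.

```python
import math
import scipy.stats as scs

def min_sample_size(bcr, mde, power=0.8, sig_level=0.05):
    """Approximate minimum sample size per group for a two-tailed test.

    bcr: baseline conversion rate; mde: minimum detectable effect (absolute lift).
    """
    Z_beta = scs.norm.ppf(power)               # about 0.842 for power = 0.80
    Z_alpha = scs.norm.ppf(1 - sig_level / 2)  # about 1.96 for sig_level = 0.05
    pooled_prob = (bcr + bcr + mde) / 2        # average of the two expected rates
    return 2 * pooled_prob * (1 - pooled_prob) * (Z_beta + Z_alpha) ** 2 / mde ** 2

n = min_sample_size(bcr=0.10, mde=0.02)
n_per_group = math.ceil(n)  # round up: a fraction of a user cannot be sampled
print(round(n, 3), n_per_group)
```

Rounding up gives 3,843 users per group, which is the size passed to `abplot` in the final call.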

abplot(3843, 3843, 0.10, 0.02, show_power=True)
