The Complete Guide to A/B Testing in Python
Have you ever advertised on Facebook without knowing how to improve your ads, or which images and videos bring in more clients and sell more products?
This article covers the process of analyzing an A/B test, from the basic concepts to formulating the hypotheses and interpreting the results, in order to determine which ad asset drives more sales.
A/B testing is widely used in the e-commerce industry. It is the process of comparing two versions of a variable (for example, two web pages or two digital assets) to determine which one performs better, has more impact, and drives business metrics.
Today we will use a dataset from Kaggle to formulate the hypotheses and show you how to find the ad asset that performs better.
Data Exploration
Overall, the dataset has no missing values, but some users visited the same design multiple times.
# import libraries
import pandas as pd
import numpy as np
import random
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
random.seed(42)

# Data Exploration
df = pd.read_csv('data.csv')
print(df.shape)
df.head()
✓ control group: users who visit the old design page
✓ treatment group: users who visit the new design page
▹converted = 0: users did not buy the product
▹converted = 1: users bought the product
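As a quick sanity check on these columns, missing values and the group split can be inspected directly. The snippet below is a minimal sketch on a small synthetic frame with the same column names, since data.csv itself is not bundled here; the values are invented for illustration.

```python
import pandas as pd

# Small synthetic stand-in for data.csv with the same columns
# (user_id, group, landing_page, converted); values are illustrative only.
df = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5],
    'group': ['control', 'treatment', 'control', 'treatment', 'control'],
    'landing_page': ['old_page', 'new_page', 'old_page', 'new_page', 'old_page'],
    'converted': [0, 1, 0, 0, 1],
})

# Count missing values per column
missing = df.isnull().sum()
print(missing)

# How users are split between the two groups
group_sizes = df['group'].value_counts()
print(group_sizes)
```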
session_counts = df['user_id'].value_counts(ascending=False)
multi_sessions = session_counts[session_counts > 1].count()
print(f'The number of users that appear multiple times in the dataset: {multi_sessions}')
We drop the records of users who visited the same design repeatedly.
users_to_drop = session_counts[session_counts > 1].index
df = df[~df['user_id'].isin(users_to_drop)]
print(f'The updated dataset now has {df.shape[0]} entries')

# now each user has seen exactly one page
pd.crosstab(df['group'], df['landing_page'])
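With each user now seeing exactly one page, the conversion rate per group follows from a simple groupby: the mean of a 0/1 `converted` column is the conversion rate. A sketch on a toy frame (column names as in the article, data invented):

```python
import pandas as pd

# Toy data mirroring the article's group/converted columns (values invented)
df = pd.DataFrame({
    'group': ['control'] * 4 + ['treatment'] * 4,
    'converted': [0, 1, 0, 0, 1, 1, 0, 1],
})

# Mean of a 0/1 column is the conversion rate; count gives group size
conversion_rates = df.groupby('group')['converted'].agg(['mean', 'count'])
conversion_rates.columns = ['conversion_rate', 'n_users']
print(conversion_rates)
```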
The Hypotheses:
The hypothesis is that the new design performs better than the old design and leads to a higher conversion rate.
Null hypothesis Hₒ : p = pₒ (the two designs have the same impact)
Alternative hypothesis Hₐ : p ≠ pₒ (the two designs have different impacts),
where p and pₒ are the conversion rates of the new and old designs respectively. We use a 95% confidence level, i.e. a significance level α = 0.05.
If the null hypothesis is true, there is no significant difference between the treatment and control groups. We reject the null hypothesis when the p-value falls below the significance level (here, 0.05).
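To make the decision rule concrete, here is a minimal two-proportion z-test written with only the standard library (statsmodels' `proportions_ztest` performs the same calculation); the conversion counts below are invented for illustration, not taken from the Kaggle dataset.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided pooled z-test for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)               # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))   # two-sided
    return z, p_value

# Invented counts: 120/1000 conversions (control) vs 158/1000 (treatment)
z, p_value = two_proportion_ztest(120, 1000, 158, 1000)
alpha = 0.05
print(f'z = {z:.3f}, p-value = {p_value:.4f}')
print('Reject H0' if p_value < alpha else 'Fail to reject H0')
```

With these made-up counts the p-value lands below 0.05, so the null hypothesis of equal conversion rates would be rejected.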