A Simple AB Testing Data Analysis Exercise
In this post, I will present a brief report of an analysis that was made on a data set for A/B testing. The data cleaning and visualisation for this exercise was completed using Python and the statistical analysis was conducted using R (link to the source code is provided at the bottom of the post).
The purpose of this post is to walk the reader through the project or exercise. This will involve a simple summary of the project, an exposition of the AB test data that was used, the hypothesis that was investigated, the methods used to clean and visualise the data and finally the logistic regression (logit) model used to make the statistical inference to see if the null hypothesis could be rejected.
Executive Summary
Two web pages (simply called ‘new_page’ and ‘old_page’) were distributed to users to investigate whether the landing page that users visited affected their conversion rate. Users of the old_page were assigned to the control group, while its variant, the new_page, was shown to users in the treatment group. The number of users that converted after visiting each page was recorded. A statistical model was used to compare the difference in conversion rate between the users of the two pages. The results showed no evidence that the new_page yielded a higher conversion rate than the old_page.
The Data
The initial uncleaned data that was used can be downloaded here. This data was obtained for exercise purposes from Kaggle. The data was structured as shown below. Let's have a look at the first few rows of the data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("AB_data/ab_data.csv")
print(df.head())

user_id,timestamp,group,landing_page,converted
851104,11:48.6,control,old_page,0
804228,01:45.2,control,old_page,0
661590,55:06.2,treatment,new_page,0
853541,28:03.1,treatment,new_page,0
864975,52:26.2,control,old_page,1
936923,20:49.1,control,old_page,0
679687,26:46.9,treatment,new_page,1
Each row represents a user, the timestamp of when they accessed the web-page, their test group (control vs treatment), the page that they landed on (‘new_page’ vs ‘old_page’) and whether or not they were converted (encoded as a binary value of 0 for not converted and 1 for converted).
Hypothesis
This project investigated if the landing page that a user visits has any effect on their conversion rate. More specifically, we wanted to see if the null hypothesis that there is no difference in effectiveness of the new_page in converting users compared to the old_page could be rejected.
Method
The data cleaning and visualisation code was written in Python.
Data Cleaning
There were 294,478 rows of data:
print(np.shape(df))

(294478, 5)
However, this was not the number of users that accessed the web-pages. There was an excess of around 4,000 rows, coming from users who visited the web-pages more than once (possibly visiting both web-pages).
print(df["user_id"].nunique())

290584
Having some users access both pages would make the results of the analysis invalid. Therefore, the users in the control group were paired only with the ‘old’ landing page and those in the treatment group were paired only with the ‘new’ landing page.
df = df[((df['group'] == 'control') & (df['landing_page'] == 'old_page')) |
        ((df['group'] == 'treatment') & (df['landing_page'] == 'new_page'))]
print(np.shape(df))

(290585, 5)
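A quick cross-tabulation of group against landing_page is one way to confirm the mismatched pairings before and after such a filter. A minimal, self-contained sketch of this check, using a small made-up frame in place of the real data (the column names match the data set above, but the values are hypothetical):

```python
import pandas as pd

# Small made-up frame with the same columns as the A/B data (hypothetical values)
df = pd.DataFrame({
    'group':        ['control', 'control', 'treatment', 'treatment', 'control'],
    'landing_page': ['old_page', 'new_page', 'new_page', 'old_page', 'old_page'],
})

# Mismatched rows (control/new_page or treatment/old_page) show up off the diagonal
print(pd.crosstab(df['group'], df['landing_page']))

# Keep only the consistent pairings, as in the cleaning step above
mask = ((df['group'] == 'control') & (df['landing_page'] == 'old_page')) | \
       ((df['group'] == 'treatment') & (df['landing_page'] == 'new_page'))
df_clean = df[mask]
print(len(df_clean))  # 3 of the 5 made-up rows are consistent
```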
This left 290,585 rows, one more than the number of unique users noted above, meaning that one user appeared twice (having landed on both the new and old pages). The second entry for this user was therefore removed.
df = df.drop_duplicates(subset='user_id', keep='first')
Dropping the second record for the duplicated user ensured that the number of users/rows was equal to the number of unique users.
print(df['user_id'].nunique() == np.size(df['user_id']))

True
The data were now cleaned and ready for visualisation and analysis.
Data Visualisation
First, the number of users in the control group and treatment groups were compared. A bar graph was used to visualise the number of users that belonged to each group.
control_group = df[df['group'] == 'control']
treatment_group = df[df['group'] == 'treatment']

n_control = len(control_group)
n_treatment = len(treatment_group)
n_control_vs_treatment = np.array([n_control, n_treatment])
fig, ax = plt.subplots()
con1, tre1 = plt.bar(['Control','Treatment'],n_control_vs_treatment,0.35)
con1.set_facecolor('y')
tre1.set_facecolor('b')
ax.set_ylabel('Number of Users')
ax.set_title('Number of users in the control group vs treatment group')
plt.show()
The numbers of users in the two groups were similar. Therefore, the difference in group sizes was unlikely to bias the results of the statistical analysis.
A bar chart showing the number of users that were converted by either web page against those that were not converted was also plotted, to investigate the overall effectiveness of either page in converting users.
n_control_converted = len(df[(df['group'] == 'control') & (df['converted'] == 1)])
n_control_not_converted = len(df[(df['group'] == 'control') & (df['converted'] == 0)])
n_treatment_converted = len(df[(df['group'] == 'treatment') & (df['converted'] == 1)])
n_treatment_not_converted = len(df[(df['group'] == 'treatment') & (df['converted'] == 0)])
total_converted = n_control_converted + n_treatment_converted
total_not_converted = n_treatment_not_converted + n_control_not_converted
proportion_converted = total_converted/(total_not_converted + total_converted)
proportion_not_converted = total_not_converted/(total_converted + total_not_converted)
total_converted_vs_not_converted = np.array([total_converted, total_not_converted])
converted_vs_not_converted = np.array([100*proportion_converted, 100*proportion_not_converted])
fig1, ax1 = plt.subplots()
con, not_con = plt.bar(['Converted','Not Converted'], total_converted_vs_not_converted, 0.35)
con.set_facecolor('g')
not_con.set_facecolor('r')
ax1.set_title("Total number of users converted and not converted")
ax1.set_ylabel("Number of users")
plt.show()
The total number of users that were converted was considerably lower than the number of users that were not converted. Therefore, neither page was highly effective at converting users.
The mean conversion rate for the control group was ~0.1204 and for the treatment group was ~0.1188. This showed that the conversion rate for both groups was low and not very different:
control_mean = control_group['converted'].mean()
treatment_mean = treatment_group['converted'].mean()

print(control_mean)
print(treatment_mean)

0.1203863045004612
0.11880806551510564
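The same per-group means can be computed in one step with a pandas groupby, since the mean of a 0/1 column is exactly the conversion rate. A minimal sketch, using a small made-up frame in place of the real data:

```python
import pandas as pd

# Hypothetical stand-in for the cleaned A/B data
df = pd.DataFrame({
    'group':     ['control', 'control', 'treatment', 'treatment', 'control'],
    'converted': [0, 1, 0, 0, 1],
})

# Mean of the binary 'converted' column per group = conversion rate per group
rates = df.groupby('group')['converted'].mean()
print(rates)  # control -> 2/3, treatment -> 0.0 (made-up numbers)
```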
The proportion of users that were converted in the control condition vs the treatment condition was also plotted for comparison.
proportion_control_converted = n_control_converted/(n_control_converted + n_treatment_converted)
proportion_treatment_converted = n_treatment_converted/(n_treatment_converted + n_control_converted)
proportion_converted_treatment_vs_control = np.array([100*proportion_control_converted, 100*proportion_treatment_converted])
fig2, ax2 = plt.subplots()
opge, npge = plt.bar(['Old Page', 'New Page'], proportion_converted_treatment_vs_control, 0.35)
npge.set_facecolor('b')
opge.set_facecolor('y')
ax2.set_title("The proportion of users converted to the old page vs. new page")
ax2.set_ylabel('Percentage of Users')
plt.show()
This demonstrated that the proportion of users converted by the old_page was highly similar to that of the new_page. However, to test whether this small difference was statistically significant, statistical modelling was used.
Logistic Regression Model
A logistic regression model was used for the statistical analysis (since the dependent variable, ‘converted’, was categorical/dichotomous) to investigate whether the conversion rate of the control group was significantly different from that of the treatment group. This analysis was conducted using R.
> library(aod)
> library(ggplot2)
> ab_data = read.csv("cleaned_ab_data.csv")

## We now want to use the glm() function to estimate a logistic regression model.
## But first, we will convert the 'group' column/variable into a categorical variable.
> ab_data$group = factor(ab_data$group)

## We can now train the logistic regression model and see the results.
> logitmodel = glm(converted ~ group, data = ab_data, family = "binomial")
> summary(logitmodel)

Call:
glm(formula = converted ~ group, family = "binomial", data = ab_data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.5065  -0.5065  -0.5030  -0.5030   2.0641

Coefficients:
                Estimate Std. Error  z value Pr(>|z|)
(Intercept)    -1.988777   0.008062 -246.671   <2e-16 ***
grouptreatment -0.014989   0.011434   -1.311     0.19
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 212778  on 290583  degrees of freedom
Residual deviance: 212776  on 290582  degrees of freedom
AIC: 212780

Number of Fisher Scoring iterations: 4
The ‘group’ variable was dummy coded, with the treatment group compared against the control. The coefficient for grouptreatment, as shown above, was -0.014989. Exponentiating this log-odds value gives an odds ratio of exp(-0.014989) ≈ 0.985 for converting on the new_page relative to the old_page. Inverting it, 1/0.985 ≈ 1.015: the odds of a user in the control group (old_page) converting were about 1.015 times the odds of a user in the treatment group (new_page) converting.
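The odds-ratio arithmetic above is easy to verify directly, e.g. in Python:

```python
import math

coef = -0.014989                 # grouptreatment coefficient (log-odds) from the model
odds_ratio = math.exp(coef)      # odds of converting on new_page relative to old_page
print(round(odds_ratio, 3))      # 0.985
print(round(1 / odds_ratio, 3))  # 1.015: odds of converting on old_page vs new_page
```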
However, the p-value for this coefficient was 0.19. With a significance threshold of 0.05, we therefore could not conclude that the two groups were significantly different. Hence, the null hypothesis that the conversion rate does not depend on the landing page a user visits could not be rejected.
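As a complementary sanity check, the same comparison can be framed as a two-proportion z-test on the group conversion rates. A minimal standard-library sketch, using the means reported earlier and assuming an even split of the 290,584 users between the two groups (the exact group sizes are an assumption here):

```python
import math

# Conversion rates from the analysis above; group sizes assume an even split (hypothetical)
p1, n1 = 0.1203863045004612, 145292   # control (old_page)
p2, n2 = 0.1188080655151056, 145292   # treatment (new_page)

# Pooled conversion rate and standard error of the difference in proportions
p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# z statistic and two-sided p-value from the standard normal CDF
z = (p1 - p2) / se
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(round(z, 2), round(p_value, 2))  # roughly 1.31 and 0.19, in line with the model
```

Under these assumed group sizes the p-value lands close to the 0.19 reported by the logistic regression, which is expected since both tests compare the same two proportions.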
Conclusion
Two variants of web pages were distributed to users to see if they differed in their effectiveness at converting users. The analysis showed that neither page was more effective than the other. Given that the number of users in the data was large (over 290,000), we concluded that there is no evidence that the newer variant of the web page was more effective at converting users than the older one.
The source code for the python code used for data cleaning and visualisation can be found here. The R code used for fitting the logistic regression model can be found here.