An A/B Test Decision: How to Drive a 108% Surge in Ad Click-Through Rate

date

Jun 19, 2025

summary

This project examines an A/B testing case to demonstrate how data analysis provides a scientific basis for marketing strategy decisions. The core objective is to evaluate how effectively two marketing strategies raise ad click-through rate (CTR) relative to a control group. Through rigorous data cleaning, sample size validation, and statistical hypothesis testing, Strategy 2 was identified as the optimal strategy. It increased CTR by approximately 1.36 percentage points (a relative increase of about 108.68%), providing clear direction for business growth.

📝 1. Project Background and Objectives

A key challenge in marketing decision-making is scientifically evaluating the effectiveness of new strategies. Based on a classic business scenario (with data sourced from Alibaba Cloud Tianchi), this project walks through the full process from data preparation to statistical inference. Its goal is to solve the practical problem of ‘scientifically selecting the optimal solution from multiple marketing strategies.’

The complete code of this project has been published on GitHub: https://github.com/IvyXiaZhou/AB-test/blob/main/AB%20Test.py

2. Data Understanding and Preparation

2.1 Data Cleaning

This dataset is from Alibaba Cloud Tianchi and contains three table files. This project uses only one of them, ‘effect_tb.csv’, which records whether Alipay users clicked on advertisements.

To ensure data quality during the initial data exploration, the following steps were implemented:

2.1.1 Column Renaming and Filtering

Pandas was used to assign column names ('dt', 'user_id', 'label', 'dmp_id') to the unnamed columns. The 'dt' (date) column was then dropped because this project compares click behavior across groups and does not analyze trends over time. Removing this column simplified the dataset.

After processing, the table included the following fields:

user_id: Unique user identifier

label: Indicator for ad clicks (0 = not clicked, 1 = clicked)

dmp_id: Marketing strategy identifier (1 = control group, 2 = Strategy 1, 3 = Strategy 2)
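The renaming and filtering step can be sketched as follows. This is a minimal illustration, not the project's actual code: the three inline rows are made-up stand-ins for ‘effect_tb.csv’, and the absence of a header row is an assumption based on the data being described as unnamed.

```python
import io

import pandas as pd

# Made-up rows standing in for effect_tb.csv (assumed to have no header row).
# Assumed column order: dt, user_id, label, dmp_id.
raw = io.StringIO("1,1001,0,1\n1,1002,1,3\n2,1003,0,2\n")

# Assign the column names while reading, then drop the unused date column.
df = pd.read_csv(raw, header=None, names=["dt", "user_id", "label", "dmp_id"])
df = df.drop(columns="dt")

print(df.columns.tolist())  # ['user_id', 'label', 'dmp_id']
```

Naming the columns at read time avoids a separate renaming pass over the data.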

2.1.2 Duplicate Value Handling

Records were sorted by ‘user_id’ and duplicate entries were removed to ensure the uniqueness of user behavior data.

2.1.3 Missing and Outlier Handling

No missing values were found in the data. A grouped pivot table was created to verify that the values of ‘label’ (0 or 1) and ‘dmp_id’ (1, 2, 3) fell within the expected ranges, with no outliers detected.
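The deduplication and validity checks might look like the sketch below. The rows are made up for illustration, and treating a duplicate as a fully identical row is an assumption; the project's own code may define duplicates differently.

```python
import pandas as pd

# Made-up rows; the second occurrence of user 1002 is an exact duplicate.
df = pd.DataFrame({
    "user_id": [1002, 1001, 1002, 1003],
    "label":   [1, 0, 1, 0],
    "dmp_id":  [3, 1, 3, 2],
})

# Sort by user, then drop rows that are identical in every column.
deduped = df.sort_values("user_id").drop_duplicates()

# Count missing values per column.
missing = deduped.isnull().sum()

# Cross-tabulate strategy against click label to spot unexpected values.
counts = deduped.pivot_table(
    index="dmp_id", columns="label", values="user_id",
    aggfunc="count", fill_value=0,
)

# Explicit range checks on the two categorical fields.
values_ok = (
    deduped["label"].isin([0, 1]).all()
    and deduped["dmp_id"].isin([1, 2, 3]).all()
)
print(counts)
print("missing:", int(missing.sum()), "values ok:", bool(values_ok))
```

Any unexpected code would show up as an extra row or column in the pivot table, which makes it a quick visual check alongside the explicit range test.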

2.2 Sample Size Sufficiency Validation

Before launching the A/B test, it was necessary to ensure the sample size was sufficient to detect meaningful effects. The widely used Evan Miller Sample Size Calculator was chosen, which requires only the Baseline Conversion Rate and the Minimum Detectable Effect as inputs.

First, the average CTR of the control group was calculated to be approximately 1.26% and used as the Baseline Conversion Rate. Based on business consensus, an absolute CTR increase of at least 1 percentage point was considered to have practical promotion value, so the Minimum Detectable Effect was set to 1% (absolute). The calculator returned a minimum of 2,167 samples per group.

In the actual dataset, the sample sizes of the control group, Strategy 1 group, and Strategy 2 group all far exceeded this number (exceeding 300,000 each). This fully met the test’s requirements for statistical power (typically set at 80% or higher) and ensured the credibility of the results.
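As a rough cross-check of the calculator, the required sample size can be approximated with a standard normal-approximation formula for comparing two proportions. The sketch below rests on several assumptions: the 1% Minimum Detectable Effect is treated as an absolute change, the test is two-sided with α = 0.05 and 80% power, and the formula may not match the calculator's internals exactly. The function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate per-group sample size needed to detect an absolute
    lift of mde_abs over baseline with a two-sided test."""
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p1 * (1 - p1))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / mde_abs ** 2)


n_required = required_sample_size(baseline=0.0126, mde_abs=0.01)
strategy2_group = 316_205  # Strategy 2 sample size reported in Section 3.2.1
print(n_required, strategy2_group >= n_required)
```

With these inputs the approximation lands within a few samples of the calculator's 2,167, far below the actual group sizes.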

At this stage, the cleaned dataset contained 3 features and 2,632,975 samples and was saved to the file ‘data/output.csv’.

3. Exploratory Data Analysis

The primary objective of this phase was to gain a deeper understanding of the data and to use hypothesis testing to determine whether CTR differed significantly among the marketing strategies (control group vs. Strategy 1 vs. Strategy 2).

3.1 Data Loading and Basic Statistics

The file ‘data/output.csv’ was read, and the data was divided into 3 groups by ‘dmp_id’. The average CTR (proportion of ‘label’ = 1) for each group was calculated as follows:

Control group: 0.012551012429794775

Strategy 1: 0.015314747742072015

Strategy 2: 0.026191869198779274
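Because ‘label’ is a 0/1 indicator, each group's CTR is simply the mean of that column. A minimal sketch with made-up rows (the real file has 2,632,975 rows, and this is not necessarily how the project's code computes it):

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "label":   [0, 1, 0, 0, 1, 1],
    "dmp_id":  [1, 1, 2, 2, 3, 3],
})

# The mean of a 0/1 column equals the share of rows where label == 1.
ctr_by_group = df.groupby("dmp_id")["label"].mean()
print(ctr_by_group)
```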

The data showed that both Strategy 1 and Strategy 2 had higher CTRs than the control group, with increases of 0.28 percentage points and 1.36 percentage points, respectively. Strategy 2 demonstrated a more prominent improvement, with a relative increase of approximately 108.68% compared to the control group.
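The absolute and relative lifts follow directly from the three CTR values listed above. The helper name below is illustrative:

```python
ctr_control = 0.012551012429794775
ctr_strategy1 = 0.015314747742072015
ctr_strategy2 = 0.026191869198779274


def lift(ctr_new, ctr_base):
    """Return (absolute lift in percentage points, relative lift in percent)."""
    abs_pp = (ctr_new - ctr_base) * 100
    rel_pct = (ctr_new / ctr_base - 1) * 100
    return abs_pp, rel_pct


for name, ctr in [("Strategy 1", ctr_strategy1), ("Strategy 2", ctr_strategy2)]:
    abs_pp, rel_pct = lift(ctr, ctr_control)
    print(f"{name}: +{abs_pp:.2f} pp, +{rel_pct:.2f}% relative")
```

Keeping the two measures separate matters here: a 1.36 percentage-point gain on a baseline near 1.26% is what produces a relative increase above 100%.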

3.2 Statistical Hypothesis Testing

To verify the effectiveness of Strategy 2, the CTR of the control group was defined as ‘p₁’ and that of Strategy 2 as ‘p₂’. The following statistical hypotheses were established:

Null Hypothesis (H₀): p₂ ≤ p₁ (the CTR of Strategy 2 is less than or equal to that of the control group)

Alternative Hypothesis (H₁): p₂ > p₁ (the CTR of Strategy 2 is higher than that of the control group)

Because each observation is a binary click outcome and the samples are large, a two-proportion Z-test was adopted with a significance level (α) of 0.05.

3.2.1 Method 1: Formula Calculation

The sample size, number of clicks, and CTR were calculated for both the control group and Strategy 2 group, along with the combined CTR (‘r’). The key statistics are as follows:

Control group: Sample size = 1,905,663; Number of clicks = 23,918; CTR = 1.26%

Strategy 2 group: Sample size = 316,205; Number of clicks = 8,282; CTR = 2.62%

Combined CTR (r) = 1.45%

The Z-test formula is:

Z = (p₂ − p₁) / √( p(1 − p)(1/n₁ + 1/n₂) )

Where:

p₁, p₂ = CTR of the control group and Strategy 2 group

p = Combined (pooled) CTR, reported as r above

n₁, n₂ = Sample sizes of the two groups

Substituting the known values into the formula yielded a test statistic of Z ≈ 59.44 (59.44168632985996).

This was a one-tailed (right-tailed) test, so the rejection region is Z > z_alpha, where the critical value for α = 0.05 is approximately 1.645. Since the calculated Z-statistic (59.44) far exceeded this critical value, the null hypothesis was rejected. This indicates that the CTR of Strategy 2 is significantly higher than that of the control group.
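The calculation can be reproduced from the counts reported above using only the Python standard library. This is a sketch of the pooled two-proportion Z statistic with Strategy 2 minus control in the numerator; the variable names are illustrative and the project's own code may be organized differently.

```python
from math import sqrt
from statistics import NormalDist

# Counts reported for the control group (1) and Strategy 2 group (2).
n1, clicks1 = 1_905_663, 23_918
n2, clicks2 = 316_205, 8_282

p1 = clicks1 / n1
p2 = clicks2 / n2
p_pooled = (clicks1 + clicks2) / (n1 + n2)

# Standard error under H0, where both groups share the pooled CTR.
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se

# Right-tailed critical value at alpha = 0.05.
z_crit = NormalDist().inv_cdf(1 - 0.05)
print(f"Z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {z > z_crit}")
```

The pooled standard error is used because the null hypothesis assumes both groups share one underlying CTR.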

3.2.2 Method 2: StatsModels Library

In addition to the above method, Python’s StatsModels library was used to calculate the Z-value and p-value. This method is easy to implement and serves as cross-validation for Method 1.

The ‘proportions_ztest’ function was used to perform a Z-test directly on the CTRs of the control group and Strategy 2 group. The number of clicks and the sample size of each group were passed to the function, and alternative='larger' was set to run a right-tailed test (the alternative hypothesis being that Strategy 2 has a higher CTR than the control group). The function returned the z_score (Z-statistic) and p (p-value). As a supplement, the same method was used to compare the CTR of the Strategy 1 group against the control group. The results of the two calculations are as follows:

Strategy 2 vs. Control group: Z = 59.44, p ≈ 0.0

Strategy 1 vs. Control group: Z = 14.17, p = 7.45e-46
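A minimal sketch of the Strategy 2 comparison with ‘proportions_ztest’, using the counts reported in Method 1. Listing Strategy 2 first is an assumption: with alternative='larger' the function tests whether the first proportion exceeds the second, and that ordering is consistent with the positive Z reported above.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Strategy 2 first, control second, so that alternative="larger"
# tests whether the Strategy 2 CTR exceeds the control CTR.
clicks = np.array([8_282, 23_918])
samples = np.array([316_205, 1_905_663])

z_score, p = proportions_ztest(clicks, samples, alternative="larger")
print(f"Z = {z_score:.2f}, p = {p:.3g}")
```

Reversing the order of the two groups would flip the sign of Z and, with the same alternative, give a p-value near 1, so the ordering is worth double-checking.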

Interpretation of the p-value: With the significance level α set to 0.05, the null hypothesis is rejected if p < 0.05; otherwise (p ≥ 0.05), the null hypothesis cannot be rejected.

For both strategies, the p-values were effectively 0 and far below 0.05, so the null hypothesis was rejected in each comparison, indicating that both strategies had significantly higher CTRs than the control group. However, based on the earlier CTR data, Strategy 2 had a larger improvement (1.36 percentage points) than Strategy 1 (0.28 percentage points). From a business effectiveness perspective, Strategy 2 is the optimal choice.

With large sample sizes (exceeding 300,000 for each group), even small CTR differences, such as the 0.28 percentage-point increase from Strategy 1, can be statistically significant. Therefore, business decisions should consider not only statistical significance but also the magnitude of the actual improvement.

4. Business Recommendations

The results from both methods are consistent, providing sufficient evidence that Marketing Strategy 2 is more effective in increasing ad CTR. Based on this, the following recommendations are proposed:

Prioritize the rapid deployment and promotion of Strategy 2 across key channels to maximize its marketing value.

Conduct an in-depth analysis of the specific components of Strategy 2 to extract its success factors, providing guidance for the design of future marketing campaigns.

While promoting Strategy 2, design new variants based on its success factors (e.g., creativity, delivery logic) and launch a new round of A/B testing to pursue continuous optimization.

5. Reflections and Learning

Through this project, I not only systematically practiced the complete technical process of A/B testing but also gained a deeper understanding of the core value of statistical thinking in driving business decisions. Statistical methods can convert subjective opinions into objective data evidence, thereby mitigating decision biases. In the future, I aim to apply this methodology to more complex scenarios, such as multivariate testing or sequential analysis, to create greater value for the business.

📎 References

https://www.heywhale.com/mw/project/5efee4a563975d002c98adba

https://www.evanmiller.org/ab-testing/sample-size.html