Hassan Ijaz

Ai, Web & Design
Statistical Inference (Topic 21 of 58)

A/B testing simulator

Website optimization game where users run tests, see results accumulate, and learn about statistical significance

Concept Overview

A/B testing is a controlled experiment comparing two versions to determine which performs better. It's widely used in product development, marketing, and user experience optimization.

Experimental Design

Random Assignment

  • Users randomly assigned to control (A) or treatment (B)
  • Removes selection bias and balances confounders, known and unknown, in expectation
  • Ensures groups are comparable on average
  • Foundation for causal inference
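One common way to implement random assignment is to hash a stable user identifier, so the same user always sees the same variant. The sketch below is illustrative only: the function name assign_variant, the experiment label "checkout_test", and the 50/50 split are assumptions, not part of the simulator.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "checkout_test") -> str:
    """Deterministically map a user to 'A' or 'B' with roughly equal probability.

    The experiment name (hypothetical here) is mixed into the hash so that
    different experiments get independent assignments for the same user.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # bucket in 0..99
    return "A" if bucket < 50 else "B"

# Sanity check: the split over many synthetic users should be close to 50/50.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}")] += 1
print(counts)
```

Hashing instead of calling a random number generator on each visit keeps the assignment stable across sessions without storing it.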

Key Metrics

  • Primary metric: Main outcome of interest
  • Secondary metrics: Additional insights
  • Guardrail metrics: Ensure no negative side effects
  • Choose metrics before running the experiment

Statistical Analysis

Two-Sample Tests

  • Proportions: Z-test or χ² test
  • Means: Two-sample t-test
  • Non-parametric: Mann-Whitney U test
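For conversion rates, the proportions case can be sketched as a pooled two-proportion z-test using only the standard library. The counts in the example are made up for illustration, and the normal approximation assumes reasonably large samples in both groups.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for H0: p_A == p_B.

    conv_* are conversion counts, n_* are users per group.
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both groups share one rate, so pool the data to estimate it.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Illustrative data: 10.0% vs 12.5% conversion with 2,000 users per group.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```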

Effect Size

  • Absolute difference: p_B - p_A (in percentage points; the sign gives the direction)
  • Relative lift: (p_B - p_A) / p_A
  • Practical significance vs statistical significance
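The two effect-size measures above can be computed together with a confidence interval for the difference, which helps judge practical significance. This is a minimal sketch assuming a Wald (normal-approximation) interval; the function name and the example counts are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def effect_size(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return (absolute difference, relative lift, CI for the difference).

    Assumes p_A > 0, otherwise the relative lift is undefined.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a            # absolute difference
    lift = diff / p_a           # relative lift
    # Unpooled standard error, appropriate for an interval estimate.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, lift, (diff - z_crit * se, diff + z_crit * se)

diff, lift, ci = effect_size(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"difference = {diff:.3f}, lift = {lift:.0%}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

If the whole interval sits below the smallest change worth shipping, the result may be statistically significant yet not practically significant.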

Sample Size & Duration

Power analysis determines required sample size:

  • Minimum detectable effect (MDE)
  • Statistical power (typically 80%)
  • Significance level (typically 5%)
  • Baseline conversion rate

Run until reaching planned sample size, not until significance!
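The four inputs listed above combine into a per-group sample size. The sketch below uses the standard normal-approximation formula for comparing two proportions with a two-sided test; the function name and the example baseline and MDE are assumptions for illustration, and dedicated power-analysis tools may give slightly different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_group(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion test.

    baseline: conversion rate of the control (e.g. 0.10)
    mde_abs:  minimum detectable effect as an absolute difference (e.g. 0.02)
    """
    p1 = baseline
    p2 = baseline + mde_abs
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_beta = NormalDist().inv_cdf(power)           # statistical power
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / mde_abs ** 2)

n = sample_size_per_group(baseline=0.10, mde_abs=0.02)
print(f"about {n} users per group")
```

Halving the MDE roughly quadruples the required sample, which is why the MDE should be fixed before the test starts.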

Common Pitfalls

Peeking Problem

Checking results repeatedly and stopping as soon as they look significant inflates the Type I error rate above the nominal level
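A small simulation can make the peeking problem concrete. The sketch below runs A/A tests (both arms share the same true rate, so every "significant" result is a false positive) and compares stopping at the first significant look against testing once at the planned sample size. All parameters (500 simulations, 10 looks, a 10% rate) are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test p-value for two proportions."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:  # no conversions (or all conversions) in both arms
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_aa(n_sims=500, n_per_arm=1000, looks=10, rate=0.10, alpha=0.05, seed=0):
    """Return (false positive rate with peeking, false positive rate without)."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    peeking_hits = fixed_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        significant_at_any_look = False
        for look in range(1, looks + 1):
            # Add one batch of users to each arm, then "peek" at the result.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            p = p_value(conv_a, look * step, conv_b, look * step)
            if p < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        fixed_hits += p < alpha  # p now holds the final, planned analysis
    return peeking_hits / n_sims, fixed_hits / n_sims

peeking_rate, fixed_rate = simulate_aa()
print(f"false positives with peeking: {peeking_rate:.1%}, single final test: {fixed_rate:.1%}")
```

The single final test should land near the nominal 5%, while the peeking rate is typically several times higher.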

Multiple Comparisons

Testing many metrics without correction increases false positives
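One widely used correction is the Holm-Bonferroni step-down procedure, which controls the family-wise error rate across all metrics tested. The sketch below is a plain implementation; the metric names and p-values are invented for illustration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    p-values are tested from smallest to largest against alpha / (m - rank);
    the procedure stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject

# Hypothetical metrics from one experiment.
metrics = ["conversion", "revenue per user", "bounce rate", "time on page"]
p_values = [0.004, 0.03, 0.04, 0.20]
naive = [p < 0.05 for p in p_values]
corrected = holm_bonferroni(p_values)
for name, n_sig, c_sig in zip(metrics, naive, corrected):
    print(f"{name}: uncorrected={n_sig}, Holm={c_sig}")
```

In this example three metrics look significant without correction, but only one survives it.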

Novelty Effects

Users may behave differently at first, so early effects may not persist

Simpson's Paradox

Aggregate results may reverse when segmented by subgroups
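A worked example with invented numbers shows how the reversal can happen. Here variant B converts better within each segment, yet looks worse in aggregate because most of B's traffic came from the low-converting segment. An imbalance this large would usually point to broken assignment or a shift in traffic mix.

```python
# Invented counts: (conversions, users) per variant and segment.
data = {
    "desktop": {"A": (160, 800), "B": (44, 200)},
    "mobile":  {"A": (10, 200),  "B": (48, 800)},
}

def rate(conversions, users):
    return conversions / users

# Within each segment, B beats A.
for segment, variants in data.items():
    print(f"{segment}: A={rate(*variants['A']):.1%}, B={rate(*variants['B']):.1%}")

# Aggregated over segments, A beats B.
totals = {}
for variant in ("A", "B"):
    conversions = sum(data[segment][variant][0] for segment in data)
    users = sum(data[segment][variant][1] for segment in data)
    totals[variant] = rate(conversions, users)
print(f"overall: A={totals['A']:.1%}, B={totals['B']:.1%}")
```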

Advanced Techniques

Sequential Testing

Continuous monitoring with Type I error control, for example group-sequential boundaries or alpha spending

Bayesian A/B Testing

Direct probability statements about treatment effects, for example P(p_B > p_A | data)
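A minimal Bayesian sketch for conversion data uses a Beta-Binomial model and estimates the probability that B beats A by Monte Carlo sampling. The uniform Beta(1, 1) prior, the number of draws, and the example counts are assumptions chosen for illustration.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Estimate P(p_B > p_A | data) under independent Beta(1, 1) priors.

    With a Beta(1, 1) prior, the posterior for each rate is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        sample_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        sample_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += sample_b > sample_a
    return wins / draws

prob = prob_b_beats_a(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"P(B > A | data) is roughly {prob:.3f}")
```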

Multi-Armed Bandits

Adaptive allocation based on interim results
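Thompson sampling is one common bandit strategy for adaptive allocation: each user is sent to the variant whose sampled conversion rate is highest, so traffic drifts toward the better arm as evidence accumulates. The sketch below assumes Beta(1, 1) priors and simulated users with made-up true rates.

```python
import random

def thompson_sampling(true_rates, n_users=5000, seed=0):
    """Simulate Thompson sampling; return how many users each arm received.

    true_rates are the (normally unknown) conversion rates used to
    simulate user behaviour.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_users):
        # Draw one plausible rate per arm from its current posterior.
        samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        arm = samples.index(max(samples))
        # Show that arm to the user and record the simulated outcome.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

pulls = thompson_sampling([0.05, 0.15])
print(f"users per arm: {pulls}")
```

The trade-off is that unequal allocation reduces regret during the test but complicates classical inference on the final difference.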

Stratification

Control for known covariates to reduce variance
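As a simple illustration, a stratified estimate computes the treatment effect within each stratum and then averages those effects, weighted by stratum size. The strata (new vs returning users) and the counts below are invented for the example.

```python
def stratified_difference(strata):
    """Weighted average of per-stratum differences p_B - p_A.

    strata: list of dicts with keys conv_a, n_a, conv_b, n_b.
    Each stratum is weighted by its share of all users.
    """
    total_users = sum(s["n_a"] + s["n_b"] for s in strata)
    estimate = 0.0
    for s in strata:
        weight = (s["n_a"] + s["n_b"]) / total_users
        diff = s["conv_b"] / s["n_b"] - s["conv_a"] / s["n_a"]
        estimate += weight * diff
    return estimate

strata = [
    {"name": "new users",       "conv_a": 50,  "n_a": 1000, "conv_b": 60,  "n_b": 1000},
    {"name": "returning users", "conv_a": 200, "n_a": 1000, "conv_b": 230, "n_b": 1000},
]
estimate = stratified_difference(strata)
print(f"stratified difference: {estimate:.3f}")
```

Because between-stratum variation in baseline rates no longer contributes to the noise, the stratified estimate tends to have lower variance than the raw difference when strata differ substantially.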

Best Practice: Focus on business impact, not just statistical significance. A statistically significant 0.1% improvement might not justify implementation costs.

The website optimization game below lets you run A/B tests and see results accumulate. Learn about statistical significance while optimizing conversion rates!
