Calculator

Input Parameters

The terms trials and successes are placeholders for the interactions you wish to measure. Depending on your requirements, a trial could be a unique user or a unique page-view event.

Trials: All interactions (successful + unsuccessful)
Successes: All successful interactions
Control: Denotes the unchanged, original version of your site
Test: Denotes the modified version of your site. Also referred to as the treatment group

Example

You are running an e-commerce site that has a product list and a product detail page. You would like to test if a re-design of your product list leads to more users advancing to the product detail page.

Views (trials): All unique users who visited the product list, irrespective of whether they went on to visit the product detail page
Hits (successes): All unique users who visited the product list AND subsequently visited the product detail page (and only those)
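The counting rule above can be sketched in a few lines of Python. This is only an illustration: the `visits` log, its `(user_id, page)` layout, and the function name are hypothetical and not part of the calculator.

```python
# Hypothetical, time-ordered visit log of (user_id, page) pairs.
visits = [
    ("u1", "list"), ("u1", "detail"),
    ("u2", "list"),
    ("u3", "list"), ("u3", "detail"), ("u3", "detail"),
    ("u4", "detail"),  # never saw the product list, so not counted at all
]

def count_views_and_hits(visits):
    """Return (views, hits) as unique-user counts."""
    seen_list = set()   # users who visited the product list
    converted = set()   # users who visited the detail page afterwards
    for user, page in visits:
        if page == "list":
            seen_list.add(user)
        elif page == "detail" and user in seen_list:
            converted.add(user)
    return len(seen_list), len(converted)

views, hits = count_views_and_hits(visits)
print(f"Trials (views): {views}, Successes (hits): {hits}")
```

Because users are collected in sets, repeated visits by the same user count only once, which matches the "unique users" wording above.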

Input Parameters - Expert Settings

Info

For a fuller explanation of these terms check out the Glossary

ROPE

Region of Practical Equivalence, in percent (%), expressed as an absolute value integer

Example: If you consider all changes between -2% and +2% as being identical, set ROPE = 2

Prior

Prior Alpha: Prior value for successes
Prior Beta: Prior value for trials

Example: Historically, the click-through rate of your product list has been around 20%. Given this success rate, you could choose a weaker prior (alpha = 1, beta = 5) or a stronger prior (alpha = 1000, beta = 5000).
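To get a feel for the difference between a weak and a strong prior, the following sketch compares how much each one pulls an observed rate towards the historical 20%. It assumes that the prior successes and prior trials are simply added to the observed counts; this is an illustrative assumption, not a description of how the calculator works internally. The function name and the observed numbers are made up.

```python
def posterior_mean(successes, trials, prior_alpha, prior_beta):
    """Posterior mean success rate.

    Assumption: prior_alpha counts prior successes and prior_beta counts
    prior trials (as defined above), and both are added to the observed data.
    """
    return (prior_alpha + successes) / (prior_beta + trials)

# Hypothetical observation: 30 successes in 100 trials (30% observed rate).
weak = posterior_mean(30, 100, prior_alpha=1, prior_beta=5)
strong = posterior_mean(30, 100, prior_alpha=1000, prior_beta=5000)

print(f"weak prior:   {weak:.3f}")    # stays close to the observed 30%
print(f"strong prior: {strong:.3f}")  # stays close to the historical 20%
```

Under this assumption, the weak prior barely moves the estimate, while the strong prior needs far more data before the observed rate dominates.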

Plot

Interpretation

The Expected Uplift Plot provides a convenient and intuitive way to evaluate experiments. The x-axis denotes the minimum relative uplift that the Test version provides over the Control version, and the y-axis assigns a probability (chance) to each point along the x-axis. Taken together, each point along the curve can thus be read as:

Given the data so far, there is a y% probability of version Test being at least x% better than version Control

When analyzing a split, pay special attention to the vertical line originating at x = 0. Since all values to the right of this "zero-uplift" threshold can be considered an improvement, the y-value at the intersection between this line and the probability curve offers a quick way of evaluating how well the Test version is performing.
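The reading of the curve can be reproduced with a small sampling sketch. Everything here is an assumption made for illustration: a flat Beta(1, 1) prior, relative uplift defined as test rate / control rate - 1, and made-up counts and function names. It is not the calculator's actual implementation.

```python
import random

def sample_relative_uplift(control, test, n_samples=20000, seed=0):
    """Draw samples of the relative uplift of Test over Control.

    control and test are (successes, trials) tuples.
    Assumption: flat Beta(1, 1) prior on both success rates.
    """
    rng = random.Random(seed)
    (sc, tc), (st, tt) = control, test
    uplifts = []
    for _ in range(n_samples):
        rate_c = rng.betavariate(1 + sc, 1 + tc - sc)
        rate_t = rng.betavariate(1 + st, 1 + tt - st)
        uplifts.append(rate_t / rate_c - 1.0)
    return uplifts

def prob_uplift_at_least(uplifts, x):
    """Share of samples with an uplift of at least x (0.10 means 10%)."""
    return sum(u >= x for u in uplifts) / len(uplifts)

# Hypothetical data: Control 200/1000, Test 230/1000.
uplifts = sample_relative_uplift(control=(200, 1000), test=(230, 1000))
p_better = prob_uplift_at_least(uplifts, 0.0)   # y-value at the zero-uplift line
p_10pct = prob_uplift_at_least(uplifts, 0.10)   # y-value at x = 10%

print(f"P(Test at least as good as Control): {p_better:.2f}")
print(f"P(Test at least 10% better):         {p_10pct:.2f}")
```

Evaluating `prob_uplift_at_least` over a range of x values traces out a curve of the kind shown in the plot; the value at x = 0 is the intersection with the zero-uplift line.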

Examples

Test with higher success rate

The cumulative probability curve crosses the zero-uplift threshold at around 90%. Thus, there is a 90% chance that the Test version is at least as good as the Control version. At 80% probability, the new version provides an uplift of at least 15%.

Figure: Positive Test

Test and Control with equal success rates

The cumulative probability curve crosses the zero-uplift threshold at exactly 50%. Thus, the probability of one version being better than the other is the same as getting heads when flipping a coin.

Figure: Neutral Test

Test with lower success rate

The cumulative probability curve crosses the zero-uplift threshold at around 10%. Thus, there is only a 10% chance that the Test version is at least as good as the Control version.

Figure: Negative Test

Recommendation

Troubleshooting

Running the same numbers multiple times yields different results - what's wrong?

Nothing. The probability estimates are derived using random sampling methods, so running the calculation again leads to different samples being drawn. Note, however, that running the calculation multiple times with the exact same inputs should cause the results to differ only marginally.
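The effect is easy to reproduce with a small sketch. The sampler below is a stand-in written for illustration (flat Beta(1, 1) priors, made-up counts, hypothetical function name), not the calculator's own code; it repeats the same estimate with five different random seeds.

```python
import random

def prob_test_better(control, test, n_samples, seed):
    """Monte Carlo estimate of P(test rate > control rate).

    control and test are (successes, trials) tuples.
    Assumption: flat Beta(1, 1) prior on both success rates.
    """
    rng = random.Random(seed)
    (sc, tc), (st, tt) = control, test
    wins = 0
    for _ in range(n_samples):
        rate_c = rng.betavariate(1 + sc, 1 + tc - sc)
        rate_t = rng.betavariate(1 + st, 1 + tt - st)
        if rate_t > rate_c:
            wins += 1
    return wins / n_samples

# Same inputs, five different seeds: the estimates differ, but only slightly.
estimates = [
    prob_test_better((200, 1000), (230, 1000), n_samples=20000, seed=seed)
    for seed in range(5)
]
spread = max(estimates) - min(estimates)

print([round(e, 3) for e in estimates])
print(f"spread between runs: {spread:.4f}")
```

In a sketch like this, drawing more samples per run narrows the spread between runs at the cost of a longer computation.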

How do I know when to stop my experiment?

The calculator comes with a stopping criterion that is determined by two measures: the HDI and the ROPE. The way these two measures overlap translates directly into three different recommendations:

The HDI is completely contained within the ROPE -> End the experiment, implement either variant
HDI and ROPE are partially overlapping -> Keep testing
HDI and ROPE are not overlapping -> End the experiment, implement winning variant
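The three rules can be written down as a short decision function. This is a sketch under stated assumptions: the HDI is taken as the narrowest interval containing 95% of the sampled uplift values, the ROPE is treated as symmetric around zero (from -rope to +rope), and both are expressed in the same unit. The function names and the 95% mass are illustrative choices, not taken from the calculator.

```python
def hdi(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, int(round(mass * n)))
    # Slide a window of fixed sample count and keep the narrowest one.
    start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[start], ordered[start + window - 1]

def recommend(hdi_low, hdi_high, rope):
    """Map the HDI/ROPE overlap to one of the three recommendations."""
    if -rope <= hdi_low and hdi_high <= rope:
        return "End the experiment, implement either variant"
    if hdi_high < -rope or hdi_low > rope:
        return "End the experiment, implement winning variant"
    return "Keep testing"

# Hypothetical HDIs of the relative uplift, in percent, with ROPE = 2.
print(recommend(-0.5, 0.8, rope=2))  # HDI inside the ROPE
print(recommend(-1.0, 5.0, rope=2))  # partial overlap
print(recommend(3.0, 9.0, rope=2))   # no overlap
```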
