How Your A/B Test’s Statistical Significance Will Increase Your Amazon Success

andrew browne

Our software, Splitly, helps you optimize your listing by testing different variants, or changes, in your listing, and seeing which one causes positive or negative effects in your sessions, conversions, and profits.

 

One problem we face at Splitly is that there can be many causes of a dip or spike in sales. So how can we be sure that the variant you tested is the cause of these changes in your sales data?

 

To give you an accurate assessment of whether the variants you are testing caused the changes, we

  1. test each variant on every day of the week at least once to mitigate any daily fluctuations
  2. look at your previous sales data to account for any anomalies in sales
  3. use statistical significance to help determine how confident we are that your variant caused the changes

 

By taking these steps, we are able to give you an accurate reflection of what is actually causing changes in your sales statistics.

 

How confident we are that your variant caused the change is reflected in the statistical significance percent given to you after every test.

Splitly Tests Changes You Make to Your Listings to See Which One Affects Your Sales Statistics

 

Splitly helps you optimize your Amazon listings by letting you split test different changes, or what we call “variants”, in your listing. We do this by updating your listing with the changes on your variants each day.

 

The simplest test is an “A/B” test. Here you just test 2 variants, e.g. on the ‘Original’ tab, you could have your sale price at $15, and on ‘Variant 1’ you could have it at $20.

 

It’s worth noting that you are free to edit anything on the Original tab, as it is treated just like the other variants.

 

As there are only 2 variants in an A/B test, you will be able to find a winner faster than if you add more variants. You may want to add more variants though, if you are testing a combination of things, and there is no limit as to how many you can add.

 

We run each variant for at least 7 days to make sure we test every variant on every day of the week. The more variants you have, the longer the test will run, because each variant has to run on every day of the week before we can give you an accurate result.
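
As a rough illustration of that rotation (a sketch only, not Splitly's actual scheduler), the Python below builds a daily schedule in which every variant lands on every weekday once. The function name `build_schedule`, the Monday start, and the shift-by-one-each-week rule are all assumptions made for the example.

```python
from collections import defaultdict

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def build_schedule(variants):
    """Return (day_index, weekday, variant) tuples covering 7 * len(variants) days.

    In week w, weekday k runs variant (k + w) % n, so the assignment shifts by
    one position each week and every variant lands on every weekday once.
    Assumption: the test starts on a Monday.
    """
    n = len(variants)
    schedule = []
    for week in range(n):
        for k, weekday in enumerate(WEEKDAYS):
            day_index = week * 7 + k
            schedule.append((day_index, weekday, variants[(k + week) % n]))
    return schedule


schedule = build_schedule(["Original", "Variant 1"])

# Collect the weekdays each variant was shown on.
coverage = defaultdict(set)
for _, weekday, variant in schedule:
    coverage[variant].add(weekday)

print(len(schedule), {name: len(days) for name, days in coverage.items()})
```

With this rule the test length grows by 7 days for each variant you add, which is why an A/B test with only 2 variants finishes soonest.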

 

When you click the ‘update test results’ button, we gather your data for each variant and average it out over the number of days it ran.

 

We show you your sessions, conversions, and your profit, and whether these increased, decreased or remained the same. We also show you by how much these changed.
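
To make the averaging concrete, here is a minimal sketch in Python. The record layout, the sample numbers, and the helper names `summarize` and `percent_change` are hypothetical, and treating "conversions" as orders divided by sessions is an assumption for the example rather than a description of Splitly's internals.

```python
from statistics import mean

# Hypothetical daily records: (variant, sessions, orders, profit in dollars).
daily_records = [
    ("Original", 100, 10, 50.0),
    ("Variant 1", 110, 12, 66.0),
    ("Original", 120, 12, 60.0),
    ("Variant 1", 90, 12, 66.0),
]


def summarize(records, variant):
    """Average one variant's daily metrics over the days it ran."""
    rows = [r for r in records if r[0] == variant]
    sessions = mean(r[1] for r in rows)
    orders = mean(r[2] for r in rows)
    profit = mean(r[3] for r in rows)
    return {
        "sessions": sessions,
        "conversion_rate": orders / sessions,  # assumed definition
        "profit": profit,
    }


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


original = summarize(daily_records, "Original")
variant = summarize(daily_records, "Variant 1")
for metric in original:
    print(metric, round(percent_change(original[metric], variant[metric]), 1))
```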

 

You may be asking though, how can we really know what caused these changes? Is it the variant or some other factor? How accurate can this really be?

Real World Events and Random Fluctuation Make it Difficult to Determine What Caused the Changes in Your Sales Statistics

 

It’s really quite hard to determine what is causing an increase or decrease in your sales statistics.

 

Think about it for a second. You are likely a veteran seller on Amazon, and you have seen how your sales can often fluctuate for various reasons outside of your control.

 

Your sessions and sales may increase or decrease due to

  • different daily sales averages
  • holidays
  • seasons
  • tax refunds
  • your competitor running out of stock
  • Amazon increasing your ranking
  • random fluctuations in sales

 

So how can we really be sure that the variants you are testing are really the causes of the changes in your sales statistics?

 

Are these changes due to the variants you are testing, or are these changes due to the world events described above?

 

That’s where statistical significance comes in.

Statistical Significance Helps Us Determine What Caused Changes in Your Sales Statistics

 

Statistical significance is basically how confident we can be that the variant you are testing, and not any of the real world factors mentioned above, is the cause of the changes in your sales statistics.

 

Statistical significance is usually expressed as a percentage: the higher the percentage, the more confident we are that the variants caused the changes.

 

So a 95% statistical significance means that we can be 95% sure that the variant you are testing is the cause of the change. A 90% statistical significance means we can be 90% sure that the variant you are testing is the cause of the change.

 

Generally, a statistical significance of 95% or above is very accurate, and you can be highly confident that the variants you are testing are the causes of your changes.

 

A 90% – 95% statistical significance is good and is generally accurate, but there may be a small chance that other factors caused the changes and not the variants you are testing.

 

With anything below 90% statistical significance, we cannot tell with confidence whether the changes came from the variants or from something else. In that case, we recommend that you either run the test longer or make an educated decision based on

  • the data the test has given you
  • other data you have collected
  • your experience
  • and your intuition
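
The three bands above can be written down as a small lookup. This is only a sketch of the guidance in this post; the function name `interpret_significance` and the wording of the returned labels are made up for the example.

```python
def interpret_significance(percent):
    """Map a significance percentage to the guidance bands described above."""
    if percent >= 95:
        return "very accurate: the variant is very likely the cause"
    if percent >= 90:
        return "good: small chance other factors caused the change"
    return "inconclusive: run longer or decide using other data and experience"


for value in (97, 92, 80):
    print(value, "->", interpret_significance(value))
```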

 

You may also want to abort a test before it has reached 90% significance because you want to run a promotion, edit your listing, or because you feel that the test won’t reach significance.

 

If that is the case, you are more than welcome to abort the test at any time, and rely on the statistical significance we report at the point you abort it.

How We Determine Statistical Significance

 

First, we pull all your old sales data from the listing you are testing. We look at your past data to help analyze your results and formulate statistical significance.

 

We take your past sales data and determine its volatility, so that we may factor it into our statistical significance test. When we say volatility, we mean extreme spikes and dips in your sales history.

 

These spikes and dips are usually anomalies that don’t represent your normal sales statistics. This volatility may be caused by some of the real world factors we discussed above.

 

In addition, these spikes may come from promotional giveaways, which we don’t want to include in your sales data because they are not reflective of real sales.
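
One possible way to put numbers on "volatility" and to set aside spike days is sketched below. The post does not say which measures Splitly uses, so the choices here are assumptions: spikes are flagged by their distance from the median (in median absolute deviations), and volatility is the standard deviation divided by the mean. The sales figures are invented.

```python
from statistics import mean, median, stdev

# Hypothetical daily unit sales; index 5 is a giveaway-style spike.
history = [20, 22, 19, 21, 20, 95, 18, 23, 21, 20]


def flag_anomalies(sales, threshold=5.0):
    """Return indexes of days far from the median, measured in MADs."""
    mid = median(sales)
    mad = median(abs(x - mid) for x in sales)
    if mad == 0:
        return []
    return [i for i, x in enumerate(sales) if abs(x - mid) / mad > threshold]


def volatility(sales):
    """Coefficient of variation: standard deviation relative to the mean."""
    return stdev(sales) / mean(sales)


spikes = flag_anomalies(history)
cleaned = [x for i, x in enumerate(history) if i not in spikes]
print(spikes, round(volatility(history), 2), round(volatility(cleaned), 2))
```

Removing the flagged day brings the volatility figure down sharply, which is the effect of not counting a promotional spike as normal sales.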

 

If you don’t have previous sales history because your product is new, we consider your sales history as extremely volatile. This means that in order to reach statistical significance, you will have to run your test longer.

 

Second, we run each variant for at least 7 days. Each variant will run at least once every day of the week so we can account for differences in sales during certain days.

 

We then compare the sales data from your original listing to your variant listings to determine if there is a change in sales data, and why that change occurred.
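
For readers who want to see what such a comparison can look like, here is a sketch using a permutation test, one standard way to ask how often a difference this large would appear by chance alone. It is not a description of Splitly's actual calculation. The function name, the daily profit figures, and the convention of reporting significance as (1 - p) * 100 are assumptions for the example.

```python
import random


def average(values):
    return sum(values) / len(values)


def permutation_significance(original, variant, rounds=10_000, seed=0):
    """Return (1 - p) * 100 for a two-sided permutation test on the means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(average(variant) - average(original))
    pooled = list(original) + list(variant)
    n = len(original)
    extreme = 0
    for _ in range(rounds):
        # Shuffle the day labels and see how big a gap chance alone produces.
        rng.shuffle(pooled)
        diff = abs(average(pooled[n:]) - average(pooled[:n]))
        if diff >= observed:
            extreme += 1
    p_value = extreme / rounds
    return (1 - p_value) * 100


# Hypothetical daily profit for 7 days of each variant.
original_days = [50, 52, 48, 51, 49, 53, 50]
variant_days = [58, 61, 57, 60, 59, 62, 58]
print(permutation_significance(original_days, variant_days))
```

When the two sets of days barely overlap, as here, almost no shuffle reproduces the gap and the reported percentage is high. When the two sets look alike, most shuffles match or beat the gap and the percentage drops.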

Why We Look at the Volatility of Your Previous Sales

 

Remember, we are trying to tell you the real reason behind the changes in your sales statistics. Is it due to the variant you are testing or some other factor?

 

Well, if you have a lot of spikes and dips in your past statistics, how can we be sure that the changes in your sales statistics after you run a test are due to the variants you are testing, and not due to normal spikes and dips as seen in your sales history?

 

Think about it this way: if your previous data is not volatile (that is, smooth and consistent), and then we start a test and all of a sudden see a huge spike or dip in sales, it is more likely that this huge spike or dip was caused by what you are testing.

 

We can be more confident in this because it is an anomaly out of your normal consistent sales, which your previous data demonstrates.

 

But if your past data is volatile (it contains a lot of spikes and dips) and after we run your test you again experience a spike or dip, it will be harder to tell what caused it.

 

It is harder to tell the reason behind these spikes/dips, because these spikes/dips are not out of the norm as demonstrated by your sales history.

 

Was it what you are testing, or just a natural occurrence as seen in your sales history?

 

Therefore, the more volatile your data, the lower your statistical significance will be. The lower your statistical significance, the less sure we can be that the variant you are testing caused the change and not some other factor.

 

Don’t worry though. There is still hope for data that is volatile. We would just have to run the test a little longer.

 

If we run tests longer we can factor out more of the volatility, and give you a more definitive answer whether these variants caused these changes or not.
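
The trade-off between volatility and test length can be seen in a simple normal approximation. This is a textbook shortcut used here for illustration, not Splitly's formula. The function name `approx_significance`, the $5 lift, and the standard deviations are invented, and both variants are assumed to have the same day-to-day spread.

```python
from math import sqrt
from statistics import NormalDist


def approx_significance(lift, daily_std, days_per_variant):
    """Two-sided normal-approximation significance (percent) for a mean difference."""
    # Standard error of the difference between two daily averages.
    standard_error = daily_std * sqrt(2 / days_per_variant)
    z = abs(lift) / standard_error
    return (2 * NormalDist().cdf(z) - 1) * 100


# Same $5 lift in average daily profit, different volatility and test length.
for daily_std, days in [(3, 7), (10, 7), (10, 28)]:
    print(daily_std, days, round(approx_significance(5, daily_std, days), 1))
```

The same lift scores lower when the daily spread is larger, and recovers when the volatile listing is tested for more days.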

 

Why Running Longer Tests Gives You a Higher Statistical Significance

 

The longer you run a test, the smaller the chance that a sudden spike or dip in your results is due to real world randomness.

 

Let’s take a real world example. If you flip a coin 10 times, it is entirely possible to get heads 8 times: the chance of 8 or more heads is about 5.5%. Some of you reading this may say that is impossible, but it’s true.

 

Now, if you flip a coin 1,000 times the results would be more in line with what you expect – 50% heads, 50% tails. Why? Because the more times you do something, the less randomness will affect the results.
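
If you would like to check the coin example yourself, the exact odds can be computed with the binomial formula. The helper name `prob_at_least` is just for this sketch.

```python
from math import comb


def prob_at_least(heads, flips, p=0.5):
    """Probability of getting at least `heads` heads in `flips` tosses."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )


print(prob_at_least(8, 10))      # 80% heads or more in 10 flips
print(prob_at_least(800, 1000))  # 80% heads or more in 1,000 flips
```

The first figure is roughly 1 in 18, while the second is so small that it is effectively zero. The same lopsided split that is unremarkable in a short run becomes practically impossible in a long one.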

 

The same goes for your sales on Amazon. The longer you run a test, the more likely any spikes or dips will be factored out when looking at the aggregate.

 

We also like to run longer tests because we cannot show two variants at the same time: they are in fact the same listing, so only one variant can be live at any moment. It also takes a few minutes to make any of these variant changes in Amazon.

 

In addition, Amazon is notoriously bad at reporting sales data, so we can only pull data once a day.

 

What this means is, the longer you run a test, the higher your statistical significance will be and the more data we can pull from Amazon.

Conclusion

 

It can be hard to determine what caused a change in your sales statistics. As makers of Amazon split testing software, this is of particular concern to us.

 

We want to be confident that the changes in your sales statistics are due to the variants you are testing with us, and not other real world events or randomness.

 

So how can we be confident? We use good ol’ math, and come up with statistical significance that reflects how confident we are that your changes caused the results.

 

By testing each variant on every day of the week, looking at your previous sales data, factoring out anomalies, and mitigating volatile spikes and dips, we are able to give you an accurate reflection of what is actually causing the changes in your data, which is reflected in your statistical significance percent.

andrew browne

Code Wizard at Splitly
Software developer and Amazon seller from Ireland. Constantly searching for travel adventures, greasy burgers, and all things tech.