Measure twice and release once. A/B tests of static sites

Alexander Savelyev
10 min read · Jun 4, 2024


A release starts with an idea: the perfect idea that comes up in a brainstorming session, the one that will appeal to all users and attract new customers. The idea is presented to a team of managers and marketers and wins unconditional support from everyone.

A technical specification is drawn up and handed to the developers. They grumble, question the need for the update, quote clearly inflated deadlines, but eventually do the work. The feature is tested and shipped to end users. At this point, the life cycle of the idea is complete. Now all that remains is to wait for a stream of fresh analytics and celebrate…

Some losses are acceptable in the first week: there is little data, users are still getting used to the update, other improvements are rolling out at the same time, and outliers carry a lot of weight. By the second week, however, it becomes evident that the idea not only failed to attract new clients but also pushed some existing users to use the product less.

The idea, which has gone through dozens of discussions and received hundreds of enthusiastic comments, has failed.

The Hypothesis Failed

The introduction turned out to be long, but I wanted to open this article with the full journey of a hypothesis. It broke at the very beginning: it was supported only by people similar to its author, and such people are rarely representative of the target audience, if they belong to it at all.

That is why, when changing existing functionality, teams should not rely on the opinions of the author and their colleagues. To make the right choice, they conduct research, analyze product and market analytics, and compare with competitors. Yet behind all these methods often hides the only truly reliable way to test a hypothesis on the business's audience. That way (attention!) is to test it directly on the business's audience.

Though not on all of it at once. This method is called A/B testing, and the rest of this article is devoted to it.

A/B Testing

As we found out above, an A/B test checks a hypothesis on the business's own audience. It does so by comparing one version of the functionality (option A) with another (option B).

Sometimes A/B tests are run sequentially: option A is measured first, and option B the following week. That setup is not covered here, because it is not interesting in technical terms (and this article says nothing about data collection and analysis).

A/B testing can be used to check changes, in which case option A remains the current functionality, or to compare several implementations of a new idea, in which case both options contain new functionality and are measured against each other.

Despite the name, there can be any number of options. The main constraint is the size of the audience: it must be possible to collect enough data for each option, after excluding outliers and noise.

So, suppose a decision is made to make a significant change to a website or application. The seriousness of the change is assessed, and the team decides to roll it out through an A/B test. Depending on the risks, they also decide how to split the traffic.

Often the test starts by showing the new option to only 10% of users. If the change does not cause a sharp drop in metrics for those 10%, it is extended to half of the users so that the comparison is meaningful. Based on the results, a decision is made: keep the new option or return the previous one.

Of course, based on the test results, the idea can also be sent back for revision and the updated test launched again. This can be repeated dozens of times until the change produces growth in business metrics.

A/B Test Rules

  • Variants should contain only the changes being tested. This is the basic rule, yet it is easy to break: along with adding a new block to the page, you may be tempted to change its colors too. As a result, both changes affect the results, and it becomes impossible to tell what effect the new block had on its own.
  • A rule that follows from the first: all test options should reach the user with the same speed, the same delays, and, of course, the same bugs.
  • Another derived rule: avoid overlapping tests. If two or more tests run on the same page under the same conditions, they distort each other's results, and it is almost impossible to isolate those distortions.
  • The user should not realize they are participating in a test. A user who knows may behave differently, for example, leave the service or reload the page to escape the test.
  • In interface tests, the user should spin the roulette wheel only once. On every later visit, they should see the option that originally fell out to them. This is primarily a matter of user experience: if the content changes on every visit, the service feels broken (a bucketing sketch follows the image below).
Testing with a “small difference”. Source: Pinterest
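
A quick illustration of that last rule. Besides the cookie approach shown later in this article, a common way to make the roll stable is deterministic bucketing: hash a persistent user identifier, so the same user always lands on the same variant. A minimal sketch in TypeScript; the identifier source, the hash choice, and the test name are assumptions for illustration:

// Deterministic bucketing: the same userId always maps to the same variant,
// so the "roulette wheel" is effectively spun only once per user.
function hashString(input: string): number {
  // FNV-1a, a simple non-cryptographic hash; good enough for bucketing
  let hash = 2166136261;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Mixing in the test name gives an independent assignment per test
function getVariant(userId: string, testName: string, variantsCount: number): number {
  return hashString(`${testName}:${userId}`) % variantsCount;
}

// getVariant('user-42', 'home-animation', 2) is stable across requests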

A/B Test Scheme

Now, having sorted out what is tested, why, and how, we can finally move on to the most interesting part: the technical one.

It is worth starting with the basic scheme of how the application works:

Client — server — client

A very simple communication scheme: the client sends a request to the required address, the server processes the request and responds to the client.

With the advent of A/B tests, this scheme starts working a little differently. Now identical requests, made at the same time and under the same conditions, are expected to produce different answers: that very option A or option B.

In practice, this is usually handled by an intermediate layer: at the CDN level, in a regular middleware on the server, or in other tools in between, such as nginx (there is a module for running A/B tests in nginx). For simplicity, the rest of the story uses plain middleware.

In fact, A/B tests can also run entirely on the client side. This is how Google Optimize worked (it was shut down in September 2023). The main problem with that approach was that a user who got option B was redirected to another page. That, in turn, made option B less comfortable to use and gave the testing away.

This approach can be schematically described like this:

Implementation of A/B tests

Below is a solution based on next.js, but it can be reproduced on any other technology that can set cookies and perform rewrites (or return a specific page).

In next.js, this is done by middleware, which runs in the so-called edge runtime, that is, at the CDN level. In fact, outside of Vercel (the deployment platform behind next.js), it is simply a part of the server that runs before route handling.

The first and simplest approach is to show one of the options without any extra conditions:

import { NextResponse, NextRequest } from 'next/server'

export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === '/home') {
    // rollVariant is defined a little further below
    if (rollVariant() === 1) {
      return NextResponse.rewrite(new URL('/home-animated', request.url));
    } else {
      return NextResponse.rewrite(new URL('/home', request.url));
    }
  }
}

The user opens the /home page, and the middleware picks a random option. If the user gets option B, the home-animated page is returned; otherwise, the standard home.
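
As a side note, next.js middleware can also be scoped to specific routes with a matcher, so it does not run on every request. This is optional and complements the pathname check above:

// middleware.ts: limit the middleware to the tested route only
export const config = {
  matcher: '/home',
};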

For interface tests, it is convenient to make each option a separate page: a new option means a new page.

root
--app
----about
------page.tsx
----home
------page.tsx
----home-animated
------page.tsx

How do you choose which option to show to the user? Just roll the dice! If the roll lands below one half, show one variant; otherwise, the other.

const rollVariant = () => Math.random() < 0.5 ? 1 : 0;

Now, depending on the rolled value, the server returns either the standard page or home-animated, in the same time and invisibly to the user.
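
The coin flip also generalizes to any number of variants and to uneven traffic shares, such as the 10% rollout mentioned earlier. A small sketch, assuming the weights sum to one:

// Pick a variant index according to its weight; the weights are expected to sum to 1
function rollWeightedVariant(weights: number[]): number {
  const roll = Math.random();
  let cumulative = 0;
  for (let i = 0; i < weights.length; i++) {
    cumulative += weights[i];
    if (roll < cumulative) return i;
  }
  return weights.length - 1; // guard against floating-point drift
}

// a 10% rollout: index 1 (the new variant) comes up roughly 10% of the time
const variant = rollWeightedVariant([0.9, 0.1]);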

However, on each visit the user would get a random option. To prevent this, you can record in a database that the client has become a participant in the A/B test. For anonymous tests, the test information can be saved in cookies and read from them on subsequent visits.

So, if the client already has the cookie, the request check and option selection can be skipped, and the required page returned immediately.

import { NextResponse, NextRequest } from 'next/server'

export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === '/home') {
    // the cookie stores the variant as a string, so it must be parsed back into a number
    const prevVariant = request.cookies.get('ab_variant')?.value;
    const variant = prevVariant !== undefined ? Number(prevVariant) : rollVariant();
    let next: NextResponse;
    if (variant === 1) {
      next = NextResponse.rewrite(new URL('/home-animated', request.url));
    } else {
      next = NextResponse.rewrite(new URL('/home', request.url));
    }
    next.cookies.set('ab_variant', variant.toString());
    return next;
  }
}

Of course, this data needs to be analyzed. There are two options: send the data from the server, in parallel with returning the result to the user, or send it from the client, after passing the test assignment down from the server. For the latter, the previously created cookie can be used.
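
For the client-side path, reading the assignment back from the cookie is enough. A minimal sketch; sendToAnalytics is a hypothetical helper standing in for whatever tracker the project uses:

// Client-side: read the assigned variant back from the cookie set by the middleware
function getAbVariant(): string | undefined {
  return document.cookie
    .split('; ')
    .find((part) => part.startsWith('ab_variant='))
    ?.split('=')[1];
}

// sendToAnalytics is a hypothetical reporting helper; replace it with the real tracker call
declare function sendToAnalytics(payload: { event: string; variant: string }): void;

const abVariant = getAbVariant();
if (abVariant !== undefined) {
  sendToAnalytics({ event: 'ab_exposure', variant: abVariant });
}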

Next, it may be necessary to run A/B tests only for a specific group: a certain share of users, users of specific browsers, users from specific companies, or anything else.

That is, the user has to be checked against a condition and, depending on the result, either included in the test or not:

import { NextResponse, NextRequest } from 'next/server'

export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === '/home' && request.nextUrl.searchParams.has('utm_campaign')) {
    // ...
  }
}

It may also be necessary to include only new users. Formally, this is the same task as above: new users are simply the group that has not been on the site before. For anonymous users, this can be determined, for example, by the absence of test cookies, accepted-policy cookies, or analytics cookies.
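
In code, such a check can be as simple as looking for previously set cookies. A tiny sketch; both cookie names are examples, not fixed conventions:

import { NextRequest } from 'next/server';

// A visitor with no test cookie and no analytics cookie is treated as new;
// both cookie names here are illustrative
function isNewUser(request: NextRequest): boolean {
  return !request.cookies.has('ab_variant') && !request.cookies.has('_ga');
}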

Of course, one test will not be enough, and dozens, if not hundreds, of tests will need to run in parallel. The same logic applies, but now the request is checked against an array of definitions of the launched tests until the first match, as sketched below.
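
Here is a minimal sketch of that loop; the TestConfig shape is an illustration of the idea, not the actual format used by any package:

import { NextResponse, NextRequest } from 'next/server';

// Minimal illustrative shape of a registered test
interface TestConfig {
  id: string;
  source: string; // pathname the test applies to
  variants: string[]; // destination pages
}

const tests: TestConfig[] = [
  { id: 'home-animation', source: '/home', variants: ['/home', '/home-animated'] },
  { id: 'about-rewrite', source: '/about', variants: ['/about', '/about-new'] },
];

export function middleware(request: NextRequest) {
  // Walk the registry until the first test whose source matches the request
  const test = tests.find((t) => t.source === request.nextUrl.pathname);
  if (!test) return;

  // One cookie per test keeps the assignments independent
  const cookieName = `ab_${test.id}`;
  const prev = request.cookies.get(cookieName)?.value;
  const variant = prev !== undefined ? Number(prev) : Math.floor(Math.random() * test.variants.length);

  const response = NextResponse.rewrite(new URL(test.variants[variant], request.url));
  response.cookies.set(cookieName, variant.toString());
  return response;
}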

Naturally, every company has its own conditions, requirements, and processes. The basic example above is one possible implementation, from which everyone can decide what exactly they need and how to build it.

Nevertheless, I decided to try to implement a universal package for conducting A/B tests in next.js: @nimpl/ab-tests.

@nimpl/ab-tests

The first thing to note is that the package satisfies everything described above, including all the rules. On top of that, it offers a number of pleasant extras, exposed through an API familiar to next.js developers.

The operation of the package can be described as follows:

The main advantage of the package is how it finds a suitable test. Each test may include the keys has and missing. Those familiar with next.js will recognize these keys from rewrites and redirects. For example, a test can be described as follows:

{
  id: 'some-id',
  source: '/en-(?<country>de|fr|it)',
  has: [
    {
      type: 'query',
      key: 'ref',
      value: 'utm_(?<ref>moogle|daybook)',
    }
  ],
  variants: [
    {
      weight: 0.5,
      destination: '/en-:country/:ref'
    },
    {
      weight: 0.5,
      destination: '/en-:country/:ref/new'
    }
  ],
}

This test runs for every user who lands on a page with one of the listed English locales and a ref parameter matching utm_*. Such a user then sees either the base page for that campaign or the new one.

Each test also contains other keys, such as:

  • id - the identifier of the test, which will be written in the cookie;
  • source - another key familiar from next.js: the path on which the test runs;
  • variants - the list of options, of which there can be any number.

Each variant describes a weight and a destination (again, a key familiar from next.js). The main rule is that the weights must add up to one.
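
For example, a three-way split under that rule might look like this (0.5 + 0.25 + 0.25 = 1); the id and destinations are made up for illustration:

{
  id: 'three-way-test',
  source: '/en-(?<country>de|fr|it)',
  variants: [
    { weight: 0.5, destination: '/en-:country' },
    { weight: 0.25, destination: '/en-:country/new-a' },
    { weight: 0.25, destination: '/en-:country/new-b' }
  ],
}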

Additional part

Development did not end there. While working on the package, I decided to test it in several projects. However, adding one simple middleware turned out to be a real adventure: the projects already had middleware of their own, one using next-intl, the other next-auth.

Surprisingly, none of these projects had ever needed to combine two external middlewares (only an external one together with internal logic). A search turned up no ready-made solutions: everything that exists works through its own API, styled after express.js or following its authors' own vision. Those tools are useful, well implemented, and convenient, but only when you can adapt every middleware you use to them.

Here the situation is different: each middleware needs to keep working as an ordinary next.js middleware. In short, yet another new solution was needed, and I took it upon myself.

So @nimpl/middleware-chain appeared:

import { default as authMiddleware } from "next-auth/middleware";
import createMiddleware from "next-intl/middleware";
import { chain } from "@nimpl/middleware-chain";

const intlMiddleware = createMiddleware({
  locales: ["en", "dk"],
  defaultLocale: "en",
});

export default chain([
  intlMiddleware,
  authMiddleware,
]);

A small and tidy addition.

You can check out these and other next.js packages at nimpl.tech. Perhaps you will find some of the solutions useful (such as the getPathname getter for server components or the class minifier). I have also made publicly available a utility I use for editing groups of JSON files: @nimpl/inio.
