OK, this is clickbait and yes, I’ve already captured your attention! For those seeking the essence of this article without the frills, here’s the TL;DR:
Define a matrix of shards on your test job, with one entry per job you want to run in parallel:
strategy:
  matrix:
    shard: [1,2,3,4,5,6,7,8,9,10] # as many jobs as you want, but fewer than the number of test files
  fail-fast: false
The SHARD_FILES environment variable will hold the names of the files to run on the current shard:
- name: all test files sorted by size
  run: echo "SHARD_FILES=$(find -type f -name "*.test.js" -exec stat -c "%s %N" {} + | sort -rn | cut -f2- -d ' ' | tr -d \' | sed -n '${{matrix.shard}}~${{strategy.job-total}}p' | tr '\n' ' ')" >> $GITHUB_ENV
With the SHARD_FILES list in your grip, you command your tests to run in a synchronized ballet of sharding (replace my-test-runner with your actual test runner):
- name: run tests
  run: npx my-test-runner ${{env.SHARD_FILES}}
If you want to understand more about the logic, the rest of the article is for you.
Cautionary words abound! The commands above are tailored for Ubuntu and will not work as-is on macOS shells (and possibly Windows), where sed and stat ship in different versions.
Why Sharding?
Assume you have a large number of tests and you want them to run faster; you are probably going to run them in parallel. Instead of running one test file at a time, your machine is going to run multiple test files at the same time. You can usually achieve that by setting a --parallel=4 configuration parameter (or a similar one) to tell your test runner to run 4 test files in parallel. This option is supported by most test runners in the JavaScript arena, and likely in other languages too.
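For illustration, here is how that looks in two popular JavaScript runners (a sketch; the exact flag names vary per runner, so check your runner's documentation):
# Mocha: run test files in parallel across 4 worker processes
npx mocha --parallel --jobs 4
# Jest: limit the worker pool to 4 processes
npx jest --maxWorkers=4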
Running tests in parallel is only useful if you have a strong machine: 8, 12, or even 16 cores. Most test runners default the number of worker processes to the number of cores minus one: one process is reserved for the parent that orchestrates the tests, and all the other cores are dedicated to executing them.
But when you run on CI, your machines are not going to be 12 or 16 cores. The standard Linux virtual machine on GitHub-hosted runners has 2 cores and 7 GB of memory. Probably miles behind your development machine.
On weak machines like CI machines, you should not use parallel execution. Instead, you should use sharding. Sharding splits your tests across multiple machines by assigning each machine a subset of your tests to execute.
With sharding you do not execute tests in parallel on a SINGLE machine. Instead, you run tests serially on MULTIPLE machines.
But how do you run tests on multiple machines? Well, you need to take all your files, split them into groups, each group containing some of the files, and assign each group to a machine. When running in parallel on a single machine, you have an orchestrator that pushes the "next file" to whichever worker becomes available. But when working on multiple machines you cannot do that: you need to assign all the files to the groups upfront.
Quick note: the whole solution assumes your test runner can execute a list of files, which most runners can do. The command looks something like this:
npx my-test-runner file-a.test.js file-b.test.js file-c.test.js
File Selection Using Math
Your first thought is probably: not that complex. I need to know upfront how many machines I allocate (makes sense), and then I split the files into chunks according to the number of machines. Let’s do some math. Assume we have 62 tests and we split them over 10 machines.
Attempt #1: The very simple math is to divide the number of tests by the number of machines: 62 / 10 = 6 (rounded down). OK, 6 files on each machine. But then we are left with 2 files, so we need special logic for the "last" machine, which now gets 8 files and will probably be slower than the others. And what if we have 68 files? Now all of our machines will get 6 files while the last one gets… 14. Not ideal.
Attempt #2: To avoid "leftovers", let's divide by the number of machines but round the value up ("ceil") instead of down ("floor"). Now we assign 7 files to each machine. But now our first 8 machines get 7 files, our 9th machine gets 6, and the 10th machine is left with, well, none. A waste of resources.
You can also try dividing by (machines + 1) or (machines - 1), with floor or with ceiling. None of this math works out too well.
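You can check these attempts quickly in the shell; a minimal sketch of the arithmetic above:
# floor: 6 files per machine, leaving 62 - 6 * 10 = 2 files unassigned
echo $((62 / 10))                # 6
# ceil: 7 files per machine, which fills only 9 of the 10 machines
echo $(( (62 + 10 - 1) / 10 ))   # 7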
File Selection Using Distribution
The intelligent reader (that is YOU!) has realized that the only way to properly distribute the files is like dealing a deck of cards: we "deal" each machine a single file at a time until no more files are left in our deck.
Let’s take an example: we have 62 files and 10 machines. Here is how the tests will be distributed:
Machine 1: 1, 11, 21, 31, 41, 51, 61 (7 files)
Machine 2: 2, 12, 22, 32, 42, 52, 62 (7 files)
Machine 3: 3, 13, 23, 33, 43, 53 (6 files)
Machines 4 to 10: (6 files each)
But how do we write this? Sure, we could have written some nice JavaScript (or Python, or Rust, or Go) code to do it, but when it comes to CI, I try to keep things as minimal as possible for the sake of clarity. I avoid large chunks of code that require too much maintenance, and instead I prefer using Bash commands on CI.
And Bash has very nice tools…
In this case, the glorious Bash command is called sed, short for Stream Editor. When sed gets a list of lines, it can use the -n 'x~yp' address form, which tells it to start with the x-th line and then pick every y-th line (note that first~step addressing is a GNU sed extension). So -n '2~4p' means print the 2nd line, then the 6th, then the 10th, and so on. Exactly what we need to deal our deck of files.
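You can watch it deal in the shell. Using the numbers 1 to 62 to stand in for our files, shard 3 picks exactly the files we assigned to machine 3 in the example above:
# deal 62 "files" across 10 machines and print machine 3's share
seq 1 62 | sed -n '3~10p'
# output: 3 13 23 33 43 53 (each on its own line)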
Our process is as follows:
- Read all the test files in a pre-defined order (e.g. alphabetically)
- On each machine, start at the line matching the current shard number and step forward by the total number of shards to get the list of files for that shard (a Bash sketch follows this list).
- Run the tests in the list of files for the current shard.
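Here is the promised sketch of the whole process in plain Bash. The shard numbers are hard-coded here; on CI they come from the job matrix, and my-test-runner stands in for your real runner:
# pretend we are shard 1 of 10
SHARD=1
TOTAL=10

# read all test files in a stable (alphabetical) order,
# then take every TOTAL-th file starting from the SHARD-th one
FILES=$(find . -type f -name "*.test.js" | sort | sed -n "${SHARD}~${TOTAL}p")

# run the tests assigned to this shard
npx my-test-runner $FILES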
The important thing to notice here is that we need to make sure our test files are always read in the same order for each job. Otherwise, we might end up with some tests running twice while others will be skipped. Reading the files in alphabetical order can ensure that, but we can make it run even faster.
Let’s make it faster
When we read the files sorted alphabetically, files are essentially randomly assigned to machines. Although each machine gets approximately the same number of files, we might end up with one machine getting a list of really long tests while another gets the same number of files but of really short tests.
Ideally, we would also like to even out the duration of the test execution. To do that, we need some knowledge of how long each test runs. We could store timing information on the tests, but that would require us to constantly reconcile the stored data with the actual files every time we add, remove, or change test files.
Another option is to use heuristics.
A heuristic is a mental shortcut commonly used to simplify problems and avoid cognitive overload. Heuristics are part of how the human brain evolved and is wired, allowing individuals to quickly reach reasonable conclusions or solutions to complex problems. These solutions may not be optimal ones but are often sufficient given limited timeframes and calculative capacity. (Investopedia)
Heuristics are not perfect but might be sufficient to get our tests relatively balanced, without the effort of storing data and comparing it to changed files. We can use the file size as an indication of how long the test takes to run: the larger the file, the longer it probably needs to execute. (Remember? This is a heuristic, so a small file might take longer than expected, but overall things should balance out nicely.)
How can we split the files by their size? We will read our test files, sort them by size, and distribute them across the different machines from the largest file to the smallest.
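Isolated from the rest of the pipeline, the size-sorted listing looks like this (using GNU stat, as in the rest of this article):
# list test files as "<size in bytes> <quoted name>", largest first
find . -type f -name "*.test.js" -exec stat -c "%s %N" {} + | sort -rn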
Putting it all Together
So now that we know how we want to do it, let's write the code.
First, we need to tell GitHub actions to create multiple jobs that will run in parallel. This is done using the matrix parameter:
strategy:
  matrix:
    shard: [1,2,3,4,5,6,7,8,9,10] # as many jobs as you want, but fewer than the number of test files
  fail-fast: false
In the above, we have 10 jobs running in parallel. Each job gets the parameter matrix.shard, which indicates the current shard.
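If you want to see what each job received, a hypothetical debug step can print it using GitHub's built-in matrix and strategy contexts:
- name: show current shard
  run: echo "running shard ${{ matrix.shard }} of ${{ strategy.job-total }}"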
Next, we write a command that finds all files matching the pattern *.test.js (adjust to your needs) in the specified directory, shown below as <dir> (the find command works on directories recursively, so it will read all the test files under that directory; if you omit <dir>, GNU find searches the current directory). The command retrieves each file's size and name, sorts the files by size in descending order, and then outputs the selected files with their full paths on a single line, using the current shard as the starting line number and the total number of shards ( ${{strategy.job-total}} ) as the step. The names of the files that need to be tested are then stored in the environment variable SHARD_FILES, which we can use later.
- name: create the list of files to test on the shard
  run: echo "SHARD_FILES=$(find -type f -name "*.test.js" -exec stat -c "%s %N" {} + | sort -rn | cut -f2- -d ' ' | tr -d \' | sed -n '${{matrix.shard}}~${{strategy.job-total}}p' | tr '\n' ' ')" >> $GITHUB_ENV
Let's break down the above command step by step (or actually, let's let ChatGPT do it for us):
- find <dir> -type f -name "*.test.js": This starts the search from the specified directory <dir> and looks for all files (-type f) with names ending in .test.js using the -name option. The <dir> should be replaced with the actual directory path you want to search.
- -exec stat -c "%s %N" {} +: For each file found by find, the stat command is executed with the format -c "%s %N". %s represents the file size in bytes, and %N represents the (quoted) file name along with its path.
- | sort -rn: The | (pipe) symbol takes the output of the previous find and stat commands and passes it as input to the sort command. The -rn option sorts the lines in reverse numeric order based on the file sizes, so the largest files come first.
- | cut -f2- -d ' ': The sorted output is then piped to the cut command. The -f2- option indicates that we want to retain all fields starting from the second field (dropping the size), and -d ' ' specifies that the delimiter between fields is a space.
- | tr -d \': The output is piped to the tr command, which is used to delete characters. In this case, it deletes the single quotes (') that stat's %N format puts around each file name.
- | sed -n '${{matrix.shard}}~${{strategy.job-total}}p': The output is piped to the sed command, which performs the line selection. ${{matrix.shard}} and ${{strategy.job-total}} are placeholders that GitHub Actions replaces with the current shard number and the total number of jobs, respectively. The sed command then selects lines starting from the ${{matrix.shard}}-th line, with increments of ${{strategy.job-total}}, dealing the files across the shards.
- | tr '\n' ' ': Finally, the output is piped to the tr command again, which is used to translate or replace characters. In this case, it replaces newline characters (\n) with a space character, converting the list of files into a single line.
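To make the stages concrete, here is what the intermediate output might look like on a hypothetical repository (the file names and sizes are invented for illustration):
# after find + stat + sort -rn:
#   52311 './tests/checkout.test.js'
#   40187 './tests/login.test.js'
#   12034 './tests/utils.test.js'
# after cut + tr -d \':
#   ./tests/checkout.test.js
#   ./tests/login.test.js
#   ./tests/utils.test.js
# after sed -n '1~2p' (shard 1 of 2) and tr '\n' ' ':
#   ./tests/checkout.test.js ./tests/utils.test.js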
As a last step, we use the environment variable as the list of files to pass to our test runner:
- name: run tests
  run: npx my-test-runner ${{env.SHARD_FILES}}
And that’s it — your test runner now runs only on the set of files that were defined for this shard.
It is probably worth mentioning that some test runners like Playwright, Jest, and Vitest have the sharding option built in as part of their command line. For other test runners, you can use this method.
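For example, Playwright's built-in sharding collapses the whole dance above into a single flag, and Jest and Vitest accept a similar --shard=n/m option:
- name: run tests with built-in sharding
  run: npx playwright test --shard=${{ matrix.shard }}/${{ strategy.job-total }}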
How Many Shards Should I Use?
It depends.
GitHub Actions pricing is per execution minute and is not affected by the number of jobs you run. If the total time of your tests is 200 minutes, sharding will not reduce that; you will still pay for these 200 minutes.
However, you will pay for the extra overhead per machine. If each machine needs a few minutes to set up the test environment (things like code checkout, package installation, spinning up Docker, etc.), that cost is multiplied by the number of jobs.
Let's assume your setup time is 3 minutes per job, and your tests live in 40 files that add up to a total of 200 minutes of execution. Here are some indicative numbers:
// using 10 jobs:
total execution cost: 200 + 3 * 10 = 230 minutes
total elapsed time:   200 / 10 + 3 = 23 minutes

// using 20 jobs:
total execution cost: 200 + 3 * 20 = 260 minutes
total elapsed time:   200 / 20 + 3 = 13 minutes
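In general, with a total test time of T minutes, a setup time of s minutes per job, and n jobs, the total cost is T + s * n (it grows with n), while the elapsed time is roughly T / n + s (it shrinks with n). You are trading money for wall-clock time.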
It is up to you to find the optimal sharding in terms of cost and elapsed execution time.
Conclusion
When I was looking to implement sharding for our e2e tests, which use Cucumberjs, I could not find any guidelines or best practices. I went through some painful attempts to come up with a good method. If I saved you some of the trouble, I am rewarded.
Thus concludes our poetic tale of sharding tests with wisdom and wit. For those who found this elucidation helpful, let the claps and shares resound like a standing ovation in the theater of the internet!