🔥Let’s Do DevOps: Shard GitHub Actions Workload over Many Concurrent Builders
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
I’ve recently been building a tool I’m calling the GitHub Cop. It’s a tool that processes all 1.5k+ repos in my GitHub Org and sets all the settings, branch protections, auto-link rules, permissions, etc. That’s a lot of changes each night, and they run against every repo sequentially.
To do that, I’ve built several tools, including an API token circuit breaker to avoid running out of API tokens when doing tens of thousands of actions in a short time, and a way to paginate API calls when processing thousands of attributes. But none of those tools help speed up the single-threaded processing of thousands of repos.
What would help is having several threads working concurrently. Thankfully, GitHub Actions makes this pretty easy to do! I did have to teach bash some arithmetic, but I was able to make it work.
Let’s talk about theory, then how I implemented this change in my custom bash program, and the limitations I faced and how I solved them.
GitHub Actions Theory: Matrix
Sharding (a word not recognized by my computer’s dictionary) is the practice of splitting a single workload into several parts, and then assigning each of those work “shards” or fragments to one of several builders. It’s usually assumed that the shards would then be processed concurrently. It’s a method of reducing the total time a workload runs by splitting it across several runners.
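To make the idea concrete, here is a minimal bash sketch of one way to shard a list. It is not the GitHub Cop's actual code: the function name shard_items, the repo names, and the round-robin rule (an item belongs to a shard when its position modulo the shard count equals the shard's index) are all assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the items that belong to one shard.
# Usage: shard_items <shard_index> <shard_count> <item>...
# Assumption: round-robin assignment, so item number i goes to
# shard (i % shard_count). Shard indexes start at 0.
shard_items() {
  local shard_index=$1
  local shard_count=$2
  shift 2

  local i=0
  local item
  for item in "$@"; do
    # Bash integer arithmetic: keep the item if it lands on this shard.
    if (( i % shard_count == shard_index )); then
      printf '%s\n' "$item"
    fi
    i=$(( i + 1 ))
  done
}

# Illustrative data only: five made-up repo names split over two shards.
repos=(repo-a repo-b repo-c repo-d repo-e)
for shard in 0 1; do
  echo "shard ${shard}: $(shard_items "$shard" 2 "${repos[@]}" | tr '\n' ' ')"
done
```

With two shards, shard 0 receives repo-a, repo-c, and repo-e, while shard 1 receives repo-b and repo-d. Each builder would call the function with its own shard index and process only the repos it gets back, so no repo is handled twice and none is skipped.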
Because this is a foundational topic of computing, GitHub Actions makes this relatively easy using the matrix strategy. In math, a “matrix” means a rectangular array or table where numbers are stored and often combined in a predictable way.
In a GitHub Actions context, a matrix is a set of values that are combined and handed off to the tasks you list on the job. If you have a single matrix value like attribute: [foo, bar], the job will be run twice, once where attribute is set to foo and once where attribute is set to