Mono vs Multi – Which Repo Structure Is Right For You?

Recently, while I was working on a large, geographically dispersed, multi-team project, the question came up: “Should we use a single (aka “mono”) repository for our Infrastructure-as-Code (IaC), or multiple (aka “multi”) repositories?”

So, what’s the right answer?

The Hashi-Terraform Suggestion

Since this project was using HashiCorp’s Terraform (along with the Terraform Enterprise product), my first thought was to find out what HashiCorp recommends.

HashiCorp has an article in its Terraform Enterprise documentation that discusses recommended repository (aka “repo”) structures. You can read that article here.

Terraform Enterprise Integrations

I’m not going to copy/paste that whole article here, but here are a few highlights worth noting:

  • As a best practice for repository structure, each repository containing Terraform code should be a manageable chunk of infrastructure, such as an application, service, or specific type of infrastructure (like common networking infrastructure).
  • Small configurations connected by remote state are more efficient for collaboration than monolithic repos, because they let you update infrastructure without running unnecessary plans in unrelated workspaces.
  • Using a single repo attached to multiple workspaces is the simplest best-practice approach, as it enables the creation of a pipeline to promote changes through environments, without additional overhead in version control.
  • If you’re using Branches and creating a branch for each environment… The upside of this approach is that it requires fewer files and runs fewer plans, but the potential downside is that the branches can drift out of sync. Thus, in this model, it’s very important to enforce consistent branch merges for promoting changes.
  • If you are using Directories and creating a separate directory for each environment… The potential downside to this approach is that changes have to be manually promoted between stages, and the directory contents can drift out of sync. This model also results in more plans than the long-lived branch model, since every workspace plans on each PR or change to master.
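To make the “small configurations connected by remote state” point concrete, here is a minimal sketch of how an application configuration might read outputs from a separately managed networking workspace. The organization name, workspace name, and the `subnet_id` output are all hypothetical placeholders, and the sketch assumes Terraform 0.12 or later with the Terraform Enterprise `remote` backend; it illustrates the pattern rather than the article’s exact recommendation.

```hcl
# Application configuration (its own repo or workspace).
# Assumption: a separate networking workspace named "network-prod" exists in
# the hypothetical organization "example-org" and declares an output such as:
#   output "subnet_id" { value = azurerm_subnet.app.id }

data "terraform_remote_state" "network" {
  backend = "remote"

  config = {
    organization = "example-org" # hypothetical organization name
    workspaces = {
      name = "network-prod" # hypothetical workspace name
    }
  }
}

locals {
  # Read-only reference to the networking workspace's published output.
  app_subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}
```

With this arrangement, a change to the application configuration plans only the application workspace; the networking workspace is touched only when its own code changes.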

So my summary of the Terraform guidance is: DON’T DO MONO-REPO!

The Rest of the World’s Thoughts

Now, of course, the Terraform approach is just one school of thought. Many other people have tried different approaches and different methods.

I reached out to some friends of mine who are developers to get their thoughts on this debate. Here are some of the points they shared (either for or against mono-repo):

  • When modernizing an application from a monolithic architecture to microservices, common concerns like upgrades and maintenance change. If you break the system down by domain and build it in smaller, interchangeable parts, those components are easier to replace. This applies not only to application code, but also to how code is grouped and segmented in Infrastructure-as-Code (IaC) and code repositories.
  • We’ve found difficulties with multi repos, especially ones that share dependencies between themselves. Having to bump repo versions and maintain dependencies across repos can be a pain.
  • We found multi-repos to take much more time with regards to dev startup. And then when you have shared dependencies, there isn’t a clear way to keep them synched and accessible to all repos.
  • There are trade-offs – one thing is I typically don’t allow my teams to use shared dependencies. It means your domain bounds are incorrect if you need to share dependencies (in most cases).
  • We currently are using a monorepo…. I’m torn regarding repo-per-service or monorepo – there is deployment complexity with a monorepo as well as devs needing a local copy of a lot of code they’ll never touch. Multi-repos tend to force decoupling better, but also require devs to know what lives where … no perfect solution. With smaller projects, a monorepo is likely sufficient, or maybe one for backend and one for frontend.
  • Another big aspect to work out is CI/CD. We use Jenkins and it’s not been without issue, especially when it comes to blue/green deployments. Start small and simple, optimise when needed, start monolithic and break out into microservices/split repos when needed.
  • In addition to this, there are questions we need to answer such as how to manage multiple developers all contributing to the same code base. Our code base has roughly 50 people on it. To maintain consistency and keep the quality bar high we need a branching/merging/PR review strategy that’s not too strict but allows features to be developed without major merge conflicts.
  • On the development side, we kind of moved away from one-service-per-repo to a single mono-repo per distinct application to ease dev environment setup pain. I know we’re not alone (Why Google Stores Billions of Lines of Code in a Single Repository).
  • And another interesting article: You too can love the MonoRepo
    • The monorepo changes the way you interact with other teams such that everything is always integrated. And hey, our industry has a name for that: continuous integration. If you don’t have a monorepo, you’re not really doing continuous integration, you’re doing frequent integration at best. In a monorepo, even pre-commit testing is already integrated.
    • Adopting a monorepo means every change can be integrated. If you believe in Continuous Integration (a requirement for Continuous Delivery) then you should seriously think about using a monorepo, and a build system that helps you keep up with the greater amount of testing.
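In Terraform terms, the “bumping repo versions” pain described above often shows up as pinned module sources. The following is a minimal sketch only: the repository URL, the `network` subdirectory, the `v1.2.0` tag, and the `environment` input are hypothetical.

```hcl
# In an application repo: consume a shared module that lives in a separate
# (hypothetical) modules repo, pinned to a specific Git tag.
module "network" {
  source = "git::https://example.com/example-org/terraform-modules.git//network?ref=v1.2.0"

  # Hypothetical input variable defined by the shared module.
  environment = "prod"
}
```

Every repo that consumes the module carries its own `ref`, so each release of the shared module means a version bump in each consumer. In a mono-repo, the same module could be referenced by a relative path (for example `source = "../modules/network"`), which removes the bump but couples every consumer to the latest code.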


So what’s the right answer?

Well, there isn’t one definitive right answer. The answer is whatever is right for you and your organization.

I think you need to look at what you’re ultimately trying to achieve. That should also include what teams are involved, the separation of work/responsibility areas, etc. But then, we have the ever-growing push/chant of “DevOps” and the unification of Developers with Operations.

Really, what is the root cause/reason for multi-repo? Is it to separate the networking team’s code from the security team’s, etc.? That can be done through workspaces, directory structure, etc.

As I mentioned in the My Terraform Resources List (for those that insist on being “cloud agnostic”) article I recently wrote, I am working on a long-term global scale project for defining an entire enterprise-grade Azure environment, completely through Infrastructure-as-Code (IaC) using Terraform, and Configuration-as-Code (CaC) using Chef.

In a large-scale approach like that, which option is the best choice? From my current (inexperienced) perspective, I’m thinking a mono-repo is the right approach, at least for the foundation/fabric components (i.e. network, ExpressRoute circuits, NSGs/ASGs/Rules, Domain Controllers, Policies, etc.), whereas having separate repositories for application-specific components may work best.

But, we’ll see. And as I progress along on this project, we may have to re-visit this topic entirely! Stay tuned.
