Monorepository vs. Multirepository
I'm old enough to remember when monorepositories were "in" the first time around. As code became more complicated, best practices dictated the uncoupling of functionality, which led to microservices, tiny packages, and complexities in builds, releases, and package management. The monorepository wasn't necessarily the problem, but monorepositories two decades ago meant that code was coupled together, especially for object-oriented applications with interlinked packages and modules.
Slowly but surely, programmers came around to the uncoupling of code as people began to adhere to principles such as DRY and SOLID, and as agile methodologies and Lean DevOps won over the industry. The problem? Uncoupled code in separate repositories reveals its own problems.
For example: Imagine that you're working on a handful of web applications that share a common set of company-specific functionality. This could be security, utility libraries, templates, or any number of helper functions. Upon building your second application that uses similar functionality, you realize that you should abstract the common code into its own library, so you do. Other programmers on your team need access to this code, so you're a good soldier: you package it up and publish it to an internal or external package management tool (such as NPM for JavaScript packages or an internal NPM registry on Azure DevOps). Now both of the applications you're working on can use the common code without living in a large monorepository. Since each application is tied to a specific version of the package, changes can be made to the package without breaking the applications that depend on it.
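As a sketch, each application's package.json might pin the shared library to an exact version (the scope and package name here are hypothetical):

```json
{
  "name": "app-one",
  "version": "2.3.0",
  "dependencies": {
    "@acme/common-utils": "1.2.3"
  }
}
```

Because app-one pins version 1.2.3, publishing 1.2.4 to the registry changes nothing for it until someone updates this file, which is both the feature and, as you're about to see, the problem.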
Next you add a third application that uses the package. Then you add a fourth and fifth. Now you discover a bug in the package. You update the package to fix the bug, and you're able to update the fifth application you were working on so that it uses the version with the bug fix, but what about applications 1-4? You'll have to go back, check out each of those applications' repositories, do an npm install (or whatever) of the new package version in each of them, check them all back in, and potentially redeploy each one of them.
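Spelled out, that per-repository chore looks something like this (the repository URL and package name are hypothetical):

```
# Repeat for each of applications 1-4
git clone https://example.com/acme/app-one.git
cd app-one
npm install @acme/common-utils@1.2.4   # bump to the patched version
npm test                               # verify nothing else broke
git commit -am "Update @acme/common-utils to 1.2.4"
git push                               # ...then redeploy through the app's pipeline
```

Four repositories means running through this four times, and that's for a single bug fix.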
The caveat of a multirepository structure is that the more applications depend on the packages you've built, the more work every critical package change becomes, because each application needs to be updated independently; otherwise, you'll have to build tooling that updates or informs those applications whenever a shared package changes (not uncommon with security fixes in major tools).
You likely wrestle with updates to third-party packages often enough that you don't want to wrestle with internal packages as well. Programmers will lose time to package updates and management, repository switches, and multiple application changes (not to mention the context switching) when working within a multirepository structure that spans more than 8-12 applications and packages.
Visual Studio and .NET handle multirepositories well: a solution file identifies the projects in the solution, and Visual Studio has a great user experience for managing the individual repositories tied to those projects.
But can we really go back to monorepositories?
Remember that monorepositories 20 years ago meant solutions like CVS, SVN, or Visual SourceSafe (again, I'm old). Those source control management solutions weren't necessarily as flexible as Git is today. Source control management, along with automated testing and deployment management, has evolved considerably since then, which warrants a second look at monorepositories.
For a successful monorepository structure, here's what we need:
- All the necessary code in one place
- Ability to easily access the code you need without the other code getting in the way
- Strict adherence to DRY and SOLID principles (or just uncouple your code)
- Changing a package should make that code available in your applications immediately, without a separate installation step (see the workspaces sketch after this list)
- If updating the package breaks other applications, we need to know immediately
- Deploying one application should not require deploying all applications (the same goes for packages)
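On the immediate-availability point, here's a minimal sketch of how npm workspaces (npm 7+; yarn and pnpm offer equivalents) can wire local packages straight into local applications. The repository layout and names are hypothetical:

```json
{
  "name": "acme-monorepo",
  "private": true,
  "workspaces": [
    "packages/*",
    "apps/*"
  ]
}
```

With this root package.json, a single npm install symlinks packages/common-utils into the node_modules of every app that depends on it, so an edit to the package is reflected in the applications immediately: no publish, no reinstall.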
We'll address each of these in future posts, but first, let's address some of the philosophical and real-world issues that make many people resistant to monorepos.
Code can get coupled easily when working in a monorepository. It's very easy to write code that just works, and if an entire application (or multiple applications) lives in the same repository, it's easier to just tack code on. This can be resolved by failing builds based on code coverage reports. Automated unit testing isn't enough; you have to build code coverage into your build scripts, because it's very hard to reach 80-85% code coverage with coupled, spaghetti code, even when using mocks. Of course, this doesn't prevent repeating yourself, but if you're writing the same code, with the same unit tests, in the same repository, you're going to hate it. The monorepository actually encourages you to abstract that code, since it's easier to edit the shared code in place than to open another instance of Visual Studio Code (or any other editor) to edit the package, test the package, check the package in, and then install the new package version in your own application's repository.
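As an example of that kind of gate, Jest (a common choice in the JavaScript stack mentioned earlier) can fail the test run, and therefore the build, when coverage drops below a threshold. The numbers here are illustrative:

```js
// jest.config.js
module.exports = {
  collectCoverage: true,
  coverageThreshold: {
    // Jest exits with a non-zero code if any of these aren't met,
    // which fails the pipeline step that runs the tests.
    global: {
      branches: 80,
      functions: 80,
      lines: 85,
      statements: 85,
    },
  },
};
```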
Deployments are harder with monorepositories. This is a fact. When you're building out pipelines for individual packages and applications within a monorepository, those pipelines don't see how your packages and applications are interrelated in your development environment, so something that runs fine in your local development environment could fail spectacularly in your build pipeline. Mitigating this requires tooling, and great tooling was one of the things missing when monorepositories were big in the '90s and '00s. With appropriate tooling, you can create and run the necessary steps to single out what you need in your monorepository for an individual build, including appropriately versioning packages and pushing them into package management repositories.
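One piece of that tooling, if you're on Azure DevOps as mentioned earlier, is a per-application pipeline with a path filter, so a commit only triggers the builds it actually affects. This is a sketch with a hypothetical folder layout:

```yaml
# azure-pipelines.yml for one application in the monorepository
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - apps/exam-app          # the application itself
      - packages/common-utils  # a shared package it depends on

steps:
  - script: npm ci && npm test
    workingDirectory: apps/exam-app
    displayName: Build and test exam-app
```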
Tooling is hard, or at least it can be, and this is what prevents many people from getting up and running. But just like a proper DevOps pipeline, good tooling requires an investment by you and your company in bootstrapping automation early, in order to save time through better processes and automated checks later down the road. Sometimes this requires learning a new technology, and sometimes it requires building your own. Tooling can be the biggest obstacle to getting monorepositories right; in fact, even with third-party tooling, you'll likely have to build your own integrations to get it working the way you need it to.
Monorepositories in our current fast-paced, release-often industry require a large investment in tooling, whereas multirepository structures are cheap and fast to get moving with. With monorepositories, you'll need to evaluate and invest in the right source control, build utilities, refactoring practices, and company culture.
One thing to keep in mind: some could argue that monorepositories violate the single responsibility principle (part of the aforementioned SOLID we're trying to adhere to). But I believe that principle applies more to software functionality than to tools and process architecture. With appropriate tooling, you can actually speed up development, save time, and be more productive with a monorepository than with a multirepository. Tools like IDEs and source control should be mostly transient. After all, we don't make computers themselves adhere to a single responsibility.
Let's also not forget that SOLID was meant as a collection of object-oriented programming principles, so it might not even apply to all programming paradigms.
Now, "monorepository or multirepository?" is a bit of a loaded question, and the answer is always going to be "it depends." But when I talk about a monorepository, I'm not talking about one for your entire organization's code; I'm talking about one for a group of interrelated projects and technologies that work well uncoupled and can be deployed separately, but generally fit under a two-pizza rule.
One of the projects I'm currently working on deals with medical education technology: exam software, assessments, learning management content, and various other components that adhere to a common theme and are built on roughly the same technology. We're talking about several applications, several packages, and a handful of tools. To follow the DevOps principle of frequent deployments at smaller batch sizes, we want to deploy the applications and packages individually with a strong pipeline, but we don't want to make the developers' lives harder through constant repository and project switching. This is why I emphasized "necessary" when I said: all the necessary code in one place. Software development architecture and processes are always about measuring trade-offs.
In the next DevOps post (different from the conversational software posts I've been writing), we'll take a look at an example monorepository, and I'll make the argument for why it makes sense for the business objectives our company is trying to accomplish.