I came across Terraform soon after it was released in 2014 and it quite clearly fit a need in the world of cloud infrastructure engineering.
As I’ve worked with Terraform over the years, I’ve learned a lot, some of it by reading about the experiences of others, but mostly by making mistakes.
Terraform is a powerful tool, which makes it extremely useful, but also somewhat dangerous. You can use it to stand up seriously complex infrastructure, but if you are not careful you can also accidentally destroy all of that at 4:30pm on a Friday (learned that one the hard way).
When I embarked on starting a shared infrastructure team here at the Chan Zuckerberg Initiative, I got the opportunity to reflect on some of those experiences and strategize ways in which we, as a common infrastructure team, can help our fellow engineers at CZI effectively and safely build infrastructure.
We can certainly help by educating and consulting with teams: teaching them about cloud services and tools to manage them. For teams practicing infrastructure as code, we can also give code reviews and feedback on architectural decisions.
But, the most powerful thing we can do is to write tools that make it easy to do things well and hard or impossible to do things poorly.
Today we’re open-sourcing one of those tools: fogg.
Fogg is a tool for managing the organization of Terraform code in a git repository. We have taken everything we have learned about how to program effectively in Terraform and boiled it down into an (opinionated) tool that encodes those best practices via code generation.
First, though, let’s talk about these problems we’re trying to solve…
The first set of problems stems from a feature: Terraform’s power.
The power of Terraform allows you to manage large and complicated infrastructure. You can have a single directory containing thousands of resources, all changed by one run of terraform apply. The force multiplier here is huge: a single operator can orchestrate changes across a large infrastructure footprint.
This, of course, is also the danger. That same power that enables you to do great things can be used against you in an accident. Run the wrong command and your entire infrastructure has been destroyed (if you can’t ctrl-C fast enough).
This can be addressed by splitting up your Terraform code into separate directories, each of which has its own state file. That simple, manual solution works great for a single developer and a few scopes, but relies on significant diligence as you scale up. As teams grow, the process of managing the various Terraform scopes needs to be automated, and we do that with fogg in a way that minimizes the pain for infrastructure engineers.
Terraform relies on state files to track the relationship between the identifiers in your code and the resources actually deployed. If you lose or corrupt the state file, this mapping is lost.
So it goes without saying that managing this file is quite important. It needs to be stored remotely, should probably be encrypted and needs to be updated atomically.
With fogg, we automate this configuration, so none of our users have to worry about mis-configuring a state file.
On top of that, we name our state files deterministically, so they are cleanly organized in S3.
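To make those requirements concrete, here is a minimal sketch of a Terraform S3 backend block that stores state remotely, encrypts it, and locks it during updates. The bucket name, key layout, region, and lock table name are all hypothetical placeholders chosen for illustration; this is not fogg’s actual generated output.

```hcl
# Illustrative only: every name and path here is a made-up placeholder.
terraform {
  backend "s3" {
    # Remote storage: state lives in S3 rather than on a laptop.
    bucket = "example-terraform-state"

    # A deterministic key derived from project / environment / component,
    # so every scope's state file lands in a predictable place.
    key = "terraform/example-project/envs/staging/components/vpc.tfstate"

    region = "us-west-2"

    # Encrypt the state object at rest.
    encrypt = true

    # State locking via a DynamoDB table, so two concurrent runs
    # cannot write the same state file at once.
    dynamodb_table = "example-terraform-locks"
  }
}
```

Writing a block like this by hand in every directory is exactly the kind of repetitive, easy-to-mistype configuration that benefits from being generated.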
On top of the bigger issues listed above, maintaining a consistent style for Terraform is quite tedious. This is probably true for all programming languages and environments, but it seems that the Terraform community has yet to develop a thorough set of stylistic patterns.
Chief among these open questions for us was: how do you organize a repository of Terraform code? There are certainly folks who have shared bits and pieces of their experiences with organizing Terraform code, but nothing as thorough as we’d like. So we set out to create a generic repository structure that is generally useful and can be consistently applied across a large number of projects. It’s working on 7 projects so far, so we’re pretty confident it can work for many more.
In addition to solving some unique problems, we take a few unique approaches.
The first is that we rely entirely on generating code. Other tools like Terragrunt provide a lot of functionality by taking over the process of running Terraform plan and apply, and adding a bunch of extra functionality. It’s a great tool, but without the ability to inspect exactly what is going to happen when you run the process, it is hard to build trust.
So, we took an approach of generating Terraform (and Make and Bash) code only. You can then read everything and build a mental model of what’s going on before running it. And if you ever decide you don’t like how it works, you can just stop running the code generation and iterate away.
[Figure: Typical output from fogg’s code generation.]
Second, we decided to create an opinionated tool, rather than a general-purpose tool. You can still build really any infrastructure you want with fogg, but there is definitely a “fogg way” of doing things. This way involves a collection of best practices and just plain arbitrary decisions. Following this way has made things easier for many of our teammates.
Finally, we’ve decided to focus on AWS only at this point, and that has allowed us to get something working very well. We are certainly open to making it work well with other Terraform backends.
This is just the beginning for this tool. We certainly have a lot to do still, but we believe that there is value in releasing tools early.
Check out fogg and raise an issue or ping us on Gitter if you have any questions.
This article was also posted on Medium.