
Book Review: Terraform in Action

Recently, I finished reading Terraform in Action by Scott Winkler.

For those who know me, I already have a decent amount of hands-on, real-world experience with Terraform, and even deliver presentations/training on it.

But when I saw this publication, it caught my attention for the following reasons:

  • It offered advanced designs, such as zero-downtime deployments and creating your own Terraform provider
  • It included advanced templating techniques
  • It covered module planning and structure
  • It covered advanced Terraform state concepts

I found the whole book helpful, but in particular Chapter 4 (“Deploying a Multi-Tiered Web Application in AWS”) and Chapter 9 (“Zero Downtime Deployments”).

I’ve decided to share my highlights from reading this specific publication, in case the points I found noteworthy are of some benefit to someone else. So, here are my highlights (by chapter). Note that not every chapter has highlights (depending on the content and the main focus of my work). I’ve also added a few small code sketches of my own along the way to make certain points concrete; these are my illustrations, not listings from the book.

Chapter 1: Getting Started with Terraform

  • The main use case of Terraform is to deploy ephemeral, on-demand infrastructure in a predictable and repeatable fashion.
  • Configuration management (CM) tools, like Chef, Puppet, Ansible and SaltStack, were all designed to install and manage software on existing servers. They work by performing push or pull updates and restoring software to a desired state if drift has occurred.
  • CM tools favor mutable infrastructure, whereas Terraform and other provisioning tools, favor immutable infrastructure.
  • HCL attempts to strike a balance between human and machine readability and was influenced by earlier field attempts such as libucl and Nginx configuration.
  • HashiCorp has repeatedly stated that they do not have any intentions to offer a premium version of Terraform, now or ever. Instead they plan to make money by offering specialized tooling that makes it easier to run Terraform in high-scale, multi-tenant environments.
  • The difference between declarative and imperative programming is the difference between getting in a car and driving yourself from point A to point B, vs. calling an Uber and having your chauffeur drive you.
  • Note: Declarative programming cares about the destination, not the journey. Imperative programming cares about the journey, not the destination.
  • The way Terraform integrates with all the different clouds is through providers. Providers are plugins for Terraform that are designed to interface with external APIs.
  • When Terraform runs it will read all files in the current working directory with a “.tf” extension and concatenate them together.
  • Resources are declared as HCL objects, with type resource and exactly two labels. The first label specifies the type of resource you want to create, and the second is the resource name. The name has no special significance and is only used to reference the resource within a given module scope (see the sketch after this list).
  • Warning: It is important not to manually edit or delete the terraform.tfstate file, or else Terraform will lose track of all managed resources
  • Note: terraform destroy does exactly the same thing as if you were to delete all the configuration code and run a terraform apply.
  • The contents of a data source code block are called query constraint arguments and behave exactly the same as arguments do for resources. The query constraint arguments are used to specify which resource(s) to fetch data from. Data sources are unmanaged resources that Terraform can read data from but doesn’t directly control.
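
To make the resource and data source points concrete, here is a minimal sketch of my own (not code from the book); it assumes the AWS provider is configured, and the names are illustrative:

    # A resource has exactly two labels: the type ("aws_instance") and a
    # name ("webserver") used only to reference it within this module.
    resource "aws_instance" "webserver" {
      ami           = data.aws_ami.ubuntu.id # implicit dependency on the data source
      instance_type = "t2.micro"
    }

    # A data source is an unmanaged resource Terraform can read from but
    # doesn't control; its arguments are query constraints that select
    # which existing object to fetch.
    data "aws_ami" "ubuntu" {
      most_recent = true
      owners      = ["099720109477"] # Canonical's AWS account

      filter {
        name   = "name"
        values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
      }
    }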

Chapter 2: Lifecycle of a Terraform Resource

  • Tip: The <<- sequence indicates an indented heredoc string. Anything between the opening identifier and the closing identifier (EOT) is interpreted literally. Leading whitespace, however, is ignored (unlike traditional heredoc syntax); see the sketch after this list.
  • terraform init must always be run at least once, but you’ll have to run it again each time you add a new provider or module dependency.
  • After initialization, Terraform creates a hidden .terraform directory for installing plugins and modules.
  • Tip: version lock any providers you use, whether they are implicitly or explicitly defined, to ensure that any deployment you make is always repeatable.
  • terraform plan not only informs you of what Terraform intends to do, it also acts as a linter, letting you know of any syntax or dependency errors you might have. It’s a read-only action that does not alter the state of deployed infrastructure, and like terraform init, it’s idempotent.
  • Tip: if terraform plan is running slow, turn off trace-level logging and consider increasing parallelism (-parallelism=n)
  • Regardless of how you generate an execution plan, it’s always a good idea to review the contents of the plan before applying. During an apply, Terraform creates and destroys real infrastructure, which of course has real-world consequences.
  • Warning: It’s important to not edit, delete or otherwise tamper with the terraform.tfstate file, or else Terraform could potentially lose track of the resources it manages. It is possible to restore a corrupted or missing state file, but it’s difficult and time consuming to do so.
  • When terraform plan is run, Terraform will call Read() on each resource in the state file.
  • terraform apply -auto-approve. The optional -auto-approve flag tells Terraform to skip the manual approval step and immediately apply changes.
  • Warning: -auto-approve can be dangerous if you have not already reviewed the results of the plan
  • The surprising outcome of terraform plan is merely the result of the provider choosing to do something a little odd with the way Read() was implemented. I don’t know exactly why they chose to do it the way that they did, but they decided that if the contents of the file don’t match up exactly with what’s in the state file, then the resource doesn’t exist anymore. The consequence is that Terraform thinks the resource no longer exists, even though there’s still a file with the same name.
  • You can think of terraform refresh as a terraform plan that also happens to alter the state file. It’s a read-only operation that does not modify existing managed infrastructure, just the Terraform state.
  • Note: deleting all configuration files and running terraform apply is equivalent to terraform destroy
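
As a quick illustration of the heredoc tip above, here is a sketch of my own using the local provider (the filename and text are invented):

    resource "local_file" "quote" {
      filename = "quote.txt"
      # <<- starts an indented heredoc: everything up to the closing EOT
      # is taken literally, but the leading whitespace is ignored so the
      # block can stay indented with the surrounding code.
      content  = <<-EOT
        Sun Tzu said: The art of war is of vital importance to the State.
      EOT
    }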

Chapter 3: Functional Programming and Advanced Templating Techniques

  • Variable names must be unique within a module scope (we’ll go over what this means in the next chapter), and cannot use certain reserved names, but otherwise have few restrictions.
  • Variables accept three optional input arguments (see the sketch after this list):
    • default – a literal value to set the variable to when no other value is found. Leaving this argument blank means the variable is mandatory and must be set
    • description – a string value that provides some helpful documentation to the user
    • type – a type constraint to set for the variable. Types can be either primitive (e.g. string, number, bool) or complex (e.g. list, set, map, object, tuple)
  • All primitive types in HCL can be coerced to string types. For example, Boolean true/false values evaluate to “true” and “false” and likewise, all numbers are converted to string representations when parsed.
  • Enforcing a strict type schema on all of your variables is important to fail fast and prevent bad input from corrupting your downstream resources, but it’s not without its limitations. If you need more granular validation on input values, you could use conditional expressions.
  • Warning: Conditional expressions can make code difficult to read, if not done with care. Use them sparingly.
  • Warning: Not all functions in Terraform are idempotent. You should avoid non-idempotent legacy functions such as uuid() and timestamp() , because they may cause subtle bugs within Terraform. Terraform was built around the notion of resource immutability; as soon as you take that assumption away, you’re going to have a bad time.
  • The brackets that go around the for expression determine the type of the output. The previous code used [] , which means the output will be a list. If instead we used {}, then the result would be an object.
  • Execution order for an apply starts at the bottom and works its way up, so nodes with fewer dependencies are created first while nodes having more dependencies are created last.
  • Note: You can combine count with a conditional expression to toggle whether or not you want a resource to be created (e.g. count = var.shuffle_enabled ? 1 : 0 )
  • The expression “count.index” is how to reference the current index of a resource when count is set.
  • Explicit dependencies are declared using the “depends_on” meta argument and are reserved for situations where you have a hidden dependency between resources.
  • Explicit dependencies behave exactly like implicit dependencies but are confusing and should be used cautiously – only when absolutely necessary.
  • Data sources are not considered resources for the purposes of an apply
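
Here is a small sketch of my own tying several of these highlights together: a typed variable with all three optional arguments, a count toggle driven by a conditional expression, and a for expression whose brackets determine the output type (random_shuffle comes from the random provider; the values are invented):

    variable "words" {
      description = "A list of words to shuffle"
      type        = list(string)
      default     = ["apple", "banana", "cherry"] # omit default to make it mandatory
    }

    variable "shuffle_enabled" {
      type    = bool
      default = true
    }

    # count + a conditional expression toggles whether the resource exists.
    resource "random_shuffle" "words" {
      count = var.shuffle_enabled ? 1 : 0
      input = var.words
    }

    # [] around the for expression yields a list; {} would yield an object.
    output "upper_words" {
      value = [for word in var.words : upper(word)]
    }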

Chapter 4: Deploying a Multi-Tiered Web Application in AWS

  • Modules are self-contained packages of code that allow you to create reusable components by grouping multiple related resources together. They allow you to view pieces of infrastructure only in terms of the inputs and outputs, without requiring any knowledge of the internal workings.
  • Like resources and data sources, modules have inputs and outputs (see the sketch after this list)
  • Modules can be endlessly nested within other modules, so they make for excellent tools for breaking complexity into small, reusable bits.
  • You typically don’t want to have more than three or four levels of nested modules, otherwise it becomes difficult to reason about (like a deeply nested class hierarchy).
  • In accordance with established code conventions and best practices from HashiCorp, we’ll split module code into three unambiguously named configuration files:
    • main.tf – the primary entry point containing all resources and data sources
    • outputs.tf – declarations for all output values
    • variables.tf – declarations for all input variables
  • A root module is just the entry point for Terraform, so wherever you initialize and apply Terraform, that directory is the de-facto root module.
  • We’ll be using the root module mostly for code organization and won’t use it to actually deploy any resources as that’s the job of the nested modules.
  • You should always evaluate for yourself whether or not it’s worth the tradeoff to use an external module vs. writing the Terraform code yourself. Using other people’s code can save you time in the short term, but it can also be a source of tech debt later down the road, especially if something were to break in an unexpected way.
  • Tip: reasoning about how data needs to pass between modules will often make it clear how you should componentize your software systems. Modules that need to share a lot of data should be closer together, while modules that are more independent can be further apart.
  • Warning: while it may be tempting to overuse the any type constraint, it’s a lazy coding habit that will get you in trouble more often than not. Only use any when passing data between modules, and never for configuring the variables of the root module.
  • Tip: never grant more access to data than a given module needs for legitimate purposes. As in normal software development, the inputs and outputs of a module represent the interface, so don’t pass in more data than required by the interface.
  • Whenever I am planning out the code for a Terraform module, I always consider inter-resource dependencies (i.e. what depends on what) because it helps me predict potential race conditions that require an explicit depends_on.
  • Only outputs from the root module show up in the command line after applying.
  • As directed by HashiCorp, it is best practice that each module has at least the following three files: main.tf, variables.tf and outputs.tf.
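
As an illustration of treating modules purely in terms of inputs and outputs, here is a sketch of a root module wiring together two hypothetical nested modules (the module paths and attribute names are mine, not the book’s):

    # main.tf (root module) -- used for organization, not for resources.
    module "networking" {
      source    = "./modules/networking" # hypothetical nested module
      namespace = var.namespace
    }

    module "database" {
      source = "./modules/database"     # hypothetical nested module
      vpc_id = module.networking.vpc_id # data passed between modules
    }

    # Only root-module outputs show up on the command line after an apply.
    output "db_endpoint" {
      value = module.database.endpoint
    }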

Chapter 5: Serverless Made Easy

  • It’s important to recognize that by breaking your application into functions, you’re accepting a tradeoff between decreased code complexity and increased wiring requirements between components.
  • Tip: As a rule of thumb, I would suggest having no more than a few hundred lines of code per Terraform configuration file. Any more and it becomes difficult to build a mental map of how the code actually works.
  • The real world is a messy place and doesn’t usually lend itself well to categorization.
  • Here is a list of steps that I suggest taking when tackling a new problem with Terraform:
    • 1. Define the problem and goals
    • 2. Research potential solutions, while keeping an open mind
    • 3. Select key technologies and tools to leverage
    • 4. Build a prototype to determine if further investment is warranted
    • 5. Develop final product
  • If you want to solve useful problems with Terraform, you also have to be willing to try new things and make mistakes.
  • Tip: problem solving is an art, and the only way to get better is with practice.
  • If there are some Terraform resources that are poorly implemented, or are otherwise incomplete and don’t have all the features that the corresponding ARM template has, you may be better off deploying an ARM template with Terraform.

Chapter 6: Terraform with Friends

  • A backend in Terraform determines how state is loaded and how operations like terraform plan and terraform apply are executed.
  • Enhanced backends are relatively new and allow you to do sophisticated things like run CLI operations on a remote machine and stream the results back to your local terminal.
  • Flat modules (as opposed to nested modules) are when you organize your code by creating lots of little .tf files within a single monolithic module. Each file in the module contains all the code for deploying an individual component, which would otherwise be broken out into its own module. The primary advantage of flat module structures over nested modules is reduced boilerplate, and easier codebase navigation.
  • Tip: There’s no fixed rule about how long the configuration code in a single file can be, but I find it best to stick to a ~200 lines maximum rule of thumb
  • Warning: Think carefully before deciding to use a flat module structure for code organization. This pattern permits a high degree of inter-component coupling, which can make your code difficult to navigate and understand.
  • Believe it or not, having a README.md is actually a requirement for registering a module with the Terraform Module Registry.
  • Tip: there’s a neat open source tool called terraform-docs that automatically generates documentation from your configuration code
  • Note: You’ll need to upload your code to GitHub even if you only wish to use the Terraform Module Registry, because the Registry sources modules from public GitHub repos
  • Create a repo with a name of the form terraform-<provider>-<name>. There are no rules about what “provider” and “name” mean in the context of a module, but I typically think of “provider” as the cloud that I am deploying to, while “name” is a helpful descriptor of the project.
  • Workspaces allow you to have more than one state file for the same configuration code. This means you can deploy multiple environments without resorting to copy-pasting your configuration code into different folders. Each workspace can use its own variable definitions file to parameterize the environment (see the sketch after this list)
  • Technically workspaces are no different than simply renaming state files. The reason you would use workspaces is because remote state backends support workspaces and not the -state argument.
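
To tie the backend and workspace highlights together, here is a sketch of an S3 remote state backend (the bucket and key names are made up); each workspace then gets its own state object under the same bucket:

    terraform {
      backend "s3" {
        bucket = "my-terraform-state" # hypothetical bucket
        key    = "team-project/terraform.tfstate"
        region = "us-east-1"
      }
    }

With that in place, something like terraform workspace new dev followed by terraform apply -var-file=dev.tfvars deploys a second environment without copy-pasting any configuration.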

Chapter 7: CI/CD Pipelines as Code

  • Passing a provider explicitly means that you override the implicit (or default) provider with another provider.
  • When more than one provider declaration is present, one provider will always be designated as the default provider, while any others are designated as non-default providers.
  • Tip: passing providers explicitly is most commonly used for multi-region deployments (see the first sketch after this list).
  • The for_each meta-attribute accepts a map or a set of strings, and creates an instance for each element in that map or set (see the second sketch after this list).
  • for_each does not guarantee sequential iteration (because sets and maps are inherently unordered collections).
  • for_each is most similar to the meta-attribute count, but has a number of distinct advantages, namely:
    • 1. Intuitive – for_each is a much more natural concept, compared to iterating by index
    • 2. Less verbose – syntactically, for_each is shorter and more pleasing to the eye
    • 3. Ease of use – instead of storing instances in an array, instances are stored in a map. This makes individual resource instances much easier to reference.
  • for_each is the recommended approach for creating dynamic configurations, unless you have a specific reason to access something by index
  • By inserting delays with the local-exec provisioner, you can solve many of these strange race condition style bugs.
  • Beware that Terraform does not keep track of changes to provisioners in the same way it does for resources attributes; no copy is stored in the state file and there is no way to calculate diffs.
  • HashiCorp has stated that resource provisioners are an anti-pattern and they may even be deprecated entirely in a newer version of Terraform.
  • Dynamic blocks can only be used within other blocks, and only when the use of repeatable nested configuration blocks is supported (surprisingly uncommon).
  • Dynamic nested blocks act much like for expressions but produce nested configuration blocks instead of complex types. They iterate over complex types (such as maps and lists) and generate configuration blocks for each element.
  • Warning: Backdoors to Terraform (i.e. local-exec provisioners) are inherently dangerous and should be avoided whenever possible. Use them only as a means of last resort.
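
Two sketches of my own to close out this chapter. First, explicit (aliased) providers for a multi-region deployment; the bucket name is hypothetical:

    provider "aws" {
      region = "us-east-1" # the default provider
    }

    provider "aws" {
      alias  = "west" # a non-default provider
      region = "us-west-2"
    }

    resource "aws_s3_bucket" "replica" {
      provider = aws.west            # passed explicitly, overriding the default
      bucket   = "my-replica-bucket" # hypothetical name
    }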
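
Second, for_each feeding a dynamic nested block; the rules map is invented for illustration:

    variable "rules" {
      type = map(object({ port = number, cidr = string }))
      default = {
        http  = { port = 80, cidr = "0.0.0.0/0" }
        https = { port = 443, cidr = "0.0.0.0/0" }
      }
    }

    resource "aws_security_group" "web" {
      name = "web-sg" # hypothetical

      # The dynamic block iterates over the map and generates one
      # "ingress" nested configuration block per element.
      dynamic "ingress" {
        for_each = var.rules
        content {
          from_port   = ingress.value.port
          to_port     = ingress.value.port
          protocol    = "tcp"
          cidr_blocks = [ingress.value.cidr]
        }
      }
    }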

Chapter 8: A Multi-Cloud MMORPG

  • Being a Terraform guru means having the wisdom to know that just because you can do something doesn’t necessarily mean you should.
  • Tip: Under no circumstances should you skip creating a versions.tf, no matter how much you are tempted! It has saved me more than once from breaking changes in new provider versions (see the sketch after this list).
  • Nomad is a general-purpose application scheduler made by HashiCorp that functions as a type of container orchestration platform.
  • Consul, on the other hand, is a service mesh solution (also produced by HashiCorp) and is most similar to Istio
  • Cluster federation is indispensable for organizations wishing to support operations in multiple clouds, without needing proportionally large teams of engineers to do so.
  • Note: Consul and Nomad follow the Raft consensus protocol, meaning there must be an odd number of servers (with a minimum of three) to have quorum. One of these servers is designated the leader while the others are followers. There are no such restrictions for clients.
  • In many ways, working with Terraform is a lot like building with Lego bricks. Terraform has all these providers, which much like individual Lego sets, give you a huge assortment of pieces to work with. You don’t need any specialized tools to assemble them — they just fit together because that’s how they were designed.
  • Keep an open mind when working with Terraform. The best design may not always be the most obvious one.
  • Terraform is the glue that binds managed services together as part of a single deployment
  • Two-stage deployments are best when there is a clear distinction between “infrastructure” and “application”. Infrastructure is any of the virtual machines, clusters and managed services that applications run on. Each stage should have its own state.
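
Since the versions.tf tip is one I echo in my own presentations, here is a minimal sketch (the version numbers are illustrative only):

    # versions.tf -- pin Terraform core and every provider so a new
    # release can't silently introduce breaking changes.
    terraform {
      required_version = ">= 0.15"

      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 3.28"
        }
      }
    }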

Chapter 9: Zero Downtime Deployments

  • Zero Downtime Deployment (ZDD) describes the practice of keeping services always running and available to customers during software deployments.
  • Due to the way provisioners were implemented, a resource is not marked as “created” or “destroyed” unless all creation-time and destruction-time provisioners have executed with no errors. This means that we can use a local-exec provisioner to perform creation-time health checks.
  • Although it would appear that create_before_destroy is an easy way to perform zero-downtime deployments (sketched after this list), it does have a number of quirks and shortcomings that should be kept in mind:
    • Confusing – once you start messing with the default behavior of Terraform, it’s harder to reason about how changes to your configuration files and variables will affect the outcome of an apply. This is especially true when local-exec provisioners are thrown in the mix.
    • Redundant – everything you could accomplish with create_before_destroy could also be done with two Terraform workspaces or modules.
    • Namespace collisions – because both the new and old resource must exist at the same time, you have to choose parameters that will not conflict with each other. This is often awkward, and sometimes even impossible, depending on how the parent provider implemented the resource.
    • Force new vs. in-place – not all attributes force the creation of a new resource. Some attributes are updated in-place, which means the old resource is never actually destroyed, but merely altered. This also means any attached resource provisioners won’t be triggered.
  • Tip: I personally do not use create_before_destroy as I have found it to be more trouble than it is worth
  • Note: Managing stateful data for Blue/Green deployments is notoriously tricky. Many people recommend including databases in the base layer, so that all production data is shared between Blue and Green.
  • When performing the manual cutover, you mitigate risk by not having all your infrastructure in the same workspace.
  • Access keys should be rotated as frequently as possible – at least once every 90 days. The last thing you want is to give someone more time to mine cryptocurrency in your account and/or hack customer data.
  • Serials and lineages are how Terraform internally validates state files. It’s how Terraform “knows” that a state file is not stale (i.e. out of date) or, worse, from another workspace entirely.
  • In the newly created state, Terraform also assigns a serial number which begins at one. Each time the state file is changed, due to modifications in the configuration code or what have you, the serial number is incremented by one.
  • You can think of lineage as a fingerprint to ensure continuity of a state file within a given workspace. Terraform always expects the lineage to remain the same from deployment to deployment, or else something is drastically wrong.
  • Some remote state backends will actually throw an error if you try to deploy with the wrong serial number because it’s better for Terraform to fail fast than proceed with invalid input.
  • It is absolutely critical to ensure that the lineage is preserved, and the serial number is incremented by one to avoid any potential complications when modifying state.
  • Corrupt state files occur most often when dealing with buggy providers, upgrading state files from Terraform 0.11 to 0.12, and importing resources that do not support the import command.
  • Note: Best practice is to not edit the state file manually!
  • Tip: Whenever I perform a major refactor, I always keep a copy of the old configuration code and state around just in case I make a mistake and need to start over. Better safe than sorry.
  • In general, I find loading multi-line strings works best with the file() function rather than heredoc (EOF) syntax, because it reduces visual clutter in the Terraform code and allows you to use the auto-formatting and syntax highlighting features available in most IDEs.
  • There are three options for modifying state:
    • 1. Manually editing the state file
    • 2. Moving state data with terraform state mv
    • 3. Deleting the old resource with terraform state rm and then reimporting the resource with terraform import
  • Tip: Manually editing the state is not recommended under ordinary circumstances. You would only do this if you needed to manually change one of the attribute values on a managed resource for some reason.
  • We will use method two, terraform state mv, to move the IAM resources around. This command will move an item, such as a resource or module, matched by the source address, into the destination address. It can be used for resource renaming, moving items to and from modules, moving entire modules around, and more.
  • The terraform import command allows you to import existing resources into Terraform, bringing unmanaged resources under the yoke of Terraform. Not all resources support the import operation, but most do.
  • Tip: If you are stuck with a corrupted state file, which means you can’t successfully apply or destroy, it’s usually easier to remove the offending resource from state than to solve whatever error Terraform is throwing your way.
  • Address is the valid resource address where you want your resource to be imported to. The ID is the unique ID of the resource, which can be used to find the existing resource from the API.
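
For reference, here is what the create_before_destroy lifecycle setting discussed above looks like (my own sketch, not the book’s listing; the variable is hypothetical):

    resource "aws_instance" "web" {
      ami           = var.ami_id # changing a "force new" attribute like ami
      instance_type = "t2.micro" # would normally destroy, then create

      lifecycle {
        create_before_destroy = true # instead, create the replacement first
      }
    }

The state-surgery commands follow the shapes described above: terraform state mv <source-address> <destination-address> to move an item, and terraform import <address> <id> to bring an existing resource under management.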

Chapter 10: Refactoring and Testing

  •  Refactoring is the art of improving the design of code without changing existing behavior or adding new functionality. Benefits of refactoring include:

    • Maintainability – the ability to quickly fix bugs and address problems faced by customers.
    • Extensibility – how easy it is to add new features. If your software is extensible, then you are more agile and able to respond to marketplace changes.
    • Reusability – removing duplicated and highly coupled code. Reusable code is readable and easier to maintain.
  • There are (at least) three levels of software testing to consider: unit tests, integration tests and system tests.
  • What we do care about is integration tests. In other words, for a given set of inputs, does a subsystem of Terraform (i.e. a module) deploy without errors, and produce the expected output?
  • We can target the destruction and recreation of individual resources with the terraform taint command.
  • Note: if you ever taint the wrong resource, you can always undo your mistake with the complementary command: terraform untaint
  • The biggest refactoring improvement we can make is to put reusable code into modules.
  • Module expansions make it possible to use count and for_each on a module in the same way you could for a resource (see the sketch after this list). Instead of declaring a module multiple times, you now only have to declare it once.
  • Like for_each on resources, for_each on a module requires providing configuration via either a set or a map.
  • Why not use sets? I recommend using maps instead of sets whenever you have more than one attribute that needs to be set on a module. Maps allow you to pass entire objects, whereas sets do not. Moreover, you can only pass in a set of type set(string), meaning you would have to awkwardly encode data in the form of a JSON string and then decode it with jsondecode() if you wanted to pass more than a single attribute’s worth of data.
  • The reason why embedding string literals, especially multi-line string literals, is generally a bad idea, is because it hurts readability. Having too many string literals in Terraform configuration makes it messy and hard to find what you’re looking for. Better just to keep this information in a separate file, and read from it using either file() or fileset().
  • Although convenient, splat expressions are less useful than they could be since they only operate on lists.
  • Unfortunately for us, Terraform state migration is rather difficult and tedious. It’s difficult because it requires intimate knowledge about how state is stored and it’s tedious because – although not entirely manual – it would take a long time to migrate more than a handful of resources.
  • To migrate state, we need to move or import resources into a correct destination resource address.
  • There are three options when it comes to migrating state:
    • Manually editing the state file (not recommended)
    • Moving stateful data with terraform state mv
    • Deleting old resources with terraform state rm and reimporting with terraform import
  • Of the three methods, the first is the most flexible, but also the most dangerous because of the potential for human error. Methods two and three are easier and safer.
  • Note: you can move a resource or module to any address, even one that does not exist within your current configuration. This can cause unexpected behavior, which is why you have to be careful to get the right address.
  • Note: check with the relevant Terraform provider documentation to ensure imports are allowed for a given resource
  •  A resource’s ID is set at the provider level, and is not always what you think it should be, but it is guaranteed to be unique. You can see what it is with terraform show, or figure it out yourself by reading through provider documentation.
  • Importing resources is the same as performing a terraform refresh on a remote resource. It reads the current state of the resource and stores it in Terraform state.
  • Being a tool developed by HashiCorp, terraform-exec has feature parity with Terraform, whereas Terratest does not. You can run all Terraform CLI commands with terraform-exec, using any combination of flags, while Terratest only allows a small subset of the most common commands.
  • Because of how hard refactoring can be, it’s often a good idea to test your code at the module level. You can do this with either Terratest or the terraform-exec library. I recommend terraform-exec, because it was developed by HashiCorp, and is the more complete of the two. Ideally you should perform integration testing on all modules within your organization.
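
Here is a sketch of module expansion with a map, per the “why not use sets” point above (the module path and attributes are hypothetical):

    variable "environments" {
      type = map(object({ instance_type = string, min_size = number }))
      default = {
        dev  = { instance_type = "t2.micro", min_size = 1 }
        prod = { instance_type = "m5.large", min_size = 3 }
      }
    }

    # for_each on a module: one instance per map key, each configured
    # from an entire object rather than an awkward JSON-encoded string.
    module "cluster" {
      source   = "./modules/cluster" # hypothetical module
      for_each = var.environments

      name          = each.key
      instance_type = each.value.instance_type
      min_size      = each.value.min_size
    }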

Chapter 11: Extending Terraform by Writing your own Provider

  • The main job of any Terraform provider is to expose resources to Terraform and initialize any shared configuration objects.
  • Terraform will always initialize these shared configuration objects first, before performing any actions against provider resources.
  • Note: If a provider fails or “hangs” during initialization, it is almost always due to a shared configuration object having invalid or expired credentials.
  • There are normally two prerequisites to creating your own provider:
    • 1. Existing API – this one should be pretty obvious, but it’s also easily overlooked. Since Terraform makes calls against a remote API, there must be an existing remote API to make calls to. This may be your own API or someone else’s.
    • 2. Golang client SDK for the API – as providers are written in Golang, you should have a Golang client SDK for your API in place before proceeding. This will save you from having to make ugly, raw HTTP requests against the API.
  • Tip: Always have separate repositories for the client SDK and the provider! Providers are already sufficiently complicated, and there’s no need to make it harder on yourself by combining SDK code with provider code.
  • Providers are plugins for Terraform which communicate with Terraform over RPC (remote procedure calls). As long as providers implement the expected interface, they can be written in any language.
  • Note: Normally provider authors create matching read-only resources (a.k.a. data sources) to complement managed resources
  • The provider schema is important because it defines the attributes for the provider configuration, enumerates resources made available by the provider, and initializes any shared configuration objects. All this takes place during the terraform init step, when the provider is first installed. (A consumer-side sketch follows this list.)
  • Schema is a parameter that outlines the allowed provider configuration attributes in Terraform.
  • schema.EnvDefaultFunc. This function makes it possible to set a default environment variable to use if the attribute is not directly set in the provider configuration.
  • Tip: it is a good idea to make critical configuration attributes, such as access keys and addresses, optionally configurable as environment variables, for ease in automation.
  • The terraform providers schema command can be used to print detailed schemas for the providers used in the current configuration.
  • It’s important to know when each of the four CRUD functions will be invoked, to be able to predict and handle any errors that may occur.
  • ID is important because without it, the resource won’t be marked as “created” by Terraform, and neither will it be persisted to the state file.
  • Warning! Read() should always return the same resource from the API. If it does not, you will end up with orphaned resources. Orphaned resources are resources originally created by Terraform, which have been lost track of, and are now considered unmanaged.
  • Force new updates are inconvenient from a user perspective because it takes longer for changes to propagate. This is an example where a good user experience matters more than ease of development or strict adherence to infrastructure immutability.
  • Writing good tests can be tough, but it’s well worth the effort, especially on large code bases with multiple contributors.
  • At a bare minimum, however, a resource test file requires the following:
    • Basic create/destroy test with validation that attributes get set in the state file
    • A function to check that all test resources have been destroyed
    • Test HCL configuration with all input attributes set
  • Tip: Most provider authors use a Makefile and CI triggers to automate the steps of building, testing, and distributing the provider. I recommend looking at some simpler providers, like terraform-provider-null and terraform-provider-tfe for inspiration
  • Note: It is helpful to set TF_LOG=TRACE when testing providers, to ensure that API requests and responses are as expected
  • Where I see developing custom providers fitting best is with micro-APIs and self-service platforms.
  • Acceptance testing means writing tests for the provider schema and any resources exposed by the provider. Acceptance testing hardens code and is crucial for production readiness.
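
From the consumer’s side, using a custom provider looks like any other. Everything in this sketch — the source address, the petstore provider, and its resource — is hypothetical:

    terraform {
      required_providers {
        petstore = {
          source  = "example.com/myorg/petstore" # hypothetical private source
          version = "~> 1.0"
        }
      }
    }

    # The provider schema defines which attributes this block accepts; an
    # attribute backed by schema.EnvDefaultFunc could instead be supplied
    # through an environment variable.
    provider "petstore" {
      address = "https://petstore.example.com" # hypothetical API endpoint
    }

    resource "petstore_pet" "dog" {
      name    = "snowball"
      species = "dog"
    }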

Chapter 12: Terraform in Automation

  • Having a manual approval stage is important, to allow stakeholders (i.e. anyone who has a vested interest in the outcome of a Terraform deployment) to read the output of terraform plan before approving an apply.
  • There are five default environment variables:
    • TF_IN_AUTOMATION ,
    • TF_INPUT ,
    • CONFIRM_DESTROY ,
    • WORKING_DIRECTORY and
    • BACKEND .
    • The first two are for configuring the Terraform runtime, the third is a flag for triggering a destroy run, the fourth sets the current working directory, and the fifth configures the remote state backend.
  • Most Terraform configurations are written in HCL because it’s an easy language for humans to read and understand, but it’s also possible to write Terraform configuration in JSON instead. This alternative syntax (suffixed with the .tf.json extension) is normally reserved for automation purposes, because it is more machine-friendly than HCL, and because many programming languages already have native libraries for processing JSON (see the example after this list).
  • Warning! It is not recommended to turn off manual approval for anything mission-critical! It’s like performing a terraform apply -auto-approve without even checking the results of the plan first. There should always be at least one human verifying the results of the plan before changes are applied.
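
For illustration, here is the same trivial resource in both syntaxes — first the HCL:

    resource "local_file" "hello" {
      filename = "hello.txt"
      content  = "Hello, World!"
    }

and the equivalent main.tf.json:

    {
      "resource": {
        "local_file": {
          "hello": {
            "filename": "hello.txt",
            "content": "Hello, World!"
          }
        }
      }
    }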

Chapter 13: Secrets Management

  • Terraform does not treat attributes containing sensitive data any differently than it treats non-sensitive attributes. Therefore, any and all sensitive data gets put in the state file, which is stored as plaintext JSON.
  • There are only three configuration blocks that can store stateful data (sensitive or otherwise) in Terraform. These being:
    • resources,
    • data sources, and
    • output values
  • Removing unnecessary secrets is always a good idea, but it won’t prevent your state file from being leaked in the first place. To do that, you need to treat the state file itself as secret and gate who has access to it.
  • If you are using Terraform Enterprise or Terraform Cloud, your data is automatically encrypted at rest by default. In fact, they double encrypt it, once with KMS and again with Vault.
  • Every single API call that Terraform makes to AWS appears in the trace logs – with the complete request and response objects included.
  • Tip: always turn off trace logging except when actively debugging.
  • local-exec provisioners are inherently dangerous and should be avoided whenever possible.
  • Even when trace logging is disabled, local-exec provisioners can be used to print secrets in the log files
  • Note: AWS access keys are not the only thing that local-exec provisioners can expose. Any secrets stored in the filesystem or runtime environment of the machine running Terraform are also vulnerable.
  • Tip: in case you are interested in creating custom resources without writing your own provider, I recommend looking into the Shell provider for Terraform
  • External data sources are particularly nefarious because they run during a terraform plan. This means that all a malicious user would need to do is sneak an external data source into your configuration code and make sure it gets run during a terraform plan in order to gain access to all of your secrets. No manual confirmation of an apply necessary.
  • Tip: always skim through any module you’d like to use, even if it comes from the official module registry, to ensure that no malicious code is present
  • Note: External data sources are perhaps the most dangerous resource in all of Terraform. Be extremely judicious with their use, as there are many clever and devious ways that sensitive information can be leaked with them.
  • If you have continuous integration webhooks set up on a repository, do not allow terraform plan to be run for pull requests initiated from forks. Doing so would allow hackers to run external data sources without your explicit approval.
  • There are two major ways to pass static secrets into Terraform: (a) as environment variables, and (b) as Terraform variables. I recommend passing secrets in as environment variables whenever possible because it is far safer than the alternative (see the sketch at the end of this chapter’s highlights). For one, environment variables do not show up in the state or plan files; for two, it’s harder for malicious users to access your sensitive values as compared to Terraform variables.
  • When configuring a Terraform provider, you definitely do not want to pass sensitive information in as regular Terraform variables
  • In general, configuring sensitive information in providers with Terraform variables is dangerous. The reason why it is so bad is because it opens you up to the possibility of someone redirecting secrets and using them elsewhere.
  • Warning! malicious Terraform code can access any secret stored anywhere on the local machine running Terraform!
  • By ensuring that Terraform runs are always linked to a Git commit, troublemakers will not be able to insert malicious code without leaving behind incriminating evidence in the Git history.
  • Ideally, secrets should not even exist until they are needed (i.e. leased or created “just-in-time”) and they should be revoked immediately after use. Secrets like these are called dynamic secrets, and they are substantially more secure than static secrets.
  • Sentinel policies are not written in HCL, as you might expect, instead they are written in Sentinel. Sentinel is its own domain specific programming language, which has a passing resemblance to Python.
  • Note: Sentinel policies are not easy to write! You should expect a high learning curve, even if you are already a skilled programmer.
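
As a closing sketch of the environment-variable approach (the variable name and value are invented): Terraform automatically reads any variable prefixed with TF_VAR_ from the environment, and the sensitive flag (available since Terraform 0.14) redacts the value from CLI output — though remember it still lands in the state file:

    variable "db_password" {
      type      = string
      sensitive = true # redacts the value from plan/apply output
    }

    # Supply the value via the environment instead of a .tfvars file:
    #   export TF_VAR_db_password='example-secret'   # hypothetical value
    #   terraform apply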