The Good, Bad, and Ugly of Terraforming Azure Network Security Groups (NSGs)

Here’s a scenario that I encountered recently with a client and, with the help of HashiCorp, was able to overcome.

Scenario

Firstly, we have some Terraform code that sets up a VNet and corresponding Subnets.
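
As a rough illustration only (not the client's actual code), a networking module along these lines might look like the sketch below. The resource names, address ranges, and variable names are all hypothetical, and the syntax assumes the 1.x AzureRM provider with the interpolation style used elsewhere in this post.

```hcl
# modules/networking/main.tf (hypothetical path and names)

variable "resource_group_name" {}
variable "location" {}

# The VNet that the Subnets live in
resource "azurerm_virtual_network" "vnet" {
  name                = "example-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = "${var.location}"
  resource_group_name = "${var.resource_group_name}"
}

# One Subnet; a real module would likely define several
resource "azurerm_subnet" "app" {
  name                 = "app-subnet"
  resource_group_name  = "${var.resource_group_name}"
  virtual_network_name = "${azurerm_virtual_network.vnet.name}"
  address_prefix       = "10.0.1.0/24" # 1.x-era argument (assumption)
}
```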

Aside from all the variables and the naming conventions we’re dynamically deriving, it’s fairly simple.

Next, in a separate Terraform module, we have some code that creates a Network Security Group (NSG).
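
A minimal sketch of such an NSG module, again with hypothetical names, could be as small as this. The `nsg_id` output is an assumption about how the ID would be exposed to other modules.

```hcl
# modules/nsg/main.tf (hypothetical path and names)

variable "resource_group_name" {}
variable "location" {}

# The NSG itself, with no rules defined in-line
resource "azurerm_network_security_group" "nsg" {
  name                = "example-nsg"
  location            = "${var.location}"
  resource_group_name = "${var.resource_group_name}"
}

# Expose the NSG ID so another module can associate it with a Subnet
output "nsg_id" {
  value = "${azurerm_network_security_group.nsg.id}"
}
```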

Now, both of these modules can be run independently in a unit test. But if you want to actually link them together via a dependency (i.e. to associate the NSG with the Subnet), you have to call each module in the right order and pass the applicable details from one to the other.

More specifically, you have to call the NSG module first and obtain the NSG ID after it has been created, and pass that into the VNet module so that it can make the association. I’m not going to go into all the details of how that looks/works, but essentially it looks something like this:
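
A hedged sketch of the calling (root) configuration follows. The module paths and the `nsg_id` input/output names are assumptions for illustration. Note that the reference to `module.nsg.nsg_id` is what tells Terraform the NSG has to exist before the networking module can use it.

```hcl
# main.tf in the root configuration (hypothetical layout)

variable "resource_group_name" {}
variable "location" {}

# Called "first": creates the NSG and outputs its ID
module "nsg" {
  source              = "./modules/nsg"
  resource_group_name = "${var.resource_group_name}"
  location            = "${var.location}"
}

# Receives the NSG ID so it can associate the NSG with the Subnet
module "networking" {
  source              = "./modules/networking"
  resource_group_name = "${var.resource_group_name}"
  location            = "${var.location}"
  nsg_id              = "${module.nsg.nsg_id}"
}
```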

And within the networking module code, you would have a separate resource (specifically the azurerm_subnet_network_security_group_association resource), and pass the NSG ID as a variable.
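
Inside the networking module, that association might look roughly like this. It is a fragment: it assumes a Subnet resource named `azurerm_subnet.app` is declared in the same module, and the `nsg_id` variable name is hypothetical.

```hcl
# modules/networking/nsg_association.tf (hypothetical path and names)

# NSG ID passed in from the calling configuration
variable "nsg_id" {}

# Links an existing Subnet to the NSG whose ID was passed in.
# Assumes azurerm_subnet.app is defined elsewhere in this module.
resource "azurerm_subnet_network_security_group_association" "app" {
  subnet_id                 = "${azurerm_subnet.app.id}"
  network_security_group_id = "${var.nsg_id}"
}
```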

The Good

The good news is, this works. And it is the proper, forward-looking way of modularizing your VNet, NSG, NSG Rules, and NSG-to-Subnet Associations.

It is also a best practice to break out your resources into separate code blocks rather than defining them in-line (even if the resource supports in-line definitions).

An example of this is the Network Security Group (NSG). You can either code the NSG Rule in-line with the NSG itself or, create it as a separate resource. Both examples are depicted below.
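
To make the two styles concrete, here is a sketch of each. The rule values (name, priority, port) and the `rg_name` and `location` variables are placeholders, not taken from the project described here.

```hcl
variable "rg_name" {}
variable "location" {}

# Style 1: rule defined in-line, inside the NSG resource
resource "azurerm_network_security_group" "inline_example" {
  name                = "nsg-inline"
  location            = "${var.location}"
  resource_group_name = "${var.rg_name}"

  security_rule {
    name                       = "allow-https-in"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "443"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}

# Style 2: NSG with no in-line rules, plus a separate rule resource
resource "azurerm_network_security_group" "separate_example" {
  name                = "nsg-separate"
  location            = "${var.location}"
  resource_group_name = "${var.rg_name}"
}

resource "azurerm_network_security_rule" "allow_https_in" {
  name                        = "allow-https-in"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "443"
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  resource_group_name         = "${var.rg_name}"
  network_security_group_name = "${azurerm_network_security_group.separate_example.name}"
}
```

The separate-resource style refers to the NSG by name rather than by ID, which is what makes it possible to keep the rules in their own module.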

The Bad

The bad part of all this, I found, was that because I was calling the NSG module first (since we need the ID to pass to the Subnet), every time we made a change or update to the NSG Rules, Terraform would disassociate the NSGs from the Subnets! It did so even when no changes had been made at all!

That didn’t make any sense, especially if there were no updates to the NSGs themselves.

The Ugly

The ugliness of this is that the issue is a result of the current Azure Terraform Provider (the one jointly-managed by Microsoft and HashiCorp).

According to the azurerm_subnet_network_security_group_association documentation,

Subnet <-> Network Security Group associations currently need to be configured on both this resource and using the network_security_group_id field on the azurerm_subnet resource. The next major version of the AzureRM Provider (2.0) will remove the network_security_group_id field from the azurerm_subnet resource such that this resource is used to link resources in future.

What this means is, you have to include the network_security_group_id property in the Subnet resource. OK, so what’s wrong with that? Nothing. But the provider does not like it when you use a variable to pass this value in!

That means, in short, you need to use a local resource reference (in the form of ${azurerm_network_security_group.ResourceName.id}). That in turn means you now have to create your NSG object inside the same module as your VNet (instead of following best practices and keeping them in separate modules).

What that further means is that your NSG Rules are now stranded/orphaned in their own module, because they can’t associate with an NSG that does not exist yet (remember, we were calling the NSG module first).
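
Under the 1.x provider behavior described in the documentation note quoted above, the Subnet side of this might be sketched as follows. All names and address ranges are hypothetical, and the NSG now sits in the same module as the VNet and Subnet.

```hcl
# modules/networking/main.tf (hypothetical), NSG now created locally

variable "resource_group_name" {}
variable "location" {}

resource "azurerm_network_security_group" "app" {
  name                = "app-nsg"
  location            = "${var.location}"
  resource_group_name = "${var.resource_group_name}"
}

resource "azurerm_virtual_network" "vnet" {
  name                = "example-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = "${var.location}"
  resource_group_name = "${var.resource_group_name}"
}

resource "azurerm_subnet" "app" {
  name                 = "app-subnet"
  resource_group_name  = "${var.resource_group_name}"
  virtual_network_name = "${azurerm_virtual_network.vnet.name}"
  address_prefix       = "10.0.1.0/24"

  # Local resource reference rather than a variable
  network_security_group_id = "${azurerm_network_security_group.app.id}"
}

# Per the 1.x documentation, the association is configured here as well
resource "azurerm_subnet_network_security_group_association" "app" {
  subnet_id                 = "${azurerm_subnet.app.id}"
  network_security_group_id = "${azurerm_network_security_group.app.id}"
}
```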

Conclusion

Without going into the crazy amount of details (and large amounts of code refactoring we had to do), what we ended up having to do is this:

  1. Move the NSG resource creation into the VNet resource module
  2. Call the VNet creation module first, and ensure the output provides us with the NSG Names (not the IDs)
  3. Call the NSG creation module (which now only contains the NSG Rules), and pass in the NSG Names as variables
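
The three steps above can be sketched roughly as follows. The module paths, the `nsg_name` output, and the `nsg-rules` module are hypothetical names chosen for illustration, not the project's real layout.

```hcl
# --- modules/networking/outputs.tf (hypothetical) ---
# Steps 1 and 2: the NSG lives in the networking module, which outputs
# its name. Assumes azurerm_network_security_group.app is declared there.
output "nsg_name" {
  value = "${azurerm_network_security_group.app.name}"
}

# --- root main.tf (hypothetical) ---
variable "resource_group_name" {}
variable "location" {}

# Step 2: VNet, Subnets, and NSG are created first
module "networking" {
  source              = "./modules/networking"
  resource_group_name = "${var.resource_group_name}"
  location            = "${var.location}"
}

# Step 3: the rules-only module receives the NSG name, not the ID
module "nsg_rules" {
  source              = "./modules/nsg-rules"
  resource_group_name = "${var.resource_group_name}"
  nsg_name            = "${module.networking.nsg_name}"
}
```

Passing the name works here because azurerm_network_security_rule identifies its parent NSG through the network_security_group_name argument.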

The worst part of all of this, given that we had originally coded to best practices with separate modules for NSGs/Rules vs VNets/Subnets, etc., is that when Microsoft does release v2 of the Azure Provider for Terraform (at some unknown future date), we’re going to have to reverse all of these code refactors and go back to the original (and proper) way we had it.

Not to mention, the effect this code refactoring has on the current live environment’s state file! Essentially, we had to manually disassociate the existing NSGs from the Subnets, and delete them, before re-running our code (via Terraform Plan/Apply), so that duplicate NSGs would not be created (and error out when they tried to associate to a Subnet that already has an NSG associated to it)!

Hang in there, and keep Terraforming! Lots more insights to share as I progress further in my Terraform project.
