In a previous post, I detailed some workaround steps that may be required if you’re using the ‘Configure backup on VMs of a location to an existing central Vault in the same location’ Azure Policy (specifically across multiple Resource Groups). In this post, we’re going to take that policy and tweak it to perform automated onboarding into Azure Backup at scale.
Let’s begin.
Recovery Services Vault (RSV) Backup Policies
Firstly, I created 3 ‘Azure Virtual Machine’ Backup Policies. You’ll see that I have these set for 1:00 AM, 2:00 AM, and 3:00 AM respectively.
Azure Policy and Initiative
Now we’re going to duplicate that built-in ‘Configure backup on VMs of a location to an existing central Vault in the same location’ Azure Policy 3 times and associate each with a specific Backup Policy.
So first, I have created the duplicates of the built-in policy as-is (no customizations yet) and labeled them “Backup at #:## AM” for easier identification.
Next, I create a new Initiative that will contain all of my backup policies. Notice that I’ve added each of my custom policies, and associated each with the appropriate location, and custom Backup Policy.
So now I have a custom Initiative that contains 3 custom Azure policies, each of which points to one of the 3 custom Azure Backup policies.
Policy Assignment
If we were to assign this Initiative to a Subscription, or Resource Group, there would be a conflict.
One thing that I did discover is that when you assign the Initiative, the option to create a Remediation Task seems to be associated with a single Policy. Does this mean that all VMs will be remediated to the “Backup at 1:00 AM” backup policy? Do recall that each Policy has its own parameters and a DeployIfNotExists effect. We’ll come back to this shortly.
Policy Conflicts (It’s a race!)
As it stands, if we assign the Initiative (or the policies individually), there will be a conflict. Each policy basically says “Any VM deployed to Canada Central should be associated with the following Azure Backup Policy”. However, there are 3 policies all doing the same thing. So is it a race to see which policy gets to the VM first? Or do the policies perpetually overwrite each other, as in, VM1 is associated with Backup Policy 1, then is ‘DeployIfNotExists-ed’ into Backup Policy 2, only to be re-associated a 3rd time with Backup Policy 3?
Let’s test that theory.
So, I created a single Virtual Machine in the target Resource Group that the Initiative is assigned to. And then I waited!
Interestingly, on the Azure Policy Compliance blade, the Initiative reported as being non-compliant. Notice that it says 1 non-compliant resource and 3 non-compliant policies.
However, when you navigate into the Initiative, it shows as compliant, and the 3 policies show as compliant as well!
To make things even more confusing, if you look at the Non-Compliant Resources, the VM is listed!
Only when you drill into the Non-Compliant Resource object itself does it correctly show that the VM truly is non-compliant, and that the policies are non-compliant as well.
Very confusing.
Digging Deeper
I dug a little deeper into this, to understand what is/is not happening.
In the Initiative Events, there was a single entry.
Clicking on this event brought me to the Activity Logs, where I noticed not only the ‘Failed’ status but, more importantly, the ‘Succeeded’ entry for the DeployIfNotExists policy action.
Looking at the raw JSON output for the “Succeeded” action, we notice the Policy Definition Name of “f0dd6d65-d72e-442a-9804-9271420e9fbd”. If you check the individual policies, that name corresponds to the “Backup at 3:00 AM” policy.
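If you would rather not eyeball the raw JSON each time, the lookup can be scripted. The following is only a minimal sketch: the shape of `sample_entry` (a `properties.policies` field holding a JSON-encoded list, with `policyDefinitionName` and `policyDefinitionEffect` keys) is my assumption about how the exported Activity Log entry is laid out, and `KNOWN_POLICIES` is a hypothetical lookup table you would fill in yourself. Only the definition name is taken from the output above.

```python
import json

# Hypothetical, trimmed-down Activity Log entry. The field layout is an
# assumption; only the definition name comes from the article.
sample_entry = {
    "status": {"value": "Succeeded"},
    "properties": {
        "policies": json.dumps([
            {
                "policyDefinitionName": "f0dd6d65-d72e-442a-9804-9271420e9fbd",
                "policyDefinitionEffect": "deployIfNotExists",
            }
        ])
    },
}

# Hypothetical lookup from definition name to the label used in this post.
KNOWN_POLICIES = {
    "f0dd6d65-d72e-442a-9804-9271420e9fbd": "Backup at 3:00 AM",
}


def winning_policies(entry):
    """Return labels of DeployIfNotExists policies in a 'Succeeded' entry."""
    if entry.get("status", {}).get("value") != "Succeeded":
        return []
    raw = entry.get("properties", {}).get("policies", "[]")
    # The list is assumed to arrive as a JSON string; accept a real list too.
    policies = json.loads(raw) if isinstance(raw, str) else raw
    return [
        KNOWN_POLICIES.get(p["policyDefinitionName"], p["policyDefinitionName"])
        for p in policies
        if p.get("policyDefinitionEffect", "").lower() == "deployifnotexists"
    ]


print(winning_policies(sample_entry))
```

Unknown definition names fall through unchanged, so a policy missing from the lookup table still shows up in the result.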
Checking the target VM, it clearly shows that it’s configured for the 3:00 AM backup policy.
Whoever Is Last, Is First?
So, that means, the very last Policy associated/listed in the Initiative will be the one that actually deploys correctly?
To test this theory, I re-ordered the policy listing in the Initiative (setting the “Backup at 2:00 AM” policy as the last in the list) and deployed a new VM.
Sure enough, the same pattern was observed, and the new VM successfully received the association to the “Backup at 2:00 AM” policy! Good to know. Now that we’re armed with that knowledge, let’s counter it with some customizations.
Policy Customizations
We’ve come a long way, but we’re not done yet!
Now that we understand how things work when you have multiple DeployIfNotExists policies within a single Initiative (one that does not have a Remediation Task configured), we can start to customize the policies to be more granular.
Similar to a previous blog post I wrote on Auditing for Disaster Recovery with Azure Policy, we are going to customize these policies.
We start by adding TagName and TagValue to the parameters. Notice in my snippet how I have restricted the TagName to ‘BackupWindow’, and also limited the allowed values for the TagValue.
Then we modify the Policy Rule to include a reference to the TagName and TagValue.
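For readers who prefer to see the two edits side by side, here is a rough sketch of them expressed as Python dictionaries that serialize to policy JSON. It is a reconstruction under assumptions, not a copy of my actual definition: the metadata, the default values, and the simplified `base_rule` are illustrative, and the helper names (`tag_parameters`, `tag_condition`, `add_tag_filter`) are made up for this example. The `tags[...]` field expression is the usual Azure Policy way of referring to a tag whose name comes from a parameter.

```python
import copy
import json

BACKUP_WINDOWS = ["1:00 AM", "2:00 AM", "3:00 AM"]


def tag_parameters(default_window):
    """Parameter block restricting the tag name and its allowed values."""
    return {
        "TagName": {
            "type": "String",
            "allowedValues": ["BackupWindow"],
            "defaultValue": "BackupWindow",
        },
        "TagValue": {
            "type": "String",
            "allowedValues": BACKUP_WINDOWS,
            "defaultValue": default_window,
        },
    }


def tag_condition():
    """Condition that is true only when the VM carries the expected tag."""
    return {
        "field": "[concat('tags[', parameters('TagName'), ']')]",
        "equals": "[parameters('TagValue')]",
    }


def add_tag_filter(policy_rule):
    """Return a copy of the rule whose 'if' also requires the tag match."""
    rule = copy.deepcopy(policy_rule)
    condition = rule["if"]
    if "allOf" in condition:
        condition["allOf"].append(tag_condition())
    else:
        # Wrap a single condition so both must hold.
        rule["if"] = {"allOf": [condition, tag_condition()]}
    return rule


# Simplified stand-in for the built-in rule (assumption, heavily trimmed).
base_rule = {
    "if": {"allOf": [{"field": "type", "equals": "Microsoft.Compute/virtualMachines"}]},
    "then": {"effect": "deployIfNotExists"},
}

print(json.dumps(add_tag_filter(base_rule), indent=2))
```

Working on a deep copy keeps the duplicated built-in rule untouched, which makes it easy to produce one customized rule per backup window from the same starting point.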
After making these edits to each Policy, if you edit the Initiative, you will see the new parameters listed/included. Note that we set the default tag value to correspond to the specific policy. You could, for instance, set the tag value to 2:00 AM and have that associate to the 1:00 AM backup policy (though I don’t know why you would do that).
Testing the New Policies
After the policy configuration changes (to react only to the specific ‘BackupWindow’ Tag), I deployed a new VM with the appropriate Tag. To test thoroughly, I specifically set the ‘BackupWindow’ tag to ‘1:00 AM’ to ensure it wouldn’t just hit the last policy in the Initiative.
Within a short period of time, I checked the Initiative Events and its subsequent Activity Logs, and observed the expected PolicyDefinitionName field with its reference to the ‘1:00 AM’ Backup Policy!
Double-checking on the VM itself, it shows that it has been successfully registered to the target Recovery Services Vault (RSV), and the expected Backup Policy.
Success! This means we can use the TagName/Value combination to ensure association with the right Backup Policy!
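To picture why the tag removes the earlier conflict, here is a toy model of the matching logic. It is not how Azure Policy evaluates resources internally; it is just a small sketch, with a hypothetical `matching_policies` helper, showing that once each policy is keyed to its own tag value, at most one of the three applies to any given VM.

```python
# Each custom policy reacts to exactly one 'BackupWindow' tag value.
POLICIES = {
    "Backup at 1:00 AM": "1:00 AM",
    "Backup at 2:00 AM": "2:00 AM",
    "Backup at 3:00 AM": "3:00 AM",
}


def matching_policies(vm_tags, tag_name="BackupWindow"):
    """Return the policies whose tag value matches the VM's tag."""
    value = vm_tags.get(tag_name)
    return [name for name, window in POLICIES.items() if window == value]


print(matching_policies({"BackupWindow": "1:00 AM"}))  # one match
print(matching_policies({}))                           # untagged VM: no match
```

An untagged VM matches nothing in this model, so a VM created without the tag would be left out of backup onboarding altogether rather than falling into the last policy in the list.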
One Final Test
I wanted to perform one final test. All the tests thus far were with VMs newly created inside the target Resource Group. How will the Initiative and its policies treat a VM that already exists when it comes into scope? To find out, I created a new VM in a different Resource Group. Once the VM was deployed and running, I used Resource Move to move it into the target Resource Group that has the Initiative/Policies assigned to it.
The results?
The Initiative now shows that it’s not compliant, with 1 non-compliant resource and policy.
Within the Initiative, it clearly shows which Policy is not compliant.
But, because there is no Remediation Task created (if you’ll recall at the beginning of the article, the option to create a Remediation Task seems to only be able to be associated with a single Policy), the VM was not onboarded into the Azure Backup policy.
The Remediation Challenge
Of course, you can click on the ‘Create Remediation Task’ button, and select the appropriate Policy, and click ‘Remediate’, but that is a manual action, and we want to automate the onboarding process to the appropriate Backup Policy.
Through some additional testing, I was able to create a Remediation Task for the Initiative. However, it’s not exactly as it seems.
Even though it seems like you can select the applicable ‘Policy to remediate’ from the list, this isn’t exactly so. When I attempted to do so, I selected each individual policy from within the Initiative assignment and created a Remediation Task for each. However, even though each individual policy was selected, a Remediation Task was only created for the first Policy!
The Workaround
So, where does that leave us? Honestly, not in a great position. In order to have Azure Policy auto-onboard VMs into a specific Azure Backup policy, you have to create a Policy Assignment for each Policy, and make sure to create a Remediation Task for each one. You cannot do so at the Initiative level.
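Since that means repeating the same steps once per policy, it may be worth scripting them. The sketch below only builds the Azure CLI command lines (it does not run anything), and it rests on several assumptions: the assignment and remediation names are hypothetical, the definition names are placeholders you would replace with your own, and the parameter payload shows only the two tag parameters, so any other parameters your policy requires (such as the vault location and backup policy ID) would need to be added. I’m using `az policy assignment create` and `az policy remediation create` with the flags as I understand them, so do check them against your CLI version. The managed identity created for the assignment will also need suitable permissions before remediation can succeed.

```python
import json

WINDOWS = ["1:00 AM", "2:00 AM", "3:00 AM"]


def slug(window):
    """Turn '1:00 AM' into '100am' for use in resource names."""
    return window.lower().replace(":", "").replace(" ", "")


def commands_for(window, definition, scope, resource_group, location):
    """Build the assignment and remediation commands for one policy."""
    assignment = f"backup-{slug(window)}"  # hypothetical naming scheme
    # Only the tag parameters are shown; add the rest for your policy.
    params = {
        "TagName": {"value": "BackupWindow"},
        "TagValue": {"value": window},
    }
    assign = [
        "az", "policy", "assignment", "create",
        "--name", assignment,
        "--policy", definition,
        "--scope", scope,
        "--location", location,
        "--mi-system-assigned",  # DeployIfNotExists needs an identity
        "--params", json.dumps(params),
    ]
    remediate = [
        "az", "policy", "remediation", "create",
        "--name", f"remediate-{slug(window)}",
        "--policy-assignment", assignment,
        "--resource-group", resource_group,
    ]
    return [assign, remediate]


# Placeholders: substitute your own definition names, scope, and group.
definitions = {w: f"<definition-name-for-{slug(w)}>" for w in WINDOWS}
scope = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"

all_commands = []
for w in WINDOWS:
    all_commands += commands_for(w, definitions[w], scope, "<resource-group>", "canadacentral")

for cmd in all_commands:
    print(" ".join(cmd))
```

Printing the commands first, instead of executing them, gives you a chance to review each assignment and remediation pair before anything touches the subscription.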
This I feel is a bug in the logic of the Initiative remediation task, which I will take up with the Azure Policy team. But for now, this seems to be where we are at.
UPDATE: I connected with the Azure Policy team about this scenario and confirmed it’s a bug. Stay tuned for another update when a fix is available/released.
Conclusion
I know this article was rather long. It took quite a while to write it too! I hope that breaking down these elements, testing, and digging through the ARM output has helped you understand not only how the policy works, but also how you can tweak it for more (and better) granularity.
Even though we cannot remediate multiple policies within an initiative (apparently), at least you come away with the knowledge of how to create a custom/granular policy to auto-onboard to a specific Azure Backup policy, based on a Tag Name/Value that you set.