Auditing for Disaster Recovery with Azure Policy

Here’s a scenario I’ve been working on recently for a client that I wanted to share.

Let’s say you want to ensure that all Production VMs deployed in Azure are protected by Azure Site Recovery (ASR). How would you check and confirm this, other than checking each and every VM, and each and every Recovery Services Vault (RSV) you have (assuming you have more than one)?

Azure Policy to the rescue!

Existing Policy

If you look at Azure Policy and its ever-growing list of built-in policies, you will see that there is a policy designed for this scenario, called “Audit virtual machines without disaster recovery configured”.

But it’s not perfect. If you look at the actual policy definition, you’ll see that it checks all Microsoft.Compute virtual machines in scope and looks for a resource link to ‘ASR-Protect-*’.
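For reference, here is a simplified sketch of the shape of that built-in rule. It is written as a Python dictionary so it can be printed as policy JSON. The link-name expression is reproduced from memory and is an assumption, so copy the exact text from the built-in definition in your own tenant before relying on it. The key point is that the “if” block filters only on resource type, with no tag filter at all.

```python
import json

# Simplified shape of the built-in "Audit virtual machines without disaster
# recovery configured" rule. Assumption: the link-name expression is from
# memory; copy the exact expression from the built-in definition.
builtin_rule = {
    "if": {
        # Only the resource type is checked, so every VM in scope is evaluated.
        "field": "type",
        "equals": "Microsoft.Compute/virtualMachines",
    },
    "then": {
        # auditIfNotExists: the VM is flagged unless the related resource
        # described in "details" (the ASR protection link) exists.
        "effect": "auditIfNotExists",
        "details": {
            "type": "Microsoft.Resources/links",
            "name": "[concat('ASR-Protect-', replace(toLower(field('name')), '_', '-'))]",
        },
    },
}

print(json.dumps({"policyRule": builtin_rule}, indent=2))
```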

Customize the Policy

Remember that at the outset we said the scenario was to ensure that all Production VMs are protected by ASR. But if we assign this policy at a Management Group, Subscription, or Resource Group that also contains non-production workloads, it will report a bunch of “non-compliant” resources that we don’t care about.

Let’s duplicate the built-in policy and modify it for our needs, starting by adding a Tag Name and a Tag Value to the policy parameters.

Notice that in my example I specify the exact Tag Name, ‘DisasterRecoveryServiceTier’, and restrict the allowedValues list to ‘Tier-1’.
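A sketch of what those added parameters could look like, built as a Python dictionary and printed as policy JSON. The parameter names `tagName` and `tagValue` and the display text are hypothetical choices for illustration. The tag name ‘DisasterRecoveryServiceTier’ and the single allowed value ‘Tier-1’ come from the example described above.

```python
import json

# Hypothetical parameter names (tagName, tagValue); the tag name and the
# allowed tier value are the ones used in this post.
parameters = {
    "tagName": {
        "type": "String",
        "metadata": {
            "displayName": "Tag Name",
            "description": "Name of the tag that holds the DR service tier",
        },
        # Default to the exact tag name used across the environment.
        "defaultValue": "DisasterRecoveryServiceTier",
    },
    "tagValue": {
        "type": "String",
        "metadata": {
            "displayName": "Tag Value",
            "description": "DR service tier that must be protected by ASR",
        },
        # Restricting allowedValues means an assignment can only target Tier-1.
        "allowedValues": ["Tier-1"],
        "defaultValue": "Tier-1",
    },
}

print(json.dumps({"parameters": parameters}, indent=2))
```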

For reference, I have defined the Disaster Recovery Service Tiers as follows:

  • Tier-0 (< 10 minutes RPO/RTO) = use an active-active configuration
  • Tier-1 (< 4 hours RPO/RTO) = use Azure Site Recovery (ASR); this is what we’re checking for in the policy
  • Tier-2 (< 24 hours RPO/RTO) = use Azure Backup
  • Tier-3 (> 24 hours RPO/RTO) = use Azure Backup

So, from these defined DR tiers, the only tier I need to audit for ASR protection is Tier-1.
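The tier table can also be written as a small lookup. This makes explicit why ‘Tier-1’ ends up as the only allowed value for the ASR audit. It is a sketch that simply encodes the list above. The names `DR_TIERS` and `tiers_requiring` are illustrative, not part of Azure.

```python
# DR service tiers from the list above, mapped to their protection mechanism.
DR_TIERS = {
    "Tier-0": "active-active",
    "Tier-1": "Azure Site Recovery",
    "Tier-2": "Azure Backup",
    "Tier-3": "Azure Backup",
}


def tiers_requiring(mechanism: str) -> list[str]:
    """Return the tiers that rely on the given mechanism, in tier order."""
    return [tier for tier, mech in DR_TIERS.items() if mech == mechanism]


print(tiers_requiring("Azure Site Recovery"))  # ['Tier-1']
```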

Next, in the policy rule itself, we wrap the ‘if’ condition in an ‘allOf’ block and add a condition that references the Tag Name and Tag Value parameters.

This means we can control exactly what we’re auditing. In this case, we check every Microsoft.Compute/virtualMachines resource that also has the Tag Name ‘DisasterRecoveryServiceTier’ with a Tag Value of ‘Tier-1’, and audit whether it is protected by Azure Site Recovery (ASR).
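A minimal sketch of the modified rule follows, again as a Python dictionary printed as policy JSON. It assumes parameters named `tagName` and `tagValue` (hypothetical names). The `concat('tags[', parameters('tagName'), ']')` expression is the usual way to build a tag field reference from a parameter. The “then” clause is assumed to be carried over unchanged from the built-in policy, and its link-name expression should be copied verbatim from that definition.

```python
import json

# Modified rule: allOf requires BOTH the VM resource type AND the tag match
# before the ASR existence check runs.
# Assumption: parameters are named tagName / tagValue (hypothetical).
policy_rule = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
            {
                # Resolves to tags[<tagName>] at evaluation time.
                "field": "[concat('tags[', parameters('tagName'), ']')]",
                "equals": "[parameters('tagValue')]",
            },
        ]
    },
    "then": {
        "effect": "auditIfNotExists",
        "details": {
            "type": "Microsoft.Resources/links",
            # Assumption: copy this expression verbatim from the built-in policy.
            "name": "[concat('ASR-Protect-', replace(toLower(field('name')), '_', '-'))]",
        },
    },
}

print(json.dumps({"policyRule": policy_rule}, indent=2))
```

Because `allOf` is a logical AND, a VM that fails either condition never reaches the `auditIfNotExists` check.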

This way, if a VM does not have this tag, or is tagged with a different recovery tier, it will not show up as a false positive and skew our compliance results.
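To see which VMs the tag filter keeps in scope, here is a simplified local model of the `allOf` condition. It is not Azure’s policy engine, and real evaluation details (for example how string comparisons treat case) may differ. The VM names and tags are hypothetical sample data.

```python
# Local model of the modified "if" block (NOT the Azure Policy engine).
TAG_NAME = "DisasterRecoveryServiceTier"
TAG_VALUE = "Tier-1"


def in_scope(resource: dict) -> bool:
    """Mirror the allOf: VM resource type AND tag equals the expected tier."""
    return (
        resource.get("type") == "Microsoft.Compute/virtualMachines"
        and resource.get("tags", {}).get(TAG_NAME) == TAG_VALUE
    )


# Hypothetical sample VMs.
vms = [
    {"name": "prod-sql-01", "type": "Microsoft.Compute/virtualMachines",
     "tags": {TAG_NAME: "Tier-1"}},
    {"name": "prod-web-01", "type": "Microsoft.Compute/virtualMachines",
     "tags": {TAG_NAME: "Tier-2"}},
    {"name": "dev-test-01", "type": "Microsoft.Compute/virtualMachines",
     "tags": {}},
]

print([vm["name"] for vm in vms if in_scope(vm)])  # ['prod-sql-01']
```

Only the Tier-1 VM is checked for ASR protection. The Tier-2 and untagged VMs drop out instead of appearing as non-compliant.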

You could, of course, adapt this to your exact needs by including a check for a Tag Name of ‘Environment’ and a Tag Value of ‘Production’, or something similar.
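A sketch of that variation: one more condition appended to the `allOf`, so that only VMs tagged as both Tier-1 and Production are audited. For brevity, the ‘Environment’/‘Production’ check is hard-coded here using the `tags['<name>']` field syntax. It could just as well be parameterised like the DR tier tag. The `tagName`/`tagValue` parameter names are hypothetical.

```python
import json

# allOf with an extra Environment condition; every entry must be true.
# Assumption: the Environment tag is hard-coded rather than parameterised.
if_block = {
    "allOf": [
        {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
        {
            "field": "[concat('tags[', parameters('tagName'), ']')]",
            "equals": "[parameters('tagValue')]",
        },
        # Added: only production VMs are evaluated.
        {"field": "tags['Environment']", "equals": "Production"},
    ]
}

print(json.dumps({"if": if_block}, indent=2))
```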

Azure Policy – Disaster Recovery – Compliance Check

I hope this article will help you more accurately and specifically audit for disaster recovery in your Azure environments.
