VM Managed Disks, Resource Locks, and Failed Backup Snapshot Cleanups

Here’s another scenario that I ran into with a customer.

In Azure, it is considered a best practice to place Delete locks on the Resource Groups and resources that host Production workloads, to prevent accidental deletion.
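To make the inheritance behavior concrete, here is a small Python sketch. It is a toy model, not a call into the Azure SDK. It assumes the documented rule that a lock placed on a scope also applies to every resource beneath that scope, and it treats ARM resource IDs as case-insensitive paths. The function name `is_delete_blocked` and the IDs in the example are hypothetical.

```python
# Toy model of Azure resource lock inheritance; this does not call Azure.
# Assumption: a lock on a scope (subscription, resource group, or resource)
# applies to that scope and to every resource whose ID is nested under it.


def _normalize(resource_id):
    # ARM resource IDs are compared case-insensitively; drop any trailing "/".
    return resource_id.rstrip("/").lower()


def is_delete_blocked(resource_id, delete_lock_scopes):
    """Return True if any Delete lock scope covers resource_id."""
    rid = _normalize(resource_id)
    for scope in delete_lock_scopes:
        s = _normalize(scope)
        # Match the scope itself or a child path; the "/" stops "prod-rg"
        # from accidentally matching a sibling group such as "prod-rg2".
        if rid == s or rid.startswith(s + "/"):
            return True
    return False


if __name__ == "__main__":
    rg = "/subscriptions/0000/resourceGroups/prod-rg"  # hypothetical IDs
    vm = rg + "/providers/Microsoft.Compute/virtualMachines/app-vm"
    sibling = ("/subscriptions/0000/resourceGroups/prod-rg2"
               "/providers/Microsoft.Compute/disks/data-disk")
    print(is_delete_blocked(vm, [rg]))       # True: inherited from the group
    print(is_delete_blocked(sibling, [rg]))  # False: different group
```

The point to take away is that a lock on the Resource Group silently covers everything created inside it later, including resources that Azure services create on your behalf.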

Resource Group - Delete Lock


For Azure IaaS Virtual Machines, it is also a recommended practice to use Managed Disks rather than unmanaged disks.

Managed Disks - Info Bubble


This may seem like common sense, but there are some caveats you need to be aware of.


The Scenario

So let’s say you have an Azure IaaS Virtual Machine (using Managed Disks), and you place a Delete lock on the Resource Group it is a part of (as depicted in the first screenshot above). So far so good, right?

Now let’s also add Azure Backup, configured to take daily backups (snapshots) of the Virtual Machine. Again, not an issue.

Azure Backup - Daily Backup Policy for VM


Now focus on the Retention settings. The policy retains each daily backup for 31 days. What happens after that? The system deletes the older backups, since they are no longer needed.
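The retention rule can be sketched in a few lines of Python. This assumes a recovery point expires once it is more than `RETENTION_DAYS` days old; the exact boundary Azure Backup uses is an assumption here, and `expired_recovery_points` is a hypothetical helper, not part of any Azure API.

```python
from datetime import date, timedelta

# Daily retention from the backup policy in this scenario.
RETENTION_DAYS = 31


def expired_recovery_points(backup_dates, today, retention_days=RETENTION_DAYS):
    """Return (oldest first) the daily points that fall outside retention.

    Assumption: a point taken exactly `retention_days` ago is still kept;
    anything older is eligible for deletion.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d < cutoff)


if __name__ == "__main__":
    today = date(2019, 3, 1)
    # 40 consecutive daily backups, the newest taken today.
    history = [today - timedelta(days=i) for i in range(40)]
    expired = expired_recovery_points(history, today)
    print(len(expired))  # 8: the points taken 32 to 39 days ago
```

In other words, every day the oldest point ages out and the service tries to delete it. That delete is the step that runs into trouble below.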

Sounds simple enough, what could possibly go wrong?

However, if you place a Delete lock on the target Virtual Machine or on the Resource Group it belongs to, you’re going to run into an issue.


The Issue

The Azure Backup Pre-Check feature checks your Virtual Machine’s configuration for issues that can negatively affect backup operations (e.g. network issues, NSG rules blocking communication, out-of-date agents, configuration changes).

On a side note, if you are encountering issues with Network Security Group (NSG) rules and Azure Backup, take a look at my other article: Getting Azure Backup to Work With Azure Security Center’s Just-In-Time VM Access.


If you have a Delete lock in place, the Azure Backup Pre-Check will throw a Warning as follows:

Issue Description: Failed to delete managed restore point corresponding to this backup after data transfer to vault.

Suggested Action(s): This is caused due to delete lock set up on the resource group which hosts managed disks. Please remove the lock to let Azure Backup service delete restore points which are transferred to vault.

Azure Backup - Backup Pre-Check - Warning Message



The “Answer”

It may seem a little odd: we are encouraged to use Managed Disks for our Azure IaaS Virtual Machines and to place Resource Locks on Production systems/resources, yet this combination blocks the automatic cleanup of backups/snapshots.

Through investigation and research, we determined that the RestorePointCollection is maintained as an Azure Resource Manager (ARM) resource object, whereas the Snapshots are not. Because we are using Managed Disks, the restore points live in that ARM resource, which inherits the Resource Lock placed on its parent object, so the Azure Backup restore point cleanup process is unable to delete them.
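Here is a hypothetical Python simulation of that cleanup step. It assumes each restore point sits under the locked Resource Group as an ARM resource (the resource type `Microsoft.Compute/restorePointCollections` is real; the names, IDs, `RestorePoint`, and `cleanup` are made up for illustration), and it models a Delete lock on any ancestor scope as making the delete fail with the warning shown earlier. It is a sketch of the behavior, not how the Azure Backup service is implemented.

```python
from dataclasses import dataclass

# Warning text as reported by the Azure Backup Pre-Check in this scenario.
WARNING = ("Failed to delete managed restore point corresponding to this "
           "backup after data transfer to vault.")


@dataclass
class RestorePoint:
    resource_id: str            # hypothetical ARM ID of the restore point
    transferred_to_vault: bool  # True once the data has been copied to the vault


def covered_by_lock(resource_id, lock_scopes):
    """True if a lock on the resource or any ancestor scope covers it."""
    rid = resource_id.rstrip("/").lower()
    for scope in lock_scopes:
        s = scope.rstrip("/").lower()
        if rid == s or rid.startswith(s + "/"):
            return True
    return False


def cleanup(points, lock_scopes):
    """Try to delete restore points already copied to the vault.

    Returns (points that still exist, warnings raised).
    """
    remaining, warnings = [], []
    for p in points:
        if not p.transferred_to_vault:
            remaining.append(p)  # still needed, so it is not deleted
        elif covered_by_lock(p.resource_id, lock_scopes):
            remaining.append(p)  # delete rejected by the Delete lock
            warnings.append(WARNING)
        # Otherwise the delete succeeds and the point is dropped.
    return remaining, warnings


if __name__ == "__main__":
    rg = "/subscriptions/0000/resourceGroups/prod-rg"  # hypothetical
    rpc = rg + "/providers/Microsoft.Compute/restorePointCollections/AzureBackup_app-vm"
    points = [RestorePoint(rpc + "/restorePoints/day1", True),
              RestorePoint(rpc + "/restorePoints/day2", False)]
    print(len(cleanup(points, [rg])[0]))  # 2: nothing could be deleted
    print(len(cleanup(points, [])[0]))    # 1: day1 was cleaned up
```

Note that the backup itself still succeeds in this model; only the cleanup fails, which is why the Pre-Check reports a Warning rather than an error.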

The workaround for this scenario (at least until there is a correction on the back end) can be found on the Troubleshoot Azure Backup failure page, specifically the very last entry.

In short, you need to remove the Resource Lock. You can perform some manual actions involving Chocolatey and the Azure CLI to delete the Restore Point Collection, but “the problem will re-appear if you lock the Resource Group again as there is only a limit of 18 restore points after which the backups start failing”.
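To see why the problem only surfaces after a while, here is a rough Python simulation of the 18 restore point limit quoted above. It is a simplified model under stated assumptions: one restore point per daily backup, cleanup deleting it as soon as the data reaches the vault when no lock is present, and a hard cap of 18 restore points. The function and parameter names are hypothetical.

```python
# Limit quoted from the Azure Backup troubleshooting entry.
MAX_RESTORE_POINTS = 18


def first_failed_backup_day(days, locked, max_points=MAX_RESTORE_POINTS):
    """Return the first day (1-based) on which a backup fails, or None.

    Simplified model (assumptions, not Azure internals):
    * each daily backup creates exactly one restore point;
    * without a lock, cleanup deletes it once the data reaches the vault;
    * with a Delete lock, every cleanup attempt fails, so points accumulate;
    * a backup fails when the limit has already been reached.
    """
    existing = 0
    for day in range(1, days + 1):
        if existing >= max_points:
            return day       # no room for today's restore point
        existing += 1        # today's snapshot becomes a restore point
        if not locked:
            existing -= 1    # cleanup succeeds after transfer to vault
    return None


if __name__ == "__main__":
    print(first_failed_backup_day(30, locked=True))   # 19
    print(first_failed_backup_day(30, locked=False))  # None
```

With the lock in place, the 19th daily backup is the first to fail in this model; without it, the restore point count never grows. That delay is what makes the issue easy to miss: everything looks healthy for the first couple of weeks.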

Good to know.
