
Commvault Cloud Cleanroom Recovery - What is it?

·7 mins
General Commvault vExpert Backup Recovery Ransomware Azure
I was asked by a friend to have a look at Commvault Cloud Cleanroom Recovery and give them a brief overview of it and my opinion of it. Whilst I was at it, I decided to share here as well.

When it comes to backup and recovery, or BC-DR (Business Continuity and Disaster Recovery), I have my share of stories. Like many, I’ve also been the cause of the odd incident or two - nobody is perfect! So I was very happy to do this.

Data protection evolution

Diagram illustrating the evolution of data protection and recovery solutions
The evolution of data protection and recovery solutions.

The evolution of solutions in this area can be summarised by the diagram above.

  • Operational Recovery - This is the conventional backup and recovery of files and data, the sort of activity that has been going on for many, many years. It’s probably the simplest and cheapest form of data protection to implement. You back up your files / data / VMs to one or more storage targets and you can recover them if there’s some sort of incident. Gordon in HR deletes a spreadsheet that means no-one will get paid this month? It’s ok, it was backed up. Or perhaps someone with root access to a Linux VM entered the command rm -rf / by accident.
  • Disaster Recovery - Often this is more costly and complex to implement, but with the advent of mainstream virtualization and cloud computing it has become easier. The increased cost and complexity are consistent with the magnitude of protection you’re trying to provide. Rather than protect individual files or VMs, you’re trying to protect against actual disasters. That once-in-a-lifetime amount of rain that floods your datacenter, or perhaps your server room gets burned up as a result of someone toasting a bagel in a breakroom. A combination of data backups and replication is usually involved in providing this kind of protection.
  • Cyber Recovery - Increasingly we hear of more frequent and serious cyber attacks. There are many reported each month and probably quite a few that go unreported too! Data leakage is one type of attack, but another (and arguably more serious) type is where data is either changed, corrupted, deleted, or encrypted. You can build the highest, most impenetrable walls around your data and systems but attacks still happen. The challenge in these cases is working out what data you can trust. This is where Commvault are pitching their stall.

Trusting data after an incident

There’s little point in recovering data that is encrypted from a ransomware attack, or that might have been compromised in some way. If it was ransomware, the malicious code or data that caused it could have been present for some time. The problem, however, is identifying when your data was last clean. Anything after that point probably can’t be trusted unless you have weeks or months to sift through it forensically.

Diagram illustrating the delta between trusted and untrusted data
Ransomware introduction may occur before its presence is felt. Recovery must occur from clean data.
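
To make the idea concrete, here’s a minimal sketch in Python. It isn’t how Commvault does it (I haven’t seen their internals); it just shows the basic logic of picking the newest recovery point taken before the earliest suspected compromise. The function name and the dates are made up for illustration.

```python
from datetime import datetime
from typing import Optional


def last_trusted_point(
    recovery_points: list[datetime], earliest_compromise: datetime
) -> Optional[datetime]:
    """Return the newest recovery point taken strictly before the
    earliest suspected compromise, or None if there isn't one."""
    candidates = [p for p in recovery_points if p < earliest_compromise]
    return max(candidates) if candidates else None


# Hypothetical weekly backups and a hypothetical intrusion date.
backups = [datetime(2024, 3, day) for day in (1, 8, 15, 22)]
intrusion = datetime(2024, 3, 10)

print(last_trusted_point(backups, intrusion))
```

With these made-up dates, the backups from the 15th and 22nd are discarded even though they are newer, because they were taken after the intrusion. The hard part in real life is, of course, working out the intrusion date in the first place.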

Planning and testing recovery

Fortunately I have never been subject to a cyber attack, or even a situation where DR failover was required, although that doesn’t mean that I haven’t had to plan for them. Before my long tenure at VMware started, I worked for various VMware customers. One of these was a public sector (i.e. government) organisation and I recall having to create some of the runbooks for a DR test that we actually executed (partially anyway). Having a DR plan and staying ready to use it is often very expensive and time-consuming, usually for one or more of the following reasons:

  1. Purchasing and maintaining duplicate hardware resources to act as a recovery target. In the event of a recovery being required, the original hardware might not be viable or trusted.
  2. Creating and maintaining a plan and runbooks to execute a recovery requires resources and time. Software is constantly changing and so runbooks must be updated. Business applications are changing and growing and those changes may need to be reflected.
  3. Testing recovery plans is time-consuming and often takes resources away from day-to-day operations. Because of this some organisations either do not test or cannot test completely. Remember, I said above that we only ever executed a partial test with that organisation I was at. We didn’t have the time to fail everything over and get all of the applications and data validated.
  4. Even when tests are conducted, they often leave out some critical steps as they’re too disruptive. For example, failing over an application and trusting users to validate it and the data but not actually make real changes? Unlikely to happen. Some things can only really be tried out on the day and if you really want to be ready for that, expect more cost. Maybe in terms of extra resources on standby or to stay on top of testing and runbook updates.

Diagram illustrating the increase in cost and effort of being ready
The increasing effort / cost of being ready for recovery.

Data protection and recovery can be an expensive business!

Commvault Cloud Cleanroom Recovery

That’s probably enough about why you’d want a recovery solution. What about how Commvault’s Cloud Cleanroom Recovery solution works, and how it might solve some of these challenges?

The control plane

The control plane for Commvault can either be SaaS or deployed on-premises. One thing that Commvault seem keen to stress is that the control plane’s location doesn’t matter and that it’s not located with your data.

The UI looks simple enough and clean. I’ve seen a demo of it, but not tried configuring it myself.

New backup target

In addition to your standard raft of backup targets, Commvault introduces a new one. It’s an airgapped, immutable storage location hosted in Azure. That doesn’t mean that you have to recover to Azure, although it’s an option. It’s a scalable, cost-effective target for backup data. In normal operation you’re just paying for storage used. There are no compute costs and there’s no need to purchase and maintain an expensive storage array.

Diagram illustrating the addition of a new Azure backup target
Commvault Cloud Cleanroom Recovery backs up to an immutable Azure target.

Data is stored in an agnostic format so the recovery of your data could be to any target in the future. At the time of writing the options are more limited.

Also currently limited is support for specific applications. MSSQL databases, MySQL databases, unstructured data (e.g. file servers), and VM-level backups are well catered for. Support for other applications (other databases) and Active Directory is coming.

Recovery

Should recovery be necessary at any point, backed up data can be “rehydrated” and scanned. Compute resources are necessary for the scanning to take place, but these can swiftly be provisioned in Azure. You don’t pay for them if they’re not being used.

Scanning is necessary to help identify the delta between an incident and when your data could last be trusted. Effectively you are determining what recovery point can be used safely. Commvault have a number of tools built-in that help to determine when this point is. You can also be alerted to suspicious activity such as backup volumes suddenly increasing. These types of events may be innocuous, or they may indicate that an incident has taken place.
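
As a rough illustration of the “backup volumes suddenly increasing” kind of alert, here’s a small Python sketch. To be clear, this is my own simplification and not Commvault’s detection logic: it flags any backup whose size sits well above the average of the backups just before it. The function name, window, threshold, and sizes are all assumptions for the example.

```python
from statistics import mean, stdev


def flag_size_spikes(sizes_gb, window=7, threshold=3.0):
    """Return the indexes of backups whose size is more than `threshold`
    standard deviations above the mean of the previous `window` backups."""
    flagged = []
    for i in range(window, len(sizes_gb)):
        history = sizes_gb[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat history gives no scale to compare against,
        # so this simple sketch skips it rather than dividing by zero.
        if sigma > 0 and (sizes_gb[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical nightly backup sizes in GB, with one sudden jump.
nightly_sizes = [100, 102, 101, 99, 103, 100, 102, 101, 180, 104]

print(flag_size_spikes(nightly_sizes))
```

With these made-up numbers only the 180 GB backup (index 8) is flagged. Whether that’s ransomware encrypting files or just someone uploading a pile of video is exactly the question a human (or a smarter tool) still has to answer.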

Commvault also include an automation framework that can aid with recovery. I haven’t seen this myself, although I’d be curious to get my hands on it :-)

Key features

The key features that Commvault appear to be differentiating with are as follows:

  1. Identifying last-known-good data - If you’re lucky then you know when a problem occurred and (if it was malicious) what the attack vector was. Often this knowledge only comes after some lengthy forensic work, if it comes at all. Commvault claims to help identify a safe recovery point for your data, eliminating time, effort, and cost from the recovery process.
  2. Cost reduction - Purchasing and maintaining infrastructure, floorspace, and licenses for DR is expensive and there’s a temptation to try and cut costs. Rather than having up-front and ongoing infrastructure costs, Commvault are reducing the costs to storage only, plus whatever Cloud Cleanroom Recovery costs, of course!
  3. Threat vector identification - During the backup process, Commvault can help identify possible threat vectors that organisations can tighten up, possibly preventing incidents in the first place.

Summary

Backup, replication, and recovery can be complex and costly, and those are often two reasons that people use to skimp on their solutions. I like what Commvault are trying to do with Cloud Cleanroom Recovery. There are other approaches and other vendors in this space of course and, as with getting your kitchen remodeled, it’s probably best to check them all out first. What’s really good though is how relatable Commvault’s solution seems to a number of recent ransomware incidents that I’ve read about. It would be good to see it in action and try it out. That’s what I told my friend to do anyway! 😀
