vSAN Stretched Doesn't Always Mean 2x the Capacity

I get questions every now and then about sizing the capacity requirements for vSAN Stretched Clusters. Many people have written extensively about how much capacity is required; here is one such write-up from our favourite in-house guru.

Sizing a vSAN Stretched Cluster - Duncan Epping

I have pointed countless people to that link, but some still come back confused about sizing. The general assumption with stretched clusters is that if you have 100TB to replicate across both sites, you will need to size 200TB on Site A and 200TB on Site B (that's if you are going with PFTT=1, SFTT=1 on RAID-1).
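To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the commonly quoted local overhead multipliers (2x for RAID-1 with SFTT=1, 3x with SFTT=2, about 1.33x for RAID-5, 1.5x for RAID-6) and ignores slack space, metadata, witness components and dedupe/compression, so it is no substitute for the official vSAN sizing tools. The function name `raw_capacity_tb` and the assumption that PFTT=0 data lands in Site A are my own, purely for illustration.

```python
from fractions import Fraction

# Local (within-site) overhead multipliers, keyed by (method, SFTT).
# Commonly quoted vSAN figures; slack space, metadata, witness components
# and dedupe/compression are deliberately ignored in this sketch.
LOCAL_OVERHEAD = {
    ("RAID-1", 1): Fraction(2),     # two full mirrors
    ("RAID-1", 2): Fraction(3),     # three full mirrors
    ("RAID-5", 1): Fraction(4, 3),  # 3 data + 1 parity
    ("RAID-6", 2): Fraction(3, 2),  # 4 data + 2 parity
}

def raw_capacity_tb(vm_data_tb, pftt, sftt, method):
    """Return (site_a_tb, site_b_tb) of raw capacity needed for vm_data_tb."""
    per_site_copy = vm_data_tb * LOCAL_OVERHEAD[(method, sftt)]
    if pftt == 1:
        # Site-level protection: each site holds its own locally protected copy.
        return per_site_copy, per_site_copy
    if pftt == 0:
        # No site-level protection: assume (hypothetically) the data lives in Site A only.
        return per_site_copy, Fraction(0)
    raise ValueError("a stretched cluster supports PFTT of 0 or 1")

site_a, site_b = raw_capacity_tb(100, pftt=1, sftt=1, method="RAID-1")
print("Site A:", float(site_a), "TB, Site B:", float(site_b), "TB")
# Site A: 200.0 TB, Site B: 200.0 TB
```

Running it for 100TB at PFTT=1, SFTT=1 on RAID-1 reproduces the 200TB per site quoted above.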

That is true in a sense, assuming you are planning to replicate everything.

What do you mean, everything? I can understand where this confusion comes from. When we set up a new vSAN cluster, we select the topology as part of the process (Single Cluster, 2-Node or Stretched Cluster). It is also second nature to assume that replication is enabled at the datastore level, given that is how traditional SANs replicate, at LUN-level granularity. So the confusion comes from assuming that, because stretching is enabled at the cluster level, all data needs to be replicated and capacity is doubled, or 2x!

That is where Storage Policy Based Management (SPBM) comes into play. As some may know, the secret sauce that binds vSAN together is SPBM policies. These policies are applied at the VM level, and they are what vSAN uses to define protection and performance profiles. We can take it a step further and assign policies at the VMDK level if we want to. So, if we can define protection at the granularity of a VM, we can easily create a stretched cluster policy to replicate just a couple of VMs, which in turn means that out of a 100TB datastore, perhaps only 1TB of VMs actually gets replicated.
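Here is a quick sketch of that 100TB example, using hypothetical workload names: 1TB gets a stretched policy (PFTT=1) and the other 99TB stays local (PFTT=0). Everything uses SFTT=1 on RAID-1 (2x local overhead), other overheads are ignored, and I am assuming all of the local-only data sits in Site A; in practice you would likely spread it across both sites.

```python
# Hypothetical workloads: (name, size_tb, pftt).
# All assume SFTT=1 on RAID-1, i.e. 2x local overhead within a site,
# and that local-only (PFTT=0) data is kept in Site A.
RAID1_SFTT1 = 2

workloads = [
    ("critical-apps", 1, 1),     # stretched: a full copy in both sites
    ("everything-else", 99, 0),  # local protection only
]

def size_sites(workloads):
    """Sum the raw TB needed in each site across all workloads."""
    site_a = site_b = 0
    for _name, size_tb, pftt in workloads:
        raw = size_tb * RAID1_SFTT1
        site_a += raw          # every workload has a copy in Site A
        if pftt == 1:
            site_b += raw      # only stretched workloads consume Site B
    return site_a, site_b

print(size_sites(workloads))            # (200, 2)
print(size_sites([("all", 100, 1)]))    # (200, 200), i.e. replicate everything
```

Site A still needs 200TB raw because it hosts all of the data, but Site B only needs 2TB for the replicated VMs instead of 200TB.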

This is huge because you only replicate what you need, which saves cluster resources and capacity on the remote site, not to mention reducing the bandwidth required between the two sites.

So how do you keep some data replicated and the rest local only?

A policy that provides both site-level protection and local site protection can look something like this:

Local & Remote Protection
Primary Failures to Tolerate (PFTT) = 1
Secondary Failures to Tolerate (SFTT) = 1
Failure Tolerance Method (FTM) = RAID-5 (Erasure Coding)

The above policy ensures that the VM is replicated across both sites and, at the same time, protected with RAID-5 erasure coding within each site. The policy can be paired with DRS VM/Host affinity rules so that the VMs always run on hosts in their local site, preventing cross-site access.
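Switching local protection from RAID-1 to RAID-5 also shrinks the per-site footprint. The sketch below compares the two for 100TB of VM data with PFTT=1 and SFTT=1, using the commonly quoted overheads (2x for RAID-1, 4/3 for RAID-5 with 3 data + 1 parity) and ignoring slack space and other overheads. The helper name `stretched_raw_tb` is hypothetical.

```python
from fractions import Fraction

# Commonly quoted local overheads for SFTT=1 (ignores slack space,
# metadata and space-efficiency features).
RAID1_FTT1 = Fraction(2)      # two full mirrors
RAID5_FTT1 = Fraction(4, 3)   # 3 data + 1 parity

def stretched_raw_tb(vm_data_tb, local_overhead):
    """PFTT=1: each site keeps its own locally protected copy."""
    per_site = vm_data_tb * local_overhead
    return {"site_a": per_site, "site_b": per_site, "total": 2 * per_site}

for label, overhead in (("RAID-1", RAID1_FTT1), ("RAID-5", RAID5_FTT1)):
    sizes = stretched_raw_tb(100, overhead)
    print(label, {k: round(float(v), 2) for k, v in sizes.items()})
# RAID-1 {'site_a': 200.0, 'site_b': 200.0, 'total': 400.0}
# RAID-5 {'site_a': 133.33, 'site_b': 133.33, 'total': 266.67}
```

Keep in mind that RAID-5 needs at least four hosts in each site, so the capacity saving has to be weighed against host count.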

As for VMs that only require local protection and no replication across sites, a policy could look like this:

Local Protection Only
Primary Failures to Tolerate (PFTT) = 0
Secondary Failures to Tolerate (SFTT) = 1
Failure Tolerance Method (FTM) = RAID-5 (Erasure Coding)

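As you can see, you can explicitly set PFTT = 0, which ensures that the data does not get replicated across sites. Because the data then lives in only one site, it also makes sense to keep the VM running on hosts in that same site.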
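The local-only case can be sketched the same way. With PFTT=0 the object lives in a single site, so the other site stores nothing for it. As before, the 4/3 RAID-5 overhead is the commonly quoted figure, other overheads are ignored, and the function name `local_only_raw_tb` is hypothetical.

```python
from fractions import Fraction

RAID5_FTT1 = Fraction(4, 3)  # 3 data + 1 parity, commonly quoted overhead

def local_only_raw_tb(vm_data_tb, local_overhead, site="site_a"):
    """PFTT=0: the object lives in one site only; the other site stores nothing for it."""
    sizes = {"site_a": Fraction(0), "site_b": Fraction(0)}
    sizes[site] = vm_data_tb * local_overhead
    return sizes

print({k: round(float(v), 2) for k, v in local_only_raw_tb(100, RAID5_FTT1).items()})
# {'site_a': 133.33, 'site_b': 0.0}
```

Compare that with the 133.33TB needed in each site when the same 100TB is stretched with PFTT=1.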

Hopefully this has cleared the air around vSAN Stretched Clusters.