Configuring Network Prioritization on a Failover Cluster


Hi Cluster Fans,
This blog post describes the Network Prioritization feature and how to configure it.  Network Prioritization is used to designate which type of traffic should be directed through which network in a Failover Cluster running Windows Server 2008 R2.  When designing a highly-available infrastructure, it is important to avoid any single point of failure, so there should be redundancy in all the hardware components, including the servers, the storage and the networking.  For this reason we recommend, and assume, that there are multiple networks in your cluster.  If a network is unavailable, Windows Server Failover Clustering will automatically direct traffic through another network to maintain service availability for applications or VMs.  However, some networks may be preferred for certain types of traffic based on the network’s speed, security or function, so it can be important to designate which network is used for which purpose.
Types of Cluster Network Traffic:


Understanding the different categories of networking traffic can help you plan the best use for your networks. 

Cluster & CSV Traffic

We define cluster traffic as the information the nodes need to exchange with each other to ensure that the cluster functions correctly.  This can include communication such as heartbeats for health-checking, updates to the cluster database, join requests from partitioned nodes, and much more.  Additionally, Cluster Shared Volumes (CSV) traffic will use this network for metadata updates, or when a volume is in redirected mode, meaning that a node hosting a VM cannot directly access its disk, so it redirects the I/O via another node to that disk.  To allow this traffic on a network, right-click the network in Failover Cluster Manager, select Properties, and select the radio button for “Allow cluster network communication on this network”.

Live Migration Traffic

If a cluster is taking advantage of the live migration feature to move running Virtual Machines (VMs) between cluster nodes, then live migration traffic is an important consideration.  During a live migration, large chunks of memory are copied from one server to another as quickly as possible.  This sends a burst of heavy traffic across the network, and can block other types of network communication from getting through.  For this reason it is recommended to have a dedicated network for live migration traffic so that it does not interfere with other important network traffic.  Live migration is commonly used when a physical host needs planned maintenance (such as patching and a reboot), so running VMs are live migrated from that host to avoid any downtime for the VM guests.  The faster this network is, the more traffic can pass through it and the more quickly the host can be evacuated, so it is very common to see the fastest network dedicated to live migration.  To allow this traffic on a network, right-click the network in Failover Cluster Manager, select Properties, and select the radio button for “Allow cluster network communication on this network”.

Public Traffic

Whether the cluster is providing high-availability for VMs, SQL databases, File Servers or anything else, a “client” needs to access that application, service or VM.  A “client” is defined loosely as a user or application which needs to communicate with a workload running on the cluster.  For clients to access that data, they need to make requests and have data sent back to them through these networks.  These networks are generally less secure as they are more exposed and accessible, and they could be subject to network flooding, which could impact performance or throughput of traffic.  For this reason it is recommended that you do not use this network for cluster traffic, live migration or any other use, and that you explicitly open it up to clients.  To configure this, right-click the network in Failover Cluster Manager, select Properties, select the radio button for “Allow cluster network communication on this network” and select the checkbox “Allow clients to connect through this network”.

Storage Traffic

If your cluster uses iSCSI or Fibre Channel over Ethernet (FCoE) for the cluster’s shared storage, this traffic goes through an Ethernet network which the cluster will identify as a cluster network.  To avoid storage I/O performance being affected with iSCSI or FCoE, it is recommended that you provide a dedicated network for storage traffic so that other network traffic does not interfere with this data.  For this reason it is recommended that you do not use this network for cluster traffic, live migration or any other use.  This can be configured by right-clicking on the network in Failover Cluster Manager, selecting Properties, and selecting the radio button for “Do not allow cluster network communication on this network”. 
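The Properties dialog settings described in the sections above can also be inspected and set from PowerShell through the Role property of each cluster network.  The following is a minimal sketch rather than a definitive procedure: it assumes the FailoverClusters module is available, and the network names "Internal", "Public" and "Storage" are hypothetical placeholders that you would replace with the names from your own cluster.

```powershell
Import-Module FailoverClusters

# Show the current role of every cluster network.
# Role values: 0 = do not allow cluster network communication,
#              1 = allow cluster network communication only,
#              3 = allow cluster network communication and client connections.
Get-ClusterNetwork | Format-Table Name, Role, Address

# Hypothetical network names; substitute the names shown by the command above.
(Get-ClusterNetwork "Internal").Role = 1   # Cluster & CSV or live migration traffic
(Get-ClusterNetwork "Public").Role   = 3   # public (client) traffic
(Get-ClusterNetwork "Storage").Role  = 0   # dedicated iSCSI or FCoE storage traffic
```

Setting the role this way should have the same effect as choosing the matching radio button and checkbox in Failover Cluster Manager.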

Network Prioritization

Since it is a best practice to use multiple networks in your cluster, you will likely need to specify the function of the various cluster networks.  This can be done on Windows Server 2008 R2 through the Network Prioritization (NP) feature.  NP orders the cluster networks so that they are “ranked”, where different ranks indicate different network roles.  To rank a network, you give it a unique integer from 1 to over 268,000,000, which is called a “metric”.
To view the networks, their metric values, and if they were automatically or manually configured, run the clustering PowerShell cmdlet:
PS > Get-ClusterNetwork | ft Name, Metric, AutoMetric
By default, all internal cluster networks will have a metric value starting at 1000 and incrementing by 100.  The first internal network which the cluster sees when it first comes online has a metric of 1000, the second has a metric of 1100, etc.  We assume that a network is ‘internal’ if it does not have access to a default gateway.  The initial list of internal networks is determined by the order in which the network adapters were seen by the cluster when it was created.
By default, all external cluster networks will have a metric value starting at 10000 and incrementing by 100.  The first external network which the cluster sees when it first comes online has a metric of 10000, the second has a metric of 10100, etc.  We assume that a network is ‘external’ if it has access to a default gateway.  The initial list of external networks is determined by the order in which the network adapters were seen by the cluster when it was created.
The cluster will then use the order of the metrics as the order of networks.  The network with the lowest metric will be used for “Cluster & CSV Traffic”.  The network with the second-lowest metric will be used for “Live Migration Traffic”.  Additional networks with a metric below 10000 will be used as backup networks if the “Cluster & CSV Traffic” or “Live Migration Traffic” networks fail.  The network with the lowest metric of at least 10000 will be used for “Public Traffic”, and any additional networks with a metric above 10000 will be used as backup networks for “Public Traffic”.  Give the highest possible values to any networks which you do not want any cluster or public traffic to go through, such as those for “Storage Traffic”, so that they are never used, or are only used when no other networks at all are available, depending on your settings.
So let’s say you get the following output from running
PS > Get-ClusterNetwork | ft Name, Metric, AutoMetric

     Name                       Metric     AutoMetric
     ----                       ------     ----------
     Cluster Network 1          1000       True
     Cluster Network 2          1100       True
     Cluster Network 3          1200       True
     Cluster Network 4          10000      True
     Cluster Network 5          10100      True
In this scenario, these networks would carry the following types of traffic:
  • Cluster Network 1 (1000) – Cluster & CSV Traffic
  • Cluster Network 2 (1100) – Live Migration Traffic
  • Cluster Network 3 (1200) – Backup network for Cluster & CSV Traffic and Live Migration Traffic
  • Cluster Network 4 (10000) – Public Traffic
  • Cluster Network 5 (10100) – Backup network for Public Traffic
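The ranking rules above can be sketched as a small helper function.  This is only an illustration of the logic as described in this post: Get-NetworkTrafficRole is a hypothetical function, not a clustering cmdlet, and it looks only at the metric values, ignoring any per-network settings made in the Properties dialog.

```powershell
# Hypothetical helper: label each network with the traffic type described above,
# based only on metric order and the 10000 boundary between internal and external.
function Get-NetworkTrafficRole {
    param([Parameter(Mandatory = $true)] [object[]] $Network)

    $sorted   = @($Network | Sort-Object Metric)
    $internal = @($sorted | Where-Object { $_.Metric -lt 10000 })
    $external = @($sorted | Where-Object { $_.Metric -ge 10000 })

    for ($i = 0; $i -lt $internal.Count; $i++) {
        if     ($i -eq 0) { $traffic = 'Cluster & CSV Traffic' }
        elseif ($i -eq 1) { $traffic = 'Live Migration Traffic' }
        else              { $traffic = 'Backup for Cluster & CSV and Live Migration Traffic' }
        New-Object PSObject -Property @{ Name = $internal[$i].Name; Metric = $internal[$i].Metric; Traffic = $traffic } |
            Select-Object Name, Metric, Traffic
    }
    for ($i = 0; $i -lt $external.Count; $i++) {
        if ($i -eq 0) { $traffic = 'Public Traffic' }
        else          { $traffic = 'Backup for Public Traffic' }
        New-Object PSObject -Property @{ Name = $external[$i].Name; Metric = $external[$i].Metric; Traffic = $traffic } |
            Select-Object Name, Metric, Traffic
    }
}

# Sample data mirroring the output shown above (no cluster needed to try it).
$sample = @(
    New-Object PSObject -Property @{ Name = 'Cluster Network 1'; Metric = 1000 }
    New-Object PSObject -Property @{ Name = 'Cluster Network 2'; Metric = 1100 }
    New-Object PSObject -Property @{ Name = 'Cluster Network 3'; Metric = 1200 }
    New-Object PSObject -Property @{ Name = 'Cluster Network 4'; Metric = 10000 }
    New-Object PSObject -Property @{ Name = 'Cluster Network 5'; Metric = 10100 }
)
Get-NetworkTrafficRole -Network $sample | Format-Table -AutoSize
```

On a live cluster, the same function could be fed real data with Get-NetworkTrafficRole -Network (Get-ClusterNetwork), since it only reads the Name and Metric properties.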

Configuring Network Prioritization

If the cluster does not automatically assign networks to the traffic pattern that you want, you can customize NP by changing a network’s metric, which changes the ranked order and hence the network’s function.  For example, you may want Cluster Network 3 to be used for “Live Migration Traffic” as it is the fastest, so you would change its Metric to a value between 1000 and 1100, such as 1050, so that it is ranked second on the list.  Once Cluster Network 3 has the second-lowest metric it will be used for Live Migration Traffic.
To change the value of a network metric, run:
PS > $n = Get-ClusterNetwork "Cluster Network 3"
PS > $n.Metric = 1050
This will change the metric of Cluster Network 3 to 1050.
Now you get the following output from running
PS > Get-ClusterNetwork | ft Name, Metric, AutoMetric

     Name                       Metric     AutoMetric
     ----                       ------     ----------
     Cluster Network 1          1000       True
     Cluster Network 3          1050       False
     Cluster Network 2          1100       True
     Cluster Network 4          10000      True
     Cluster Network 5          10100      True
You may have noticed that there is a property associated with each network called AutoMetric.  This indicates whether the Metric was set using the default values (True) or was later adjusted by an admin (False).  This gives insight into whether NP has been configured on the cluster.  Using this flag, it is also possible to change the metric of a network back to its original, automatically assigned value, by running:
PS > $n = Get-ClusterNetwork "Cluster Network 3"
PS > $n.AutoMetric = $true
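If several networks have been adjusted by hand, the same AutoMetric flag can be used to find and reset all of them at once.  A minimal sketch, assuming the FailoverClusters module is loaded and that you really do want every network returned to its automatic metric:

```powershell
# List the networks whose metric was set manually (AutoMetric is False).
Get-ClusterNetwork | Where-Object { -not $_.AutoMetric } | Format-Table Name, Metric, AutoMetric

# Return each of those networks to its automatically assigned metric.
Get-ClusterNetwork | Where-Object { -not $_.AutoMetric } | ForEach-Object { $_.AutoMetric = $true }
```

Running the first command before the second gives you a record of the custom values in case you need to reapply them.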

Overriding Network Prioritization Behavior

There are two ways to override the default behavior of NP.  The first is by changing the network’s properties by right-clicking on the network in Failover Cluster Manager, selecting Properties, and changing the radio buttons or checkboxes.  For example, if you select “Do not allow cluster network communication on this network”, then it will not be possible to send any “Cluster & CSV Traffic” or “Live Migration Traffic” through this network, even if the network has the lowest metric value.  The cluster will honor this override and use the network with the next-lowest metric for this type of traffic.
The second override is exclusively for “Live Migration Traffic”.  The networks for live migration can be configured more granularly by right-clicking on any Virtual Machine resource, selecting Properties and clicking the Network for live migration tab.  Here you have the ability to specify which networks can and cannot be used for “Live Migration Traffic” and in which order they should be used.  Even though it appears that this setting may be unique to that specific VM, it is actually a global setting for live migration.  This means that it will override the “Live Migration Traffic” network configured through NP, and all VMs will perform a live migration through the network(s) specified here.  If this setting is changed in multiple locations, the last change will be honored.
  
With this information we hope you are better able to understand how to deploy, configure and use your cluster networks to get the optimal performance and function from each.
