Important virtual and physical switch parameters
Before diving into the different design options around the example deployment, let’s take a look at the VDS (virtual) and physical network switch parameters that should be considered in all these design options. These are some key parameters that vSphere and network administrators have to take into account while designing VMware virtual networking. As the configuration of virtual networking goes hand in hand with physical network configuration, this section will cover both the VDS and Physical switch parameters.
VDS parameters
VDS simplifies the challenges of the configuration process by providing one single pane of glass to perform virtual network management tasks. As opposed to configuring vSphere standard switches (VSS) on individual hosts, administrators can configure and manage one single vSphere distributed switch. All centrally configured network policies on VDS get pushed down to the host automatically when the host gets added to the distributed switch. In this section an overview of key VDS parameters is provided.
Host Uplink Connections (vmnics) and dvuplink parameter
VDS introduces a new abstraction for the physical Ethernet network adapters (vmnics) on each host. This abstraction, called dvuplinks, is defined during the creation of the VDS. All properties, including NIC teaming, load balancing, and failover policies on the VDS and dvportgroups, are applied to dvuplinks and not to vmnics on individual hosts. When a host is added to the VDS, each vmnic on the host is mapped to a dvuplink. This provides the advantage of consistently applying the teaming and failover configurations to all hosts irrespective of how the dvuplink and vmnic assignments are made.
Figure 1 below shows two ESXi hosts with four Ethernet network adapters each. When these hosts are added to a VDS with four dvuplinks configured on a dvuplink portgroup, administrators have to assign the network adapters (vmnics) of the hosts to dvuplinks. To illustrate, Figure 1 shows one type of mapping, where each ESXi host’s vmnic0 is mapped to dvuplink1, vmnic1 to dvuplink2, and so on. Customers can choose a different mapping if required, for example mapping vmnic0 to a dvuplink other than dvuplink1. VMware recommends keeping the mapping consistent across hosts because it reduces complexity in the environment.
Figure 1 dvuplink to vmnic mapping
As a best practice, customers should also try to deploy hosts with the same number of physical Ethernet network adapters and with similar port speeds. Also, because the number of dvuplinks configured on the VDS depends on the maximum number of physical Ethernet network adapters on a host, administrators should take that into account during dvuplink portgroup configuration. Customers always have the option to modify this dvuplink configuration later as hardware capabilities change.
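The mapping guidance above can be checked with a few lines of script. The following is a minimal Python sketch, not a VMware tool: the host names, the `hosts` inventory, and both function names are hypothetical, and the data would have to be exported from your own environment. It sizes the dvuplink portgroup for the host with the most vmnics and reports any vmnic that is mapped to different dvuplinks on different hosts.

```python
from collections import defaultdict


def required_dvuplinks(host_mappings):
    """Size the dvuplink portgroup for the host with the most vmnics."""
    return max(len(mapping) for mapping in host_mappings.values())


def inconsistent_vmnics(host_mappings):
    """Return vmnics that map to different dvuplinks on different hosts."""
    seen = defaultdict(set)
    for mapping in host_mappings.values():
        for vmnic, dvuplink in mapping.items():
            seen[vmnic].add(dvuplink)
    return {vmnic: sorted(uplinks)
            for vmnic, uplinks in seen.items() if len(uplinks) > 1}


# Hypothetical inventory: esxi-02 swaps the first two assignments.
hosts = {
    "esxi-01": {"vmnic0": "dvuplink1", "vmnic1": "dvuplink2",
                "vmnic2": "dvuplink3", "vmnic3": "dvuplink4"},
    "esxi-02": {"vmnic0": "dvuplink2", "vmnic1": "dvuplink1",
                "vmnic2": "dvuplink3", "vmnic3": "dvuplink4"},
}

print(required_dvuplinks(hosts))
print(inconsistent_vmnics(hosts))
```

An empty result from `inconsistent_vmnics` means every host uses the same vmnic to dvuplink assignment, which is the consistent layout recommended above.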
Traffic Types and dvportgroup parameters
Similar to portgroups on standard switches, dvportgroups define how the connection is made through the VDS to the network. The VLAN ID, traffic shaping, port security, teaming and load balancing parameters are configured on these dvportgroups. The virtual ports (dvports) connected to a dvportgroup share the same properties configured on a dvportgroup. When customers want a group of virtual machines to share the security and teaming policies, they have to make sure the virtual machines are part of one dvportgroup. Customers can choose to define different dvportgroups based on the different traffic types they have in their environment or based on the different tenants or applications they support in the environment.
In this example deployment, the dvportgroup classification is based on the traffic types running in the virtual infrastructure. Once administrators understand the different traffic types in the virtual infrastructure and identify the specific security, reliability, and performance requirements of each traffic type, the next step is to create a unique dvportgroup for each traffic type. As mentioned earlier, the dvportgroup configuration defined at the VDS level is automatically pushed down to every host that is added to the VDS. For example, in Figure 1, the two dvportgroups PG-A (Yellow) and PG-B (Green) defined at the distributed switch level are available on each ESXi host that is part of that VDS.
dvportgroup specific configuration
Once customers decide on the number of unique dvportgroups they want to create in their environment, they can start configuring those dvportgroups. The configuration options/parameters are similar to those available with port groups on vSphere standard switches. There are some additional options available on VDS dvportgroup that are related to teaming setup. These new options are not available on vSphere standard switches. Customers can configure the following key parameters for each dvportgroup.
- Number of virtual ports (dvports)
- Port binding (static, dynamic, ephemeral)
- VLAN Trunking/Private VLANs
- Teaming and Load Balancing along with Active and Standby Links
- Bi-directional traffic shaping parameters
- Port Security
As part of the teaming algorithm support, VDS provides a unique approach to load balancing traffic across the teamed network adapters. This approach is called Load Based Teaming (LBT), and it distributes the traffic across the network adapters based on the percentage utilization of those adapters. The LBT algorithm works in both the ingress and egress directions of network adapter traffic, as opposed to the hashing algorithms, which work only in the egress direction (traffic flowing out of the network adapter). LBT also prevents the worst-case scenario that can occur with hashing algorithms, where all traffic hashes to one network adapter of the team and the other network adapters carry no traffic. To improve the utilization of all the links/network adapters, VMware recommends the use of this advanced VDS feature. The LBT approach is recommended over EtherChannel on the physical switches combined with route based on IP hash on the virtual switch.
Port security policies at the port group level protect customers from certain behaviors that could compromise security. For example, a hacker could impersonate a virtual machine and gain unauthorized access by spoofing the virtual machine’s MAC address. VMware recommends setting MAC Address Changes and Forged Transmits to “Reject” to help protect against attacks launched by a rogue guest operating system. Set Promiscuous Mode to “Reject” unless customers want to monitor the traffic for network troubleshooting or intrusion detection purposes.
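To make the difference between static placement and load-based placement concrete, here is a small Python sketch. It is an illustration only and not the actual LBT algorithm: the 75 percent utilization threshold, the modulo-based initial placement, the port loads, and all function names are assumptions chosen for the example.

```python
def static_placement(port_loads, n_uplinks):
    """Pin each virtual port to an uplink by port id, ignoring its load."""
    return {port: port % n_uplinks for port in port_loads}


def uplink_loads(placement, port_loads, n_uplinks):
    """Sum the load (Mbps) that the placement puts on each uplink."""
    loads = [0.0] * n_uplinks
    for port, uplink in placement.items():
        loads[uplink] += port_loads[port]
    return loads


def rebalance(placement, port_loads, n_uplinks, capacity_mbps, threshold=0.75):
    """Move ports off the busiest uplink while it exceeds the threshold.

    The threshold value is an assumption made for this illustration.
    """
    placement = dict(placement)
    for _ in range(len(port_loads) * n_uplinks):  # bound the number of moves
        loads = uplink_loads(placement, port_loads, n_uplinks)
        busiest = max(range(n_uplinks), key=loads.__getitem__)
        if loads[busiest] <= threshold * capacity_mbps:
            break
        idlest = min(range(n_uplinks), key=loads.__getitem__)
        # Only consider moves that leave the target below the current peak.
        movable = sorted(
            (p for p, u in placement.items()
             if u == busiest and loads[idlest] + port_loads[p] < loads[busiest]),
            key=lambda p: (port_loads[p], p),
        )
        if not movable:
            break
        placement[movable[0]] = idlest
    return placement


# Hypothetical per-port loads in Mbps on a team of two 1 Gigabit adapters.
port_loads = {0: 400.0, 1: 50.0, 2: 400.0, 4: 300.0}
pinned = static_placement(port_loads, 2)
balanced = rebalance(pinned, port_loads, 2, capacity_mbps=1000)

print(uplink_loads(pinned, port_loads, 2))    # [1100.0, 50.0]
print(uplink_loads(balanced, port_loads, 2))  # [400.0, 750.0]
```

In this example the static placement oversubscribes the first adapter while the second is nearly idle, which is the worst-case behavior described above. The utilization-aware pass spreads the same ports so that neither adapter exceeds the assumed threshold.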
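The recommendation above can be expressed as a simple audit. This is a minimal Python sketch with hypothetical names (`SecurityPolicy`, `audit_security`, and the `monitoring` flag are not part of any VMware API); it assumes the three settings have already been read from each dvportgroup as the strings “Accept” or “Reject”.

```python
from dataclasses import dataclass


@dataclass
class SecurityPolicy:
    """The three port security settings of a dvportgroup (hypothetical model)."""
    promiscuous_mode: str
    mac_address_changes: str
    forged_transmits: str


def audit_security(policy, monitoring=False):
    """Return findings that deviate from the recommendation in the text."""
    findings = []
    if policy.mac_address_changes != "Reject":
        findings.append("MAC Address Changes should be Reject")
    if policy.forged_transmits != "Reject":
        findings.append("Forged Transmits should be Reject")
    # Promiscuous mode is acceptable only on port groups used for monitoring.
    if policy.promiscuous_mode != "Reject" and not monitoring:
        findings.append("Promiscuous Mode should be Reject")
    return findings


print(audit_security(SecurityPolicy("Accept", "Accept", "Reject")))
print(audit_security(SecurityPolicy("Accept", "Reject", "Reject"), monitoring=True))
```

The `monitoring` flag models the exception for port groups that are deliberately used for troubleshooting or intrusion detection.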
NIOC
Network I/O Control (NIOC) is the traffic management capability available on VDS. The NIOC concept revolves around resource pools that are similar in many ways to the ones existing for CPU and memory. vSphere and network administrators can now allocate I/O shares to different traffic types, similar to allocating CPU and memory resources to a VM. The share parameter specifies the relative importance of a traffic type over other traffic types, and provides a guaranteed minimum when different traffic types compete for a particular network adapter. The shares are specified in abstract units numbered 1 to 100. Customers can provision shares to different traffic types based on the amount of resources each traffic type requires.
This capability of provisioning I/O resources is very useful in situations where multiple traffic types compete for resources. For example, in a deployment where vMotion and VM traffic flow through one network adapter, vMotion activity can impact virtual machine traffic performance. In this situation, shares configured in NIOC provide the required isolation between the vMotion and VM traffic types and prevent one flow (traffic type) from dominating the other. NIOC provides one more parameter that customers can use if they want to put a limit on a particular traffic type. This parameter is called the Limit. The Limit specifies the absolute maximum bandwidth for a traffic type on a host and is specified in Mbps. NIOC limits and shares apply only to outbound traffic, i.e., traffic that is flowing out of the ESXi host.
VMware recommends that customers use this traffic management feature whenever multiple traffic types flow through one network adapter. This situation is more common in 10 Gigabit Ethernet deployments but can happen in 1 Gigabit Ethernet deployments as well. The common use case for NIOC in a 1 Gigabit network adapter deployment is when traffic from different workloads or different customer VMs is carried over the same network adapter. As traffic from multiple workloads flows through a network adapter, it becomes important to provide I/O resources based on the needs of each workload. With the release of vSphere 5, customers can now make use of the new user-defined network resource pools capability and allocate I/O resources to different workloads or different customer VMs depending on their needs. This user-defined network resource pool feature provides granular control in allocating I/O resources and meeting the SLA requirements of virtualized tier 1 workloads.
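The interaction between shares and limits can be worked through numerically. The Python sketch below is a simplified model, not NIOC itself: it assumes every listed traffic type is trying to use as much of the adapter as it can, and the function name, the share values, and the limit are hypothetical. Bandwidth is divided in proportion to shares, and any traffic type whose proportional slice would exceed its Limit is capped, with the freed bandwidth redistributed among the others.

```python
def allocate_bandwidth(link_mbps, shares, limits=None):
    """Split link_mbps by shares, honoring per-type limits (both in Mbps)."""
    limits = limits or {}
    allocation = {}
    active = dict(shares)
    remaining = float(link_mbps)
    while active:
        total = sum(active.values())
        # Traffic types whose proportional slice would exceed their limit.
        capped = {t for t, s in active.items()
                  if t in limits and remaining * s / total > limits[t]}
        if not capped:
            for t, s in active.items():
                allocation[t] = remaining * s / total
            break
        for t in capped:
            allocation[t] = float(limits[t])
            remaining -= limits[t]
            del active[t]
    return allocation


# Hypothetical shares on one 10 Gigabit adapter under full contention.
shares = {"vm": 100, "vmotion": 50, "management": 50}

print(allocate_bandwidth(10_000, shares))
print(allocate_bandwidth(10_000, shares, limits={"vmotion": 1_000}))
```

Without a limit, vMotion receives a quarter of the adapter in this example. With the 1,000 Mbps limit, the capacity it cannot use is shared between the other two traffic types in a 2:1 ratio. When the adapter is not congested, shares have no effect and only the limit applies.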
Bi-directional traffic shaping
Apart from NIOC, there is one more traffic-shaping feature available in the vSphere platform. This feature can be configured at the dvportgroup or dvport level. Customers can shape both inbound and outbound traffic using three parameters: average bandwidth, peak bandwidth, and burst size. Customers who want more granular traffic-shaping controls to manage their traffic types can use this VDS capability along with NIOC. It is recommended to involve the network administrators in your organization while configuring these granular traffic parameters. These controls only make sense when oversubscription is causing network performance issues. The oversubscription could be in the physical switch infrastructure or in the virtual infrastructure, so it is very important to understand both the physical and virtual network environment before configuring bi-directional traffic shaping.
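To get a feel for how the three shaping parameters relate, the following Python sketch uses a generic token-bucket model. It is an assumption made for illustration and not a description of how VDS implements shaping: burst credit is assumed to drain at the difference between the peak and average rates, and the units (Kbps for bandwidth, KB for burst size) and the function name are also assumptions.

```python
def burst_duration_seconds(average_kbps, peak_kbps, burst_size_kb):
    """Estimate how long a full burst credit sustains the peak rate.

    Generic token-bucket model (an assumption): while sending at the peak
    rate, credit drains at (peak - average) kilobits per second.
    """
    if not 0 < average_kbps <= peak_kbps:
        raise ValueError("expected 0 < average bandwidth <= peak bandwidth")
    if burst_size_kb <= 0:
        raise ValueError("burst size must be positive")
    if peak_kbps == average_kbps:
        return None  # traffic never exceeds the average, so no credit is spent
    burst_kbits = burst_size_kb * 8  # kilobytes to kilobits
    return burst_kbits / (peak_kbps - average_kbps)


# Hypothetical values: 100 Mbps average, 200 Mbps peak, 100 MB burst size.
print(burst_duration_seconds(100_000, 200_000, 102_400))
```

Under this model, raising the burst size lengthens the time a port can run at the peak rate, while raising the average bandwidth both lengthens bursts and raises the sustained rate.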
Physical Network switch parameters
The configuration of the VDS and the physical network switch should go hand in hand to provide resilient, secure, and scalable connectivity to the virtual infrastructure. The following are some key switch configuration parameters customers should pay attention to.
VLAN
If VLANs are used to provide logical isolation between different traffic types, it is important to make sure that those VLANs are carried over to the physical switch infrastructure. To do so, enable VST (virtual switch tagging) on the virtual switch, and trunk all VLANs to the physical switch ports.
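A quick consistency check between the two sides can catch VLANs that were defined on a dvportgroup but never added to the physical trunk. The Python sketch below uses hypothetical dvportgroup names and VLAN IDs, and the function name is an assumption; the inputs would come from your VDS configuration and the allowed-VLAN list of the host-facing trunk port.

```python
def vlans_missing_from_trunk(portgroup_vlans, trunk_allowed_vlans):
    """Return dvportgroups whose VLAN ID is not allowed on the trunk port."""
    for name, vlan in portgroup_vlans.items():
        if not 1 <= vlan <= 4094:  # valid 802.1Q VLAN ID range
            raise ValueError(f"{name}: VLAN ID {vlan} is out of range")
    allowed = set(trunk_allowed_vlans)
    return {name: vlan for name, vlan in portgroup_vlans.items()
            if vlan not in allowed}


# Hypothetical dvportgroups, one per traffic type.
portgroup_vlans = {"PG-Management": 10, "PG-vMotion": 20, "PG-VM": 30}

print(vlans_missing_from_trunk(portgroup_vlans, trunk_allowed_vlans=[10, 20]))
```

Any dvportgroup returned by the function would lose connectivity beyond the host, because its tagged frames are dropped at the physical switch port.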
Spanning Tree Protocol (STP)
Spanning Tree Protocol is not supported on virtual switches, and thus no configuration is required on the VDS. But it is important to enable this protocol on the physical switches. STP makes sure that there are no loops in the network. As a best practice, customers should configure the following.
- Use “portfast” on ESXi host facing physical switch ports. With this setting, network convergence on these switch ports happens quickly after a failure because the port enters the Spanning Tree forwarding state immediately, bypassing the listening and learning states.
- Use “BPDU guard” to enforce the STP boundary. This configuration protects against any invalid device connection on the ESXi host facing access switch ports. As mentioned earlier, VDS doesn’t support Spanning Tree Protocol and thus doesn’t send any Bridge Protocol Data Unit (BPDU) frames to the switch port. However, if any BPDU is seen on these ESXi host facing access switch ports, the BPDU guard feature puts that particular switch port in the error-disabled state. The switch port is completely shut down, which prevents the BPDU from affecting the Spanning Tree topology.
The recommendation of enabling “portfast” and “BPDU guard” on the switch ports is valid only when customers connect non-switching/bridging devices to these ports. The switching/bridging devices can be hardware based physical boxes or servers running software based switching/bridging function. Customers should make sure that there is no switching/bridging function enabled on the ESXi hosts that are connected to the physical switch ports.
In the scenario where the ESXi host has a guest VM that is configured to perform a bridging function, the VM will generate BPDU frames and send them to the VDS. The VDS then forwards the BPDU frames through the network adapter to the physical switch port. When a switch port configured with “BPDU guard” receives the BPDU frame, the switch disables the port and the VM loses connectivity. To avoid this network failure scenario while running a software-bridging function on an ESXi host, customers should disable the “portfast” and “BPDU guard” configuration on the port and run the Spanning Tree Protocol.
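The failure sequence described above can be modeled as a small state machine. This Python sketch is a conceptual model only: the `AccessPort` class and its frame labels are hypothetical and do not correspond to any switch vendor’s software.

```python
class AccessPort:
    """Toy model of an ESXi host facing switch port with optional BPDU guard."""

    def __init__(self, bpdu_guard=True):
        self.bpdu_guard = bpdu_guard
        self.state = "forwarding"  # portfast: forwarding immediately

    def receive(self, frame_type):
        """Return True if the frame is accepted, False if it is dropped."""
        if self.state == "err-disabled":
            return False  # a disabled port drops everything
        if frame_type == "bpdu" and self.bpdu_guard:
            self.state = "err-disabled"
            return False
        return True


port = AccessPort(bpdu_guard=True)
print(port.receive("data"))  # True: normal VM traffic is forwarded
print(port.receive("bpdu"))  # False: a bridging VM sent a BPDU
print(port.state)            # err-disabled
print(port.receive("data"))  # False: all VMs behind this port lose connectivity
```

The last call shows why a single bridging VM is a problem: once the port is error-disabled, every virtual machine that uses that network adapter is affected, not only the VM that generated the BPDU.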
In case customers are concerned about the security hacks that can generate BPDU frames, they should make use of the VMware vShield App security product that can block the frames and protect the virtual infrastructures from such layer 2 attacks. Please refer to vShield product documentation for more details on how to secure your vSphere virtual infrastructure.
Link Aggregation setup
Link aggregation is used to increase throughput and improve resiliency by combining multiple network connections. There are various proprietary solutions in the market along with the vendor-independent IEEE 802.3ad (LACP) standard-based implementation. All solutions establish a logical channel between the two end points using multiple physical links. In the vSphere virtual infrastructure, the two ends of the logical channel are the virtual switch (VDS) and the physical switch. Both switches have to be configured with link aggregation parameters before the logical channel is established. Currently, VDS supports static link aggregation configuration and does not provide support for dynamic LACP. When customers want to enable link aggregation on a physical switch, they should configure static link aggregation on the physical switch and select IP hash as the NIC teaming policy on the VDS.
When establishing the logical channel with multiple physical links, customers should make sure that the Ethernet network adapter connections from the host are terminated on a single physical switch. However, if customers have deployed clustered physical switch technology, the Ethernet network adapter connections can be terminated on two different physical switches. Networking vendors refer to clustered physical switch technology by different names. For example, Cisco calls its switch clustering solution VSS (Virtual Switching System), while Brocade calls its solution VCS (Virtual Cluster Switching). Please refer to the networking vendor guidelines and configuration details while deploying switch-clustering technology.
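The pairing rules in this section lend themselves to a pre-deployment check. The Python sketch below encodes them under stated assumptions: the policy label `"ip_hash"`, the switch names, and the function name are hypothetical, and the inputs must be gathered from both the VDS and the physical switch configuration.

```python
def check_link_aggregation(teaming_policy, static_lag_on_switch,
                           uplink_switches, switches_clustered=False):
    """Return a list of mismatches between VDS and physical switch settings."""
    problems = []
    if static_lag_on_switch and teaming_policy != "ip_hash":
        problems.append("static link aggregation requires IP hash teaming on the VDS")
    if teaming_policy == "ip_hash" and not static_lag_on_switch:
        problems.append("IP hash teaming requires static link aggregation on the switch")
    if (static_lag_on_switch and len(set(uplink_switches)) > 1
            and not switches_clustered):
        problems.append("aggregated links must end on one switch unless switches are clustered")
    return problems


# Hypothetical host with two uplinks cabled to two separate access switches.
print(check_link_aggregation("ip_hash", True, ["sw-a", "sw-b"]))
print(check_link_aggregation("ip_hash", True, ["sw-a", "sw-b"], switches_clustered=True))
```

An empty list means the two ends of the logical channel agree. The first call reports a problem because the links end on two independent switches; the second passes because the switches are declared as clustered.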
Link State Tracking
Link state tracking is a feature available on Cisco switches to manage the link state of downstream ports (ports connected to servers) based on the status of upstream ports (ports connected to aggregation/core switches). When there is a failure on the upstream links connected to the aggregation or core switches, the associated downstream link status goes down. The server connected on the downstream link is then able to detect the failure and re-route the traffic over other working links. This feature thus provides protection from network failures caused by down upstream ports in non-mesh topologies. Unfortunately, this feature is not available on all vendors’ switches, and even if it is available, it might not be referred to as link state tracking. Customers should talk to their switch vendors to find out if a similar feature is supported on their switches.
Figure 2 below shows the resilient mesh topology on the left and a simple loop-free topology on the right. VMware highly recommends deploying the mesh topology shown on the left, which provides a highly reliable redundant design and doesn’t need the link state tracking feature. Customers who don’t have high-end networking expertise and are also limited in the number of switch ports might prefer the deployment shown on the right. In this deployment, customers don’t have to run the Spanning Tree Protocol because there are no loops in the network design. The downside of this simple design appears when there is a failure on the link between the access and aggregation switch. In that failure scenario, the server will continue to send traffic on the same network adapter even though the access layer switch is dropping the traffic at the upstream interface. To avoid this black-holing of server traffic, customers can enable link state tracking on the physical switches so that any failure between the access and aggregation switch layers is signaled to the server through link state information.
Figure 2 Resilient loop and no-loop topologies
VDS has its default network failover detection setting configured as “Link status only”. Customers should keep this configuration if they are enabling the link state tracking feature on the physical switches. If link state tracking is not available on the physical switches, and there are no redundant paths available in the design, customers can make use of the Beacon Probing feature available on VDS. Beacon probing is a software solution available on virtual switches for detecting link failures upstream from the access layer physical switch to the aggregation/core switches. Beacon probing is most useful with three or more uplinks in a team.
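The guidance on failover detection can be summarized as a small decision function. This Python sketch only restates the rules from this section; the function name and parameters are hypothetical, and the result is a starting point for discussion with the network team rather than a definitive recommendation.

```python
def choose_failover_detection(link_state_tracking, redundant_paths, uplinks_in_team):
    """Return (failover detection setting, notes) following the text above."""
    if link_state_tracking or redundant_paths:
        # Upstream failures are either signaled to the host or routed around.
        return "Link status only", []
    notes = []
    if uplinks_in_team < 3:
        notes.append("beacon probing is most useful with three or more uplinks")
    return "Beacon probing", notes


print(choose_failover_detection(True, False, 2))
print(choose_failover_detection(False, False, 2))
print(choose_failover_detection(False, False, 4))
```

The mesh topology on the left of Figure 2 corresponds to `redundant_paths=True`, so the default “Link status only” setting is kept. The simple topology on the right without link state tracking is the case where beacon probing is worth considering.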
Maximum Transmission Unit (MTU)
Make sure that the Maximum Transmission Unit (MTU) configuration matches across the virtual and physical network switch infrastructure.
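A mismatch is easy to spot once the MTU of every hop is written down. The Python sketch below uses hypothetical device names and values and a hypothetical function name; the MTU of each hop would be read from the VDS, the VMkernel interfaces, and the physical switch ports along the path.

```python
def mtu_mismatches(path_mtus, expected_mtu):
    """Return the hops whose MTU differs from the intended end-to-end value."""
    return {hop: mtu for hop, mtu in path_mtus.items() if mtu != expected_mtu}


# Hypothetical path for jumbo frames: one switch port was left at 1500.
path = {
    "vmkernel port": 9000,
    "VDS": 9000,
    "access switch port": 1500,
    "aggregation switch port": 9000,
}

print(mtu_mismatches(path, expected_mtu=9000))  # {'access switch port': 1500}
```

A single hop with a smaller MTU is enough to cause dropped or fragmented frames, so the check should cover every device between the two end points.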
After covering the important virtual and physical switch parameters and some recommended guidelines for each, we will take a look at the rack server deployments with multiple 1 Gigabit network adapters as well as two 10 Gigabit network adapters in the next blog entry.