VXLAN

So what is VXLAN and why do we need it?

Well, put simply, it’s VLAN with an X in the middle 🙂 the X standing for eXtensible. VXLAN was a joint project between Cisco, VMware, Red Hat and Citrix, which is why it has been so widely adopted and underpins the majority of SDN offerings.
And as to why we need it, well, that’s mainly to address two limitations of regular VLANs: scale and flexibility.

Scale:
As we all know, standard 802.1Q VLANs scale to just over 4,000 VLAN IDs (the 12-bit VLAN ID field gives 4,096 values), and while that number sounds like a lot and is fine in most cases, large Service Providers, Enterprises and multi-tenant environments would certainly need more.

VXLAN encapsulates the standard Ethernet frame and adds headers to it, including a 24-bit VXLAN ID field, which increases the number of logical segments from 4,096 to around 16 million, while adding only approximately 50 bytes of overhead to the frame (outer Ethernet, IP, UDP and VXLAN headers).
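
As a concrete illustration, here is a minimal Python sketch (illustrative only, not taken from any real VXLAN stack) that packs the 8-byte VXLAN header defined in RFC 7348, showing where the 24-bit segment ID sits:

import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_I = 0x08    # "I" flag set: the VNI field is valid

def vxlan_header(vni: int) -> bytes:
    """Pack the VXLAN header: 8-bit flags, 24 reserved bits,
    24-bit VNI (segment ID), 8 reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)

# 2**24 = 16,777,216 possible segment IDs versus 4,096 with 802.1Q.
print(vxlan_header(5000).hex())  # 0800000000138800 (VNI 5000 = 0x001388)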

Flexibility:
In this world of ever-increasing workload flexibility and agility, we need a way of quickly and safely providing connectivity between Virtual Machines anywhere in the network where we have capacity.
Historically this was done by extending VLANs everywhere a Virtual Machine might be required. This, as we all know, comes with a raft of potential issues around scale, complexity and resiliency.
Because the Layer 2 frame is encapsulated in an IP packet, it can now cross Layer 3 boundaries! This opens up a whole range of use cases.

These use cases include, but are certainly not limited to:
• Running Layer 3 all the way to the edge of your network, then mapping your VXLANs over the top (overlay), getting the best of both worlds: an L3 transport, but Layer 2 adjacency/reachability wherever you need it.
• Extending your Layer 2 into any Public/Hosted Cloud, allowing you to move VMs in and out of a hosted service as and when you need to (Cloud Burst).
• Extending a VLAN over a Layer 3 Data Centre Interconnect (DCI) for Disaster Recovery (DR) to allow VM mobility between Data Centres.

IP packets also make much better use of port-channelled links than other encapsulation technologies such as MAC-in-MAC, because the outer UDP source port can vary per inner flow, giving load-balancing hash algorithms something to work with.
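
For example, a VTEP can derive the outer UDP source port from a hash of the inner frame, so different inner flows land on different port-channel members. A hypothetical sketch of that idea:

import zlib

def outer_udp_src_port(inner_frame: bytes) -> int:
    """Derive the outer UDP source port from the inner frame so that
    standard 5-tuple hashing spreads VXLAN flows across links."""
    # Hash the start of the inner frame, where the MACs, IPs and
    # L4 ports live.
    entropy = zlib.crc32(inner_frame[:42])
    # Keep the result in the ephemeral port range 49152-65535.
    return 49152 + entropy % 16384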

So how does VXLAN work?

The VXLAN-enabled switch (the Nexus 1000v VEM in my example below) learns the VM’s MAC address and the assigned VXLAN ID; it then encapsulates the frame according to the port profile the VM is assigned to.
When the VM first comes online, the VEM assigns it to a defined multicast group, which carries all Broadcast, Unknown unicast and Multicast traffic (B/U/M). Known unicasts are sent directly to the correct destination VEM/port.
Although all VMs/tenants may be assigned to the same multicast group, the VXLAN segment IDs ensure that traffic is only delivered within the same VXLAN, thus maintaining tenant separation.
The resulting VXLAN “tunnels” terminate at either end on the VXLAN-enabled switches the VMs/servers are connected to. These switches are referred to as VXLAN Tunnel End Points (VTEPs).
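
The forwarding decision boils down to a per-segment MAC table with a multicast fallback. A toy Python model of the behaviour just described (the names and structure are mine, not the Nexus 1000v’s):

class Vtep:
    """Toy model of a VTEP's forwarding decision."""

    def __init__(self, ip: str, mcast_group: str):
        self.ip = ip
        self.mcast_group = mcast_group
        # (segment ID, inner MAC) -> remote VTEP IP, learned on decapsulation
        self.mac_table = {}

    def learn(self, vni: int, inner_src_mac: str, outer_src_ip: str):
        self.mac_table[(vni, inner_src_mac)] = outer_src_ip

    def outer_dst_ip(self, vni: int, inner_dst_mac: str) -> str:
        # Broadcast and unknown unicast flood to the multicast group;
        # known unicasts go straight to the owning VTEP.
        if inner_dst_mac == "ff:ff:ff:ff:ff:ff":
            return self.mcast_group
        return self.mac_table.get((vni, inner_dst_mac), self.mcast_group)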

Figure 1 below shows the VXLAN encapsulation (Wrapper) put around the original Ethernet frame.

Figure 1 VXLAN Encapsulation

The outer IP addresses added by the VEM are those of the VTEPs. A VTEP can be a virtual switch residing in a hypervisor, like the Nexus 1000v, or a logical switch residing in a physical switch.
If you want to “break out” of the VXLAN and have your VM talk to a bare-metal device or a gateway for routing, then a VTEP gateway is required. This VXLAN gateway has an interface in the VXLAN and an interface in the classical Ethernet VLAN, and bridges between the two.
Examples of VXLAN gateways are the Cisco ASR1000v/CSR1000v or the VXLAN Gateway Services Module for the Nexus 1110/1010 Virtual Services Appliance. Some VXLAN enabled physical switches are also capable of providing VXLAN gateway functionality.
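
Conceptually the gateway just holds a mapping between segment IDs and VLAN IDs and bridges frames across it. A hypothetical sketch (the VNI/VLAN pairing is invented for illustration):

# Hypothetical table a VXLAN gateway might hold: one interface in the
# VXLAN, one in the classical Ethernet VLAN, bridged via this mapping.
VNI_TO_VLAN = {5000: 100}
VLAN_TO_VNI = {vlan: vni for vni, vlan in VNI_TO_VLAN.items()}

def to_vlan(vni: int) -> int:
    """Decapsulate a frame arriving from the VXLAN side and tag it
    onto the mapped classical Ethernet VLAN."""
    return VNI_TO_VLAN[vni]

def to_vxlan(vlan: int) -> int:
    """Encapsulate a frame arriving from the VLAN side with the
    mapped segment ID."""
    return VLAN_TO_VNI[vlan]
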
As mentioned above, VXLAN relies on having an IP multicast-enabled network between VTEPs.
There are two Cisco (non-IETF) enhancements that remove the need for an IP multicast-enabled network:
1) Head-end software replication.
The VTEP (Nexus 1000v in my example) sends a copy of the B/U/M traffic via unicast to all possible VTEPs on which the destination MAC could be located (this works well for smaller deployments).

2) The second solution relies on the control plane of the Nexus 1000V virtual switch, the Virtual Supervisor Module (VSM), to distribute the MAC locations of the VMs to the Nexus 1000V Virtual Ethernet Module (VEM, or the data plane), so that all packets can be sent in unicast mode. While this solution seemingly conflicts with the VXLAN design objective of not relying on a control plane, it provides an optimal solution within Nexus 1000V-based virtual network environments. Compatibility with other VXLAN implementations is maintained through IP Multicast, where required.
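The difference between plain multicast mode and these two enhancements is simply where B/U/M replication happens. A hedged sketch (the mode names and function are mine):

def bum_outer_destinations(mode: str, mcast_group: str, remote_vteps: list) -> list:
    """Outer destination IP(s) for one B/U/M frame, per flood mode.

    - "multicast": one copy to the group; the network replicates.
    - "headend": the source VTEP unicasts one copy per remote VTEP.
    With VSM-distributed MAC locations there is ideally no unknown
    unicast left to flood; remaining broadcast is replicated head-end.
    """
    if mode == "multicast":
        return [mcast_group]
    return list(remote_vteps)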

VXLAN Configuration example:

Physical Topology

Logical Topology

First, ensure IP multicast is enabled on the switch and SVI interfaces.

ip pim sparse-dense-mode (on the L3 interfaces)
ip pim bidir-enable (recommended, as any endpoint could be a sender or receiver)
ip pim send-rp-announce Loopback0 scope 16 bidir (sets the switch up as an RP)
ip pim send-rp-discovery Loopback0 scope 16

Verify with “show ip pim interface” and “show ip pim rp mapping”.

On the Cisco Nexus 1000v VSM:

feature segmentation (enables the VXLAN feature; requires the Advanced license)

bridge-domain VXLAN5000_TENANT1
group 239.1.2.3
segment id 5000

Create the Layer 3 control interface port-profiles for the VEMs:

port-profile type vethernet Control_Uplink_1001
capability l3control
capability vxlan
vmware port-group
switchport mode access
switchport access vlan 1001
no shutdown
system vlan 1001
state enabled

port-profile type vethernet Control_Uplink_1002
capability l3control
capability vxlan
vmware port-group
switchport mode access
switchport access vlan 1002
no shutdown
system vlan 1002
state enabled

Create the Port-Profile the VMs will connect to:

port-profile type vethernet VXLAN_5000_Tenant1
switchport mode access
switchport access bridge-domain VXLAN5000_TENANT1
vmware port-group
no shutdown
state enabled

Verify on the VSM with:
show bridge-domain

Verify on the switch with:
show ip mroute 239.1.2.3

First test with both VMs on the same host/port-group, then vMotion VM2 to ESX02.

VXLAN Packet Walk

Let’s take the above example and do a ping from VM1 (MAC1) on ESX01 to VM2 (MAC2) on ESX02.

1. Virtual machine VM1 on ESX01 sends an ARP request with the destination MAC “FFFF.FFFF.FFFF”.

2. The VTEP (VEM) on ESX01 encapsulates the Ethernet broadcast frame in a UDP packet with the multicast address “239.1.2.3” as the destination IP address and the VTEP address “10.200.1.50” as the source IP address.

3. The physical network delivers the multicast packet to the hosts that joined the multicast group address “239.1.2.3”.

4. The VTEP on ESX02 receives the encapsulated packet. Based on the outer and inner header, it makes an entry in the forwarding table that shows the mapping of the virtual machine MAC address and the VTEP. In this example, the virtual machine MAC1 running on ESX01 is associated with VTEP IP “10.200.1.50”.

5. The VTEP also checks the segment ID, or VXLAN logical network ID (5000), in the outer header to decide whether the packet should be delivered on this host.

6. The packet is de-encapsulated and delivered to the virtual machines connected on that logical network VXLAN 5000.

7. Virtual machine MAC2 on ESX02 responds to the ARP request by sending a unicast packet with destination Ethernet MAC address MAC1.

8. After receiving the unicast packet, the VTEP on Host 2 performs a lookup in the forwarding table and gets a match for the destination MAC address “MAC1”.

9. The VTEP now knows that to deliver the packet to virtual machine MAC1 it has to send it to VTEP with IP address “10.200.1.50”.

10. The VTEP creates a unicast packet with destination IP address “10.200.1.50” and sends it out.

11. The packet is delivered to ESX01.

12. The VTEP on Host 1 receives the encapsulated packet. Based on the outer and inner header, it makes an entry in the forwarding table that shows the mapping of the virtual machine MAC address and the VTEP. In this example, the virtual machine MAC2 running on ESX02 is associated with VTEP IP “10.200.2.50”.

13. The VTEP also checks the segment ID, or VXLAN logical network ID (5000), in the outer header to decide whether the packet should be delivered on this host.

14. The packet is de-encapsulated and delivered to the virtual machine connected on that logical network VXLAN 5000.
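
To tie the fourteen steps together, here is a small, self-contained Python replay of the table updates (only the learning logic of steps 4 and 12; the addresses are the ones used above):

# Per-VTEP forwarding tables: (segment ID, inner MAC) -> remote VTEP IP
tables = {"10.200.1.50": {}, "10.200.2.50": {}}

def on_decap(local_vtep: str, vni: int, inner_src_mac: str, outer_src_ip: str):
    # Steps 4 and 12: map the inner source MAC to the outer source IP.
    tables[local_vtep][(vni, inner_src_mac)] = outer_src_ip

# Steps 1-6: VM1's ARP broadcast floods via 239.1.2.3; ESX02's VTEP learns MAC1.
on_decap("10.200.2.50", 5000, "MAC1", "10.200.1.50")

# Steps 7-14: VM2's unicast reply rides a unicast VXLAN packet back;
# ESX01's VTEP learns MAC2.
on_decap("10.200.1.50", 5000, "MAC2", "10.200.2.50")

# Both VTEPs can now reach each other's VM with direct unicast.
assert tables["10.200.1.50"][(5000, "MAC2")] == "10.200.2.50"
assert tables["10.200.2.50"][(5000, "MAC1")] == "10.200.1.50"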