Meraki Switch Stack - vMotion Issues

Solved
MHGT
Here to help

All,

I wanted to put an issue we are experiencing out to the community to see if anyone else has run into something similar. Note: Layer 3 switching is not enabled on the stack at this time.

Environment:

In our environment, we have a stack of three MS Series switches (2 x MS250-48P and 1 x MS250-24P, all running firmware MS 10.40). Connected to the stack is a two-node vSphere cluster in which each host has two network connections, cabled to two different switches in the stack for redundancy. On each host the two NICs are teamed active/active, using "Route based on originating virtual port" as the load balancing policy.
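
To make the teaming behavior concrete, here is a toy model of "Route based on originating virtual port". It assumes the uplink is picked as the virtual port ID modulo the number of active uplinks; VMware does not document that exact formula, so treat it as an illustration only. What it does show is the property that matters for the MAC table: each VM's virtual port is pinned to a single physical NIC, so the stack learns that VM's MAC on a single switch port until a failover or a vMotion moves it.

```python
def pick_uplink(virtual_port_id: int, active_uplinks: list[str]) -> str:
    """Toy port-ID teaming: one virtual port always maps to one uplink.

    Assumption: modulo selection. The real ESXi algorithm may differ, but it
    likewise pins each virtual port to a single active uplink.
    """
    if not active_uplinks:
        raise ValueError("no active uplinks")
    return active_uplinks[virtual_port_id % len(active_uplinks)]


# Hypothetical host with two uplinks cabled to different stack members.
uplinks = ["vmnic0", "vmnic1"]
for port_id in (16, 17, 18):
    print(port_id, "->", pick_uplink(port_id, uplinks))

# If vmnic1 fails, the VMs pinned to it move to vmnic0, i.e. to another switch port.
print(17, "->", pick_uplink(17, ["vmnic0"]))
```

Because no switch-side aggregation is involved, the stack relies on ordinary MAC learning (plus the hosts' switch notifications) to track which port each VM is behind.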

Issue:

When we vMotion a VM from one host to another, the VM loses connectivity as soon as it lands on the other host. While investigating, I noticed that the MAC forwarding tables in the stack are not updating properly. I do have the "Notify Switch" option enabled on each of the vSphere hosts. The only fixes I have found so far are to reboot the entire stack one switch at a time or to vMotion the VM back to its original host. The problem affects random VMs.
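
For context on what "Notify Switch" puts on the wire: after a vMotion, ESXi is commonly observed to send a broadcast RARP frame sourced from the VM's MAC so that physical switches relearn that MAC on the new port. The sketch below only builds such a frame, assuming the commonly seen layout (EtherType 0x8035, opcode 3, VM MAC as sender and target, zeroed IPs); it is not taken from VMware's implementation. If the stack does not update its table when a frame like this arrives on the new port, you get exactly the stale entry described above.

```python
import struct

BROADCAST = b"\xff" * 6
ETHERTYPE_RARP = 0x8035


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"bad MAC: {mac}")
    return bytes(int(p, 16) for p in parts)


def build_rarp_notify(vm_mac: str) -> bytes:
    """Build a broadcast RARP frame sourced from vm_mac (assumed layout)."""
    mac = mac_bytes(vm_mac)
    # Ethernet header: broadcast destination, VM MAC as source, RARP EtherType.
    eth = BROADCAST + mac + struct.pack("!H", ETHERTYPE_RARP)
    # ARP-format payload, 28 bytes.
    rarp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                 # hardware type: Ethernet
        0x0800,            # protocol type: IPv4
        6, 4,              # hardware / protocol address lengths
        3,                 # opcode 3: reverse request
        mac, b"\x00" * 4,  # sender MAC / IP (IP left as 0.0.0.0)
        mac, b"\x00" * 4,  # target MAC / IP
    )
    # Pad to the 60-byte Ethernet minimum (FCS excluded).
    return (eth + rarp).ljust(60, b"\x00")


frame = build_rarp_notify("00:50:56:aa:bb:01")
print(len(frame), frame[:14].hex())
```

Actually transmitting the frame would need a raw socket (for example `AF_PACKET` on Linux) and root privileges, which is deliberately left out here.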

Running esxtop, I can see which physical NIC the VM is using on the host, and it doesn't match the port the switch stack thinks the VM's MAC address is behind.
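
The esxtop-versus-switch comparison can be written down as a small diff, assuming you keep a cabling map from each (host, vmnic) to its (stack member, switch port). Everything here is hypothetical: the host names, vmnic names, port numbers and MAC are made up, and both inputs would have to be transcribed by hand from esxtop and from the stack's MAC table.

```python
# Hypothetical cabling map: (host, vmnic) -> (stack member, switch port).
CABLING = {
    ("esx01", "vmnic0"): (1, 1),
    ("esx01", "vmnic1"): (2, 1),
    ("esx02", "vmnic0"): (1, 2),
    ("esx02", "vmnic1"): (3, 2),
}


def find_stale_entries(esxtop_view, switch_mac_table, cabling=CABLING):
    """Compare where esxtop says each VM is against where the stack learned its MAC.

    esxtop_view:      {vm_mac: (host, vmnic)}   read off esxtop's network view
    switch_mac_table: {vm_mac: (member, port)}  read off the stack's MAC table
    Returns {vm_mac: (expected_port, learned_port)} for every mismatch.
    """
    stale = {}
    for mac, host_nic in esxtop_view.items():
        expected = cabling.get(host_nic)
        learned = switch_mac_table.get(mac)  # None if the stack never learned it
        if expected is not None and learned != expected:
            stale[mac] = (expected, learned)
    return stale


# Example: the VM now sits on esx02/vmnic0, but the stack still points at esx01's port.
print(find_stale_entries(
    {"00:50:56:aa:bb:01": ("esx02", "vmnic0")},
    {"00:50:56:aa:bb:01": (1, 1)},
))
```

Any MAC this reports is one the stack failed to relearn after the move.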

Any thoughts would be appreciated.

8 Replies
PhilipDAth
Kind of a big deal

I have a client running a pair of MS250s in a stack with (I think) five VMware hosts connected. I'm pretty sure they have DRS enabled, so VMs can vMotion at will. Their physical hosts are connected to both switches.

They would have told me if the VMs were losing connectivity.

They are using 10.35 firmware.  The 10.x code had a lot of improvements.  Are you using 10.x firmware?

MHGT
Here to help

@PhilipDAth,

I have had customers running in this configuration before as well without any issues.  That's why I find it a bit perplexing.

All switches are running FW 10.40.

calebbaker
Here to help

I have a similar setup and don't have this issue. Do you have the 2 ports going to the hosts aggregated? If so, I believe the Load Balancing option needs to be set to "Route Based on IP Hash".

Here are my settings:

[Screenshot: vSphere Web Client settings, 2018-11-20]
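
Why IP hash needs aggregated ports: VMware's documentation describes the IP-hash policy as XORing the last octet of the source and destination IP addresses and then applying a calculation based on the number of uplinks. The sketch below models that final step as a plain modulo, which is an assumption. It shows that a single VM's traffic can leave on either NIC depending on the destination, so its MAC would appear on two switch ports at once unless those ports are bundled into one aggregate.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, uplinks: list[str]) -> str:
    """Simplified IP-hash selection: XOR of last octets, modulo uplink count (assumed)."""
    src_last = int(ipaddress.IPv4Address(src_ip)) & 0xFF
    dst_last = int(ipaddress.IPv4Address(dst_ip)) & 0xFF
    return uplinks[(src_last ^ dst_last) % len(uplinks)]


uplinks = ["vmnic0", "vmnic1"]
# Same VM (10.0.0.10), two destinations, two different uplinks.
print(ip_hash_uplink("10.0.0.10", "10.0.0.20", uplinks))  # 10 ^ 20 = 30 -> vmnic0
print(ip_hash_uplink("10.0.0.10", "10.0.0.21", uplinks))  # 10 ^ 21 = 31 -> vmnic1
```

With port-ID teaming, as used in the original post, this per-flow spreading does not happen, which is why no aggregation is configured there.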

MHGT
Here to help

@calebbaker,

They are not aggregated. We are using "Route Based on Originating Port ID" as our load balancing method. We don't need the "Route Based on IP Hash" algorithm; it would add more complexity to the environment than the workload requires.

MHGT
Here to help

Just posting an update. Originally I thought this was related only to our ESX teamed NICs. Then I updated the iDRAC firmware on one of my ESX hosts, and after the iDRAC interface rebooted I could no longer reach its IP address from my laptop. I also cannot ping it from Switches 1 and 2 in the stack, but I can ping it from Switch 3.

PhilipDAth
Kind of a big deal

It sounds to me like something has gone badly wrong with the synchronising of the forwarding tables in the switch stack.  Personally, I would arrange a time when I could power the whole switch stack down and then back up again.
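
A toy way to picture a desynchronised stack: if you could export each member's view of the forwarding table, any MAC that members disagree on (or that some members lack) would match the symptom of a device being reachable from Switch 3 but not from Switches 1 and 2. Meraki does not necessarily expose per-member tables like this; the function and data below are hypothetical.

```python
from collections import defaultdict


def inconsistent_macs(member_tables):
    """Find MACs whose learned location differs between stack members.

    member_tables: {member_id: {mac: (member, port)}}, one table per stack member.
    Returns {mac: {member_id: entry}} for MACs that are missing on some member
    or that members map to different ports.
    """
    views = defaultdict(dict)
    for member, table in member_tables.items():
        for mac, entry in table.items():
            views[mac][member] = entry
    all_members = set(member_tables)
    result = {}
    for mac, per_member in views.items():
        if len(set(per_member.values())) > 1 or set(per_member) != all_members:
            result[mac] = per_member
    return result


# Hypothetical: Switch 3 learned the iDRAC on its own port 5; Switches 1 and 2
# still hold an old entry pointing at member 1, port 7.
tables = {
    1: {"idrac": (1, 7)},
    2: {"idrac": (1, 7)},
    3: {"idrac": (3, 5)},
}
print(inconsistent_macs(tables))
```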

Have you definitely got all the stack ports cabled so they form a ring? For example, stack port 1 on each switch connects to stack port 2 on the next switch, and stack port 1 on the bottom switch connects to stack port 2 on the top switch.
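
The ring rule described above can be checked mechanically. This is a hypothetical helper, not a Meraki tool: it models the cabling as a dict where `cables[a] = b` means stack port 1 on switch `a` goes to stack port 2 on switch `b`, and it accepts the layout only if following port-1 cables visits every member once and returns to the start.

```python
def is_complete_ring(cables: dict[int, int], members: list[int]) -> bool:
    """True if port-1 -> port-2 cabling forms a single ring through all members."""
    # Every member needs exactly one outgoing (port 1) and one incoming (port 2) cable.
    if set(cables) != set(members) or sorted(cables.values()) != sorted(members):
        return False
    start = members[0]
    seen = {start}
    current = cables[start]
    while current != start:
        if current in seen:
            return False
        seen.add(current)
        current = cables[current]
    # A single cycle must cover every member (rules out two separate loops).
    return len(seen) == len(members)


print(is_complete_ring({1: 2, 2: 3, 3: 1}, [1, 2, 3]))  # full ring
print(is_complete_ring({1: 2, 2: 3}, [1, 2, 3]))        # missing return cable
```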

MHGT
Here to help

That's certainly what it looks like. The forwarding tables aren't updating properly, both for VMs and now for standalone devices. Rebooting the entire switch stack fixes the issue, but it eventually comes back. I should mention that I have an active case open with support and am hoping to hear back tomorrow with an update.

I double-checked our stacking cable configuration, and the cables do form a complete ring (1 -> 2 all the way around).

MHGT
Here to help

I believe we got to the bottom of the issue. We tracked down all the small 5-port non-Meraki switches that were in use and disconnected them from the network. Fingers crossed, it's been about a week now without any issues, and vMotion has been working as expected. I suspect one of those devices was interfering with the MAC address updates.

Thanks again for all of your suggestions!
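
If the suspicion about the unmanaged switches is right, one symptom worth looking for before pulling hardware would be a MAC address that the stack keeps relearning on different ports in quick succession. The sketch below assumes a hypothetical list of learning events `(timestamp_s, mac, (member, port))`, however you manage to collect them, and flags consecutive learns of the same MAC on different ports within a time window; the 10-second default is arbitrary.

```python
from collections import defaultdict


def find_mac_flaps(events, window_s=10.0):
    """Flag MACs learned on a different port within window_s seconds of the previous learn.

    events: iterable of (timestamp_s, mac, (member, port)) learning events.
    Returns {mac: [(t1, port1, t2, port2), ...]}.
    """
    by_mac = defaultdict(list)
    for ts, mac, port in sorted(events):
        by_mac[mac].append((ts, port))
    flapping = {}
    for mac, history in by_mac.items():
        # Compare each learn event with the next one for the same MAC.
        for (t1, p1), (t2, p2) in zip(history, history[1:]):
            if p1 != p2 and t2 - t1 <= window_s:
                flapping.setdefault(mac, []).append((t1, p1, t2, p2))
    return flapping


# Hypothetical events: the MAC hops from member 1 port 3 to member 2 port 4 in 2 s.
events = [(0.0, "vm01", (1, 3)), (2.0, "vm01", (2, 4)), (30.0, "vm01", (2, 4))]
print(find_mac_flaps(events))
```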
