Hot standby MX64 LAN side failure not working as expected

Solved
Pugmiester
Building a reputation

Hi all,

 

I'm testing an install before we make it a blueprint for some of our smaller EU offices, and I've hit an odd situation that I can't see a way to fix.

 

I have a pair of MX64s in warm spare mode. They're connected with a dedicated patch cable for HA and work as expected: when the primary fails, the spare takes over. My issue is the LAN side. Each MX connects to its own Meraki MS220-8P (we'll use larger switches on site), and the two switches are linked by a trunk. All is happy until the switch the primary MX connects to fails (or I pull the LAN cable).

 

If the LAN side of the primary MX fails, it doesn't see this as a failure and therefore doesn't pass the primary role to the MX that still has LAN connectivity. That leaves all of the remaining clients on the live switch with no internet connectivity.

 

Am I missing something really obvious?

4 Replies
MacuserJim
A model citizen

That sounds a little strange. In my experience, if the connection between the two MX appliances fails, both MXs act as primary to provide internet access to both sets of now-separated clients. The MX appliances talk to each other directly, and it's this that helps the spare determine whether it should assume the role of master (when it stops receiving VRRP heartbeats from the default master). Is there any way that the spare MX can still see the primary MX even after the LAN on the primary MX "fails"? That would prevent the spare from assuming the role of master, and if the spare is the only route out for some clients, I could see how it might stop them from getting out to the WAN.

 

https://documentation.meraki.com/MS/Layer_3_Switching/MS_Warm_Spare_(VRRP)_Overview

 

I'd also like to ask what firmware you're running, as this might just be a firmware bug that someone already knows about.
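To make the question above concrete, here's a rough sketch of the heartbeat reasoning. It's a toy model, not Meraki's actual implementation: the `Link` class, the path lists and the "passive"/"active" labels are all made up for illustration. The only assumption is that the spare stays passive for as long as heartbeats from the primary can reach it over any path.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A single cable or trunk in the lab topology (hypothetical model)."""
    name: str
    up: bool = True


def spare_hears_primary(paths):
    """Heartbeats arrive if at least one path has every link up."""
    return any(all(link.up for link in path) for path in paths)


def spare_role(paths):
    """Assumed rule: the spare only takes over when heartbeats stop."""
    return "passive" if spare_hears_primary(paths) else "active"


# The primary's LAN link has failed; everything else is healthy.
primary_lan = Link("primary MX <-> switch A", up=False)
trunk = Link("switch A <-> switch B")
spare_lan = Link("spare MX <-> switch B")
ha_cable = Link("dedicated MX <-> MX cable")

lan_path = [primary_lan, trunk, spare_lan]

with_cable = spare_role([lan_path, [ha_cable]])
without_cable = spare_role([lan_path])

print(with_cable)     # passive
print(without_cable)  # active
```

In this model the dedicated cable keeps the heartbeats flowing after the LAN link drops, so the spare never has a reason to take over.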

 

 

Pugmiester
Building a reputation

Both MXs are currently running 13.33.

 

I have a dedicated VRRP link between the boxes, not across the LAN, so I was expecting the primary to still be able to pass control to the secondary once it detected the loss of its local LAN connection, but that doesn't seem to happen.

 

In the meantime, I've added a pair of additional patch cables, one from each MX to the opposing switch, and let Spanning Tree sort out what's what, which seems to have done the trick. That's a solution I can work with.
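For anyone wanting to sanity-check the cabling change, a quick reachability sketch shows why the cross-connects help. The node names (`mx_primary`, `switch_a`, `clients_b` and so on) are labels I've invented for this lab, and the model only looks at physical connectivity. It ignores Spanning Tree and VRRP entirely.

```python
from collections import deque


def reachable(links, start):
    """Breadth-first search over undirected links; returns all nodes reachable from start."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def clients_reach_primary(links, failed_node):
    """Drop every link touching the failed node, then test reachability from the clients."""
    live = {link for link in links if failed_node not in link}
    return "mx_primary" in reachable(live, "clients_b")


# Original layout: each MX on its own switch, switches joined by a trunk.
original = {
    ("mx_primary", "switch_a"),
    ("mx_spare", "switch_b"),
    ("switch_a", "switch_b"),
    ("clients_b", "switch_b"),
}

# Workaround: one extra cable from each MX to the opposing switch.
cross_connected = original | {
    ("mx_primary", "switch_b"),
    ("mx_spare", "switch_a"),
}

print(clients_reach_primary(original, "switch_a"))         # False
print(clients_reach_primary(cross_connected, "switch_a"))  # True
```

With the extra cables, losing switch A no longer cuts the primary MX off from the clients on switch B.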

jdsilva
Kind of a big deal

Hey @Pugmiester,

 

The first thing you should do is remove the dedicated cable between the MXs. It has been proven to cause issues with warm spare, and Meraki has removed it from their best practices.

 

https://documentation.meraki.com/MX/Deployment_Guides/MX_Warm_Spare_-_High_Availability_Pair#Recomme...

 

Once you're more in line with one of those topologies, the next step is to check that your STP root is set correctly, so that when you fail over, the ports facing the secondary aren't blocked.
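As a reminder of how the root gets picked, here's a minimal sketch of the standard Spanning Tree election rule: the bridge with the lowest priority wins, and the lowest MAC address breaks a tie. The switch names, priorities and MAC addresses below are invented for illustration, and which switch should actually be root depends on your own topology.

```python
def bridge_id(bridge):
    """Bridge ID used for comparison: priority first, then MAC address as an integer."""
    mac_as_int = int(bridge["mac"].replace(":", ""), 16)
    return (bridge["priority"], mac_as_int)


def elect_root(bridges):
    """The bridge with the numerically lowest bridge ID becomes the root."""
    return min(bridges, key=bridge_id)


# Hypothetical values: switch_a has been given a lower (better) priority.
bridges = [
    {"name": "switch_a", "priority": 4096, "mac": "00:18:0a:00:00:0a"},
    {"name": "switch_b", "priority": 32768, "mac": "00:18:0a:00:00:01"},
]

root = elect_root(bridges)
print(root["name"])  # switch_a

# With equal priorities, the lower MAC address decides.
tied = [dict(b, priority=32768) for b in bridges]
print(elect_root(tied)["name"])  # switch_b
```

If both switches are left on the default priority, the root ends up being whichever has the lower MAC, which may not be the one you want.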

Pugmiester
Building a reputation

Hi @jdsilva

 

Thanks for the pointer. I hadn't realised the best practice guide had been updated. I've stripped out the dedicated HA patch cable, and with each MX now connected to both switches, everything is looking rock solid.

 

I think we have a winner.
