VPN Failover Delay in Dual MX68 VIP HA Setup When LAN Disconnects

Solved
Grawill
Conversationalist

Hi Team,

I'm testing a dual MX68 warm spare deployment and encountering an issue with VPN failover behavior. Please see the attached network diagram for reference.

Setup:

  • Both MX appliances have WAN1 connected to an L2 switch and are configured using a WAN VIP.

  • Each MX LAN port connects to a Cisco 3650 switch (ports configured as trunks).

  • Clients are behind the 3650.

Failover Behavior Observed:

  • When the primary MX’s WAN connection is disconnected:

    • Internet fails over in ~1 second

    • VPN tunnels re-establish within ~30 seconds (expected)

  • When the primary MX’s LAN connection to the 3650 is disconnected:

    • Internet fails over in ~1 second

    • VPN failover takes 60–90 minutes, or doesn’t complete without intervention
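One way to put numbers on the failover times above is to log a reachability probe (for example, one ping per second to a host across the VPN) and compute the longest gap afterwards. This is a minimal sketch that assumes the probe results have already been collected; the `longest_outage` helper and the sample log are hypothetical and not part of any Meraki tooling.

```python
from typing import Iterable, Optional, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, probe succeeded?)


def longest_outage(samples: Iterable[Sample]) -> float:
    """Return the longest continuous run of failed probes, in seconds.

    An outage starts at the first failed probe and ends at the next
    successful one. An outage still open at the end of the samples is
    measured up to the last timestamp seen.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and outage_start is None:
            outage_start = ts  # first failed probe opens an outage
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None  # first successful probe closes it
    if outage_start is not None and last_ts is not None:
        longest = max(longest, last_ts - outage_start)
    return longest


# Hypothetical probe log: one probe per second, tunnel down from t=3 to t=33.
log = [(float(t), not (3 <= t < 33)) for t in range(40)]
print(longest_outage(log))
```

Running the same calculation against a WAN-pull test and a LAN-pull test would make the difference between the two cases easy to compare.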

 

I opened a support ticket and was told I needed to submit a feature request for LAN-side failure support. Support was unable to recommend a design or configuration change that would give a fully redundant system.

 

Question:

Is this behavior expected when the LAN interface drops? Or is there a potential configuration issue preventing the standby MX from establishing VPN tunnels promptly after LAN-only failover?

Thanks in advance for your help!

 

Meraki-Example-Network.png

1 Accepted Solution
Mloraditch
Kind of a big deal

Yes, this is expected. The MXs determine which one is the master via the LAN side. When there is no LAN connection between them, they end up in a split-brain state and don't know which one is in charge.
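The split-brain case can be illustrated with a toy model. This is a simplified sketch, not Meraki's actual election logic: it assumes the primary always considers itself active and the spare stays passive only while it can hear the primary over the LAN. The `MX` class, `elected_role`, `roles`, and the device names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MX:
    name: str
    is_primary: bool
    lan_up: bool = True


def elected_role(me: MX, peer: MX) -> str:
    """Toy election: the spare yields only while it hears the primary on the LAN."""
    if me.is_primary:
        return "active"
    hears_primary = me.lan_up and peer.lan_up
    return "passive" if hears_primary else "active"


def roles(primary: MX, spare: MX) -> Dict[str, str]:
    """Return the role each appliance believes it holds."""
    return {
        primary.name: elected_role(primary, spare),
        spare.name: elected_role(spare, primary),
    }


primary = MX("mx-a", is_primary=True)
spare = MX("mx-b", is_primary=False)

print(roles(primary, spare))  # healthy LAN: one active, one passive

primary.lan_up = False        # pull the primary's LAN cable only
print(roles(primary, spare))  # both appliances now consider themselves active
```

In this model the WAN side never enters the decision, so a LAN-only failure leaves two appliances claiming the active role at once.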

Years ago, a cable connecting the MXs directly to each other was recommended to account for this, but it no longer is because of STP issues. It can still work if you are prepared to deal with the STP situation: one LAN uplink port will always be in a blocking state.

I would submit the feature request as it really is something that should work.


4 Replies
alemabrahao
Kind of a big deal

When the primary MX's LAN connection drops, the failover behavior can be influenced by several factors, including VRRP heartbeats and the configuration of the LAN ports. Typically, VRRP heartbeats are used to monitor the health of the primary MX, and if these heartbeats are not received by the standby MX, it should trigger a failover.
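As a rough illustration of heartbeat-based detection, the sketch below shows a generic dead-timer check of the kind a standby device might run. It is not the MX implementation. The `HeartbeatMonitor` class is hypothetical, and the 0.3 s heartbeat spacing and 3 s dead interval are illustrative values chosen for the example, not figures taken from this thread.

```python
class HeartbeatMonitor:
    """Declare the primary dead when no heartbeat arrives within dead_interval."""

    def __init__(self, dead_interval: float, start: float) -> None:
        self.dead_interval = dead_interval
        self.last_seen = start  # treat monitor start as the first sighting

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the primary at time `now`."""
        self.last_seen = now

    def primary_is_dead(self, now: float) -> bool:
        """True once the silence has lasted longer than the dead interval."""
        return now - self.last_seen > self.dead_interval


# Illustrative values only (assumed, not from the thread).
monitor = HeartbeatMonitor(dead_interval=3.0, start=0.0)
monitor.heartbeat(0.3)               # last heartbeat before the LAN cable is pulled
print(monitor.primary_is_dead(3.0))  # 2.7 s of silence: still within the interval
print(monitor.primary_is_dead(3.4))  # 3.1 s of silence: standby would take over
```

A check like this only tells the standby that heartbeats stopped. It cannot tell whether the primary itself failed or only the LAN path between the two did.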

Some recommendations:

  • Ensure VRRP monitoring is enabled and functioning correctly. This will help in detecting LAN failures more promptly.

  • If possible, establish a direct link between the primary and standby MXs to improve failover detection and response times.

 

MX Warm Spare - High-Availability Pair - Cisco Meraki Documentation

Routed HA Failover Behavior - Cisco Meraki Documentation

I am not a Cisco Meraki employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

PhilipDAth
Kind of a big deal

We really need Port-Channel support in the MX .... just saying.

alemabrahao
Kind of a big deal

I totally agree, it's high time they gave us this possibility.
