Failover issue MX68

househed
Here to help

Hi,

 

I have a site with two MX68s running MX 18.107.5, each with one WAN link and both connected to a Cisco switch doing layer 2 downstream, as per the diagram below.

 

Switch 1 went down, but the MX68s did not fail over. I'm pretty sure they did in the past with MX 17.x.x.

 

Any ideas why it would do that?

 

Thanks.

recommended_HA_design_switch_stack.png

Brash
Kind of a big deal

According to the documentation, a failover should occur in either of the following scenarios:

 

WAN failover: WAN monitoring is performed using the same internet connectivity tests that are used for uplink failover. For more data on these checks, see the Connection Monitoring for WAN Failover article. If the primary appliance does not have a valid internet connection based on these tests, it will stop sending VRRP heartbeats, which will result in a failover. When uplink connectivity on the original primary appliance is restored and the warm spare begins receiving VRRP heartbeats again, it will relinquish the active role back to the primary appliance.

LAN failover: The two appliances share health information over the network via the VRRP protocol. These VRRP heartbeats occur at layer two and are performed on all configured VLANs. If no advertisements reach the spare on any VLAN, it will trigger a failover. When the warm spare begins receiving VRRP heartbeats again, it will relinquish the active role back to the primary appliance.
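To make the LAN failover rule above concrete, here is a minimal toy model of how the spare decides whether to take over. It is only a sketch of the behaviour described in the documentation, not Meraki's implementation: the class name, method names and the `dead_interval_s` timeout are all hypothetical, and the real timer values are not given in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class SpareState:
    """Toy model of the warm spare's view of VRRP heartbeats (not Meraki code)."""

    dead_interval_s: float  # hypothetical timeout, chosen only for illustration
    last_seen: dict = field(default_factory=dict)  # VLAN id -> time of last advertisement

    def heartbeat(self, vlan: int, now: float) -> None:
        # Record that an advertisement from the primary arrived on this VLAN.
        self.last_seen[vlan] = now

    def should_be_active(self, now: float) -> bool:
        # The spare takes over only when NO VLAN has delivered a recent advertisement.
        heard_recently = any(
            now - t <= self.dead_interval_s for t in self.last_seen.values()
        )
        return not heard_recently


spare = SpareState(dead_interval_s=3.0)
spare.heartbeat(vlan=10, now=0.0)
spare.heartbeat(vlan=20, now=0.0)
print(spare.should_be_active(now=1.0))   # heartbeats are fresh on both VLANs
spare.heartbeat(vlan=20, now=4.0)
print(spare.should_be_active(now=5.0))   # VLAN 10 is silent, VLAN 20 still heard
print(spare.should_be_active(now=10.0))  # nothing heard on any VLAN
```

The point of the model is that a single VLAN still carrying heartbeats is enough to keep the spare passive, and that both units end up active if heartbeats stop in both directions while each unit is otherwise healthy.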

 

Assuming that the primary MX68 lost LAN connectivity when switch 1 went down, then yes, a failover is expected.

 

Is the above diagram an exact match of your topology?

How long did the MX lose LAN connectivity for?

househed
Here to help

That's what I thought too.

 

The diagram is exactly the same as the topology.

 

The switch was down for hours.

When I go out to replace switch 1, I'll do more testing and see what happens.

Ryan_Miles
Meraki Employee

Do the MXs have a direct link between them? If yes, when switch 1 was down was traffic still flowing from switch 1 or 2 > MX 2 > MX 1 > internet?

Ryan

If you found this post helpful, please give it Kudos. If my answer solves your problem please click Accept as Solution so others can benefit from it.
househed
Here to help

No, there is no direct link between them. I was under the impression that it's not necessary anymore?

 

I didn't have enough time to test more as I had to get the site up and running again.

 

Will do more testing when I go out for switch 1.

alemabrahao
Kind of a big deal

Is the LAN port of each MX configured as trunk or access?

 

If it's trunk, try disabling the native VLAN; I've had problems with this in the past.

I am not a Cisco Meraki employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

Please, if this post was useful, leave your kudos and mark it as solved.
househed
Here to help

They're all trunk ports, no native VLAN set.

 

I'll be doing more testing this Thursday to see what I can find.

househed
Here to help

An update:

 

Tried again today, as I replaced switch 1 in the stack.

 

Both Meraki appliances were set as Master and couldn't connect back to our datacentre.

 

Internet traffic was going out through the secondary appliance.

 

Any way to get the primary to go into Passive mode?

alemabrahao
Kind of a big deal

In the design you are using, it is recommended that the port be in access mode.
 
So what I suggest is to make a direct connection between the MXs. You can create any VLAN (999, for example) and assign any /30 addresses to both MXs (e.g. MX1 169.168.0.1 and MX2 169.168.0.2), configure a port on each MX in that VLAN, and make a direct connection like the one shown in red in the image below.

 

alemabrahao_0-1699522091356.png
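As a quick sanity check on the addressing suggested above, the sketch below uses Python's standard `ipaddress` module to confirm that the two example addresses sit in the same /30 and that a /30 leaves exactly two usable host addresses, one per MX. The network `169.168.0.0/30` is simply inferred from the example addresses in the post; any other /30 would work the same way.

```python
import ipaddress

# /30 inferred from the example addresses in the post (an assumption, any /30 works).
link = ipaddress.ip_network("169.168.0.0/30")

# A /30 has four addresses: network, two hosts, broadcast.
usable = [str(host) for host in link.hosts()]
print(usable)

mx1 = ipaddress.ip_address("169.168.0.1")
mx2 = ipaddress.ip_address("169.168.0.2")

# Both ends of the direct MX-to-MX link must fall inside the same /30.
print(mx1 in link and mx2 in link)
```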

 

alemabrahao
Kind of a big deal

Something like this.

 

alemabrahao_1-1699522547099.png

 

househed
Here to help

Thanks for that.

 

I've seen this design and was under the impression that Meraki doesn't recommend cabling them like that anymore.

 

But I'll give it a try in a lab.

JGill
Building a reputation

We ran into some funky spanning tree issues during upgrades with the direct VRRP heartbeat cable during deployments. Pull the cable, no issues. Your mileage may vary, but FYI. We pulled all the heartbeat cables a few years ago. The bulk of our 500+ MXs are MX84s in an HA configuration, each MX dual cross-connected to the switch stack so that no single switch or MX failure should take a location offline. Meraki switch stack with stacking cables.

 

Primary MX

WAN1 - Carrier 1

WAN2 - Carrier 2

Port 9 -> Sw1 port 47

Port 10 -> Sw3 port 47

 

Spare MX

WAN1 - Carrier 1

WAN2 - Carrier 2

Port 9 -> Sw1 port 48

Port 10 -> Sw3 port 48

 

Originally had a heartbeat cable MX <-> MX on port 3, since pulled.
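For anyone wanting to check their own cabling against the "no single switch failure takes an MX off the LAN" goal, here is a rough sketch. The uplink map is taken from the port list above; the function name and the single-homed comparison layout are hypothetical and only there for illustration.

```python
# LAN uplinks per MX, taken from the port list above.
dual_homed = {
    "Primary MX": {"Sw1", "Sw3"},
    "Spare MX": {"Sw1", "Sw3"},
}

# Hypothetical single-homed layout, for comparison only.
single_homed = {
    "Primary MX": {"Sw1"},
    "Spare MX": {"Sw2"},
}


def isolated_after_failure(uplinks, failed_switch):
    """Return the MXs left with no LAN uplink once failed_switch is down."""
    return sorted(
        mx for mx, switches in uplinks.items() if not (switches - {failed_switch})
    )


for name, layout in (("dual-homed", dual_homed), ("single-homed", single_homed)):
    for switch in sorted(set().union(*layout.values())):
        print(name, switch, isolated_after_failure(layout, switch))
```

With the dual cross-connect, losing either switch leaves both MXs with a LAN path, so heartbeats can still pass through the surviving switch. In the single-homed comparison, losing a switch cuts one MX off the LAN entirely.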
