MX uplink port issues on HA-pairs

MarcelTempelman
Getting noticed

Hi all,

 

I see several MX firewalls (in HA setup) having problems with their uplink ports:

 

- We have MX85s which dropped their uplink connections. We first heard it could be caused by Energy Efficient Ethernet (EEE), but neither of the devices we connected the MXs to supports it. Now we have been advised to use copper SFPs instead of the normal Ethernet ports on the MX. This got me thinking that a static speed/duplex setting might also work, not that I favor such settings.

 

- A customer with two MX450s (HA) sees a lot of VRRP changes related to uplinks dropping. They are using 10G DAC cables connected to an MS250 which functions as a WAN switch.

 

- In the release notes I see that the MX100 also has an uplink issue.

 

- Another customer with MX250s seems to have the same issues.

 

Most of the time when the spare is shut down, things stay stable.

 

Most issues seem to arise after an upgrade to 16.x. In the case of the MX450s, the issues started when upgrading to the last 14.x version; 16.x seemed to stabilize things at first, but the problems returned.

 

What's going on here, and why are so many different models suddenly affected? I don't want to discuss every issue separately; I was just wondering why there is such a flood of uplink issues on (mostly) HA pairs.

 

With kind regards,

 

Marcel Tempelman

7 Replies
cmr
Kind of a big deal

@MarcelTempelman we have HA pairs of MX84s and MX100s running in routed mode on firmware 16.16 and haven't had any noticeable issues. We also have an HA pair of MX250s in concentrator mode on 16.15 and haven't had an issue there either. The MX250s use Cisco GLC-T transceivers in the WAN port. For the WAN connections, the routed units use Cisco small business unmanaged gigabit switches. The MX250s now connect to an MS355 stack; for most of the time before that they connected to a Cisco 3850 stack.

PhilipDAth
Kind of a big deal

I haven't seen any issues with MS250s or MX68s.

MarcelTempelman
Getting noticed

Thanks for adding the info. At this moment it is still hard to pin this on a specific situation or hardware type, except that it happens with HA pairs.

 

Update: changing to SFPs on our MX85s did nothing to stop the uplinks from dropping. Meraki support advised upgrading to 16.16.1, which includes a fix for the MX100; they expect it to work for the MX85s as well.

 

In the case of the MX450s, I had a chat with the customer. They are using an Aruba switch as the WAN switch, and I have yet to rule out any spanning-tree causes. The odd thing is that VRRP is using priority 105 when the uplink drops. I haven't found any reference to that status in the documentation:

 

https://documentation.meraki.com/MX/Networks_and_Routing/Routed_HA_Failover_Behavior

 

 

For those interested, this is the kind of behavior we're seeing:

[Screenshot: MarcelTempelman_0-1651480460463.png]

Depending on its role, the firewall drops to the VRRP priority associated with failing uplinks (75 on the primary and 55 on the spare MX). On the WAN switch we only see the port status change: Designated -> Down -> Disabled -> Down -> Designated. This occurs repeatedly and coincides with the VRRP events.
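
To check more systematically whether the port-state changes on the WAN switch line up with the VRRP events, the two event lists can be compared by timestamp. This is only a rough sketch: the function name, the 5-second window, and the timestamps are all made up for illustration, and the events would have to be exported from the switch log and the MX event log by hand.

```python
from datetime import datetime, timedelta


def coinciding_events(port_events, vrrp_events, window_seconds=5):
    """Return (port_time, vrrp_time) pairs at most window_seconds apart."""
    window = timedelta(seconds=window_seconds)
    pairs = []
    for port_time in port_events:
        for vrrp_time in vrrp_events:
            if abs(port_time - vrrp_time) <= window:
                pairs.append((port_time, vrrp_time))
    return pairs


# Hypothetical timestamps, not taken from any real log.
port_events = [
    datetime(2022, 5, 2, 10, 0, 3),
    datetime(2022, 5, 2, 10, 15, 40),
]
vrrp_events = [
    datetime(2022, 5, 2, 10, 0, 5),
    datetime(2022, 5, 2, 11, 30, 0),
]

matches = coinciding_events(port_events, vrrp_events)
for port_time, vrrp_time in matches:
    print(f"port change at {port_time} ~ VRRP event at {vrrp_time}")
```

A port change with no VRRP event nearby (or the other way around) would suggest the two are not as tightly linked as they appear in the event log.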

 

I'll keep you updated.

 

GIdenJoe
Kind of a big deal

VRRP is only reacting because your WAN links fail on the MX, so there is no point in watching VRRP in this case.
The fix for the MX100 was, AFAIK, directed at the LAN SFP ports of the MX100. Since the MX100 does not have WAN SFP ports, it might be a completely different issue on the MX250/450.

MarcelTempelman
Getting noticed

Looking at VRRP is certainly important, because it tells you whether the problems are on the WAN or the LAN side. Losing uplinks gives you a different VRRP status than losing the connection between the MXs on the LAN side.
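
As a small illustration of reading the VRRP priority this way, here is a sketch that maps an observed priority to what it indicates. It only knows the two values mentioned earlier in this thread (75 on the primary and 55 on the spare when uplinks fail); anything else, including the 105 seen on the MX450s, is reported as unknown rather than guessed. The names are hypothetical.

```python
# Priorities reported in this thread for failing uplinks.
# Other values are deliberately left out instead of being guessed.
UPLINK_FAILURE_PRIORITY = {75: "primary", 55: "spare"}


def describe_priority(priority):
    """Describe a VRRP priority using only the values reported in this thread."""
    role = UPLINK_FAILURE_PRIORITY.get(priority)
    if role is not None:
        return f"uplink failure on the {role} MX"
    return "unknown: check the failover documentation"


for observed in (75, 55, 105):
    print(observed, "->", describe_priority(observed))
```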

 

Paul_L
Here to help

We have a similar ongoing problem with an MX450 pair. We had to revert to 15.44 because VRRP started failing randomly and then more constantly. The ticket is still open.

MarcelTempelman
Getting noticed

MX 16.16.2 promises fixes for some models:

 

Fixed an issue on MX67(C,W), MX68(W,CW), MX75, and MX85 appliances that could cause ports to occasionally disconnect and reconnect (“flap”) when connected to some devices.

 
