I'm seeing several MX firewalls (in HA setups) having problems with their uplink ports:
- We have MX85s that dropped their uplink connections. We first heard it could be caused by Energy Efficient Ethernet, but neither of the devices we connected the MX to supports it. Now we've been advised to use copper SFPs instead of the regular Ethernet ports on the MX (which got me thinking that a static speed/duplex setting might also work, not that I favor such settings...).
- A customer with 2 MX450s (HA) sees a lot of VRRP changes related to uplinks dropping. They are using 10G DAC cables connected to an MS250 that acts as a WAN switch.
- In the release notes I see that the MX100 also has an uplink issue.
- Another customer with MX250s seems to have the same issues.
Most of the time when the spare is shut down, things stay stable.
Most issues seem to arise after an upgrade to 16.x (in the case of the 450s, the issues started after upgrading to the last 14.x version; 16.x seemed to stabilize things, but the problems returned).
What's going on here, and why are so many different models suddenly affected? My point isn't to discuss every issue separately; I was just wondering why there is such a flood of uplink issues on (mostly) HA pairs.
@MarcelTempelman We have HA pairs of MX84s and MX100s running in routed mode on firmware 16.16 and haven't had any noticeable issues. We also have an HA pair of MX250s in concentrator mode on 16.15 and haven't had an issue there either. The MX250s use Cisco GLC-T transceivers in the WAN port. On the routed units we use Cisco small business unmanaged gigabit switches for the WAN connections; the MX250s now connect to an MS355 stack and before that were connected to a Cisco 3850 stack for most of the time they've been in place.
Thanks for adding the info. At the moment it is still hard to pin down a specific situation or hardware type, other than that it happens with HA pairs.
Update: switching to SFPs on our MX85s did nothing for the dropping uplinks. Meraki support advised upgrading to 16.16.1, which includes a fix for the MX100; they expect it to help with these units as well.
As for the MX450s, I had a chat with the customer: they are using an Aruba switch as the WAN switch, and I have yet to rule out spanning-tree causes. The odd thing is that VRRP uses priority 105 when the uplink drops. I haven't found any reference to that status.
For those interested, this is the kind of behavior we're seeing:
Depending on its role, the firewall drops to the VRRP priority associated with failing uplinks (75 on the primary and 55 on the spare MX). On the WAN switch we only see the port status change Designated -> Down -> Disabled -> Down -> Designated. This happens repeatedly and coincides with the VRRP events.
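To check that correlation systematically rather than by eye, one option is to export both event logs and pair them up by timestamp. A minimal Python sketch, assuming you already have the switch port events and the MX VRRP events as (timestamp, text) records from devices with synchronized clocks; the `Event` type, the `correlate()` helper, the 5-second window and the sample data are all hypothetical, not Meraki or Aruba APIs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float    # seconds on a shared clock (assumes NTP-synced devices)
    detail: str  # free-text description copied from the log


def correlate(port_events, vrrp_events, window_s=5.0):
    """Return (vrrp_event, [port events within window_s seconds]) pairs.

    VRRP events with no port change inside the window are omitted;
    compare len(result) with len(vrrp_events) to see how many VRRP
    events lack a port-side explanation.
    """
    pairs = []
    for v in vrrp_events:
        nearby = [p for p in port_events if abs(p.ts - v.ts) <= window_s]
        if nearby:
            pairs.append((v, nearby))
    return pairs


# Hypothetical sample data shaped like the port flaps described above.
port_events = [
    Event(100.0, "Designated -> Down"),
    Event(101.0, "Down -> Disabled"),
    Event(130.0, "Disabled -> Down"),
    Event(131.0, "Down -> Designated"),
]
vrrp_events = [Event(102.0, "priority 75"), Event(300.0, "priority 75")]

result = correlate(port_events, vrrp_events)
for vrrp, ports in result:
    print(vrrp.detail, "<-", [p.detail for p in ports])
```

Here only the VRRP event at t=102 pairs with port changes; the one at t=300 has nothing nearby and would deserve a separate look.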
VRRP is only reacting because your WAN links fail on the MX, so there is no point in watching VRRP in this case. AFAIK the MX100 fix was aimed at the MX100's LAN SFP ports. Since the MX100 has no WAN SFP ports, the MX250/450 issue might be a completely different one.
Looking at VRRP is certainly important, because it tells you whether the problems originate on the WAN or the LAN side. Losing uplinks gives you a different VRRP status than losing the connection between the MXs on the LAN side.
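That reasoning can be written down as a small decision rule. This is a rough sketch only: the 75/55 values are the uplink-failure priorities reported earlier in this thread, not an official Meraki table, and the `classify()` function and its inputs are hypothetical. It also leans on general VRRP behavior: a backup takes over as master when it stops receiving the master's advertisements, so a takeover without a lowered priority points at the LAN path between the MXs.

```python
# Uplink-failure priorities as reported in this thread (primary 75,
# spare 55); an assumption to verify against your own event log.
UPLINK_FAIL_PRIORITY = {"primary": 75, "spare": 55}


def classify(role: str, priority: int, spare_took_over: bool) -> str:
    """Guess where a failover originated from one VRRP event.

    role: "primary" or "spare" (the MX that logged the event)
    priority: the VRRP priority that MX advertised in the event
    spare_took_over: whether the spare became VRRP master
    """
    if priority == UPLINK_FAIL_PRIORITY.get(role):
        # The MX lowered its own priority because an uplink failed.
        return "wan"
    if spare_took_over:
        # No uplink-failure priority, yet the spare became master:
        # it likely stopped hearing the primary's advertisements.
        return "lan"
    # Anything else (e.g. the unexplained priority 105) needs a closer look.
    return "unknown"


# 100 below is a hypothetical "normal" priority, chosen for illustration.
print(classify("primary", 75, spare_took_over=True))   # wan
print(classify("primary", 100, spare_took_over=True))  # lan
print(classify("spare", 105, spare_took_over=False))   # unknown
```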