Hi, we have a customer with a simple redundant WAN design:
The routers are connected to the MX WAN ports:
We are getting a lot of log entries for primary uplink status change and Ethernet port carrier change (on WAN ports 3/4).
WAN1 is set as primary WAN, WAN2 is set as failover.
No failures are detected on the main routers.
What is happening? How can I make sense of this situation?
This KB is clear: https://documentation.meraki.com/MX/Monitoring_and_Reporting/Primary_Uplink_Status_and_Ethernet_Port...
RSTP is enabled on both routers and downstream switches.
Additional note: the VRRP crossover link is provided by two downstream switches in the Fully Redundant design: https://documentation.meraki.com/MX/Deployment_Guides/MX_Warm_Spare_-_High_Availability_Pair
Thanks for your help.
Maybe the link is flapping and both MXs are trying to take over as master. You can check the warm spare design at this link:
https://www.willette.works/mx-warm-spare/
We are using the Fully Redundant (Multiple Switches) design: https://documentation.meraki.com/MX/Deployment_Guides/MX_Warm_Spare_-_High_Availability_Pair
The VRRP crossover link is provided by the switches, not by the MXs.
Here is an excerpt from the log:
Is STP enabled on switches?
Yes.
Are the MX LAN ports configured as trunk or access? If they are configured as trunk, try changing the native VLAN to "Drop untagged traffic".
Nice idea, let me check.
Access, untagged VLAN 1 (management) on both sides.
Ethernet port carrier change means the MX thinks the Ethernet ports are physically going down. It could be a cable issue.
It could be, but on two different ports? Port 3 and port 4?
We have a similar issue on our MX75, although we don't have HA. The primary WAN link drops with "Ethernet port carrier change". We've raised this with Meraki and our ISP. We placed a managed, unconfigured switch on the WAN side, between the MX and the ISP router, and we see drops on the MX side of the link.
Meraki have said it is a faulty unit and they are going to replace it for us.
This may not be germane to your issue, but I figured I'd mention it:
On a client's MX84 I was seeing frequent flaps of not only both uplinks but also the active LAN ports. Turns out the AnyConnect service was getting hammered by malicious actors and spiking the CPU to 100%, which caused the flap. Moving AnyConnect to a different port solved the problem.
Are you sure? This customer does not have an AnyConnect license and does not use it.
The summary report says that device utilization was under 20% over the last week and that CPU data is insufficient.
We are still seeing the uplink go down and come back up for 20 seconds. I suspect the problem is related to STP, so I could remove the cable that closes the loop between the switches for a few days.
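One way to test the "two bad cables" idea against the "common cause" idea is to export the event log and check whether the carrier changes on port 3 and port 4 happen at the same moments and how long each outage lasts. The snippet below is only a rough sketch: the `(timestamp, port, state)` record format, the function names `outages` and `overlapping`, and the sample events are all made up for illustration and are not real Meraki log output or a Meraki API.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical sample events (illustrative only, NOT real log data).
events = [
    ("2024-01-01 10:00:00", "port3", "down"),
    ("2024-01-01 10:00:01", "port4", "down"),
    ("2024-01-01 10:00:20", "port3", "up"),
    ("2024-01-01 10:00:21", "port4", "up"),
    ("2024-01-01 12:30:00", "port3", "down"),
    ("2024-01-01 12:30:20", "port3", "up"),
]


def outages(records):
    """Pair each 'down' with the next 'up' on the same port.

    Returns a list of (port, start, end) tuples.
    """
    open_down = {}
    result = []
    # Timestamps in this format sort correctly as plain strings.
    for ts, port, state in sorted(records):
        when = datetime.strptime(ts, FMT)
        if state == "down":
            # Keep the first 'down' if several arrive before an 'up'.
            open_down.setdefault(port, when)
        elif state == "up" and port in open_down:
            result.append((port, open_down.pop(port), when))
    return result


def overlapping(outage_list, slack=timedelta(seconds=2)):
    """Return pairs of outages on different ports that overlap in time."""
    pairs = []
    for i, (p1, s1, e1) in enumerate(outage_list):
        for p2, s2, e2 in outage_list[i + 1:]:
            if p1 != p2 and s1 <= e2 + slack and s2 <= e1 + slack:
                pairs.append(((p1, s1, e1), (p2, s2, e2)))
    return pairs


found = outages(events)
for port, start, end in found:
    print(port, start, "duration:", (end - start).total_seconds(), "s")
print("coinciding outages:", len(overlapping(found)))
```

If most outages on the two ports coincide, two independently faulty cables become less likely and something shared (switch, STP topology change, the MX itself) is a more plausible suspect. If they never coincide, the cable or port theory stays on the table.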
I had to get the CPU utilization figure from support. In my case, I had all active ports flapping, not just the WAN links, so if you are only seeing it on the WAN side I'd assume the issue isn't the same. Just figured I'd mention it.