MX250 WAN1 and WAN2 uplink changes for unknown reason

Cakes
Here to help

My WAN1 and WAN2 uplinks keep switching back and forth. For no apparent reason, the MX goes into failover and switches from the primary uplink to the secondary uplink. I have been on the phone with my ISP monitoring our ONT, and there are zero dropped connections. Of course, the amazing event log that Meraki provides shows no detail about why the primary uplink status changed. I know the MX performs failover connectivity tests to determine uplink status, and it's possible it's failing one of these tests. But since the so-called event log has no detail for this event, how can I determine whether one of these test failures is the root cause of the problem?

alemabrahao
Kind of a big deal

Is the connection of the links directly on the WANs or do you connect to a switch and then to the MX? Have you tried changing the network cables?

I am not a Cisco Meraki employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

Please, if this post was useful, leave your kudos and mark it as solved.

The links connect to a switch and then to the MX. I haven't tried changing the network cables, but I have tried rebooting the entire switch.

mwiater
Getting noticed

This is in the release notes for 18.106:

  • Fixed a rare issue that could result in the WAN interfaces for MX appliances incorrectly transitioning to a down state for a brief period of time.

I've seen this as well.

Other than that, a packet capture between the ISP and the MX is the next best way to see what the event log doesn't show.

I have been on the 18.106 firmware for some time, although I see 18.107 is now available. I'm working with my ISP and provided a packet capture filtered on ARP only. Now I'm thinking of including ICMP, ARP, and DNS to get a better view.
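
A capture filter covering those three protocols could be put together roughly like this. It is a minimal sketch that assumes the capture tool accepts standard pcap-filter (BPF) syntax, as tcpdump does. Whether the Meraki dashboard capture accepts the same expression is an assumption to verify, and the helper name `build_capture_filter` is made up for illustration.

```python
def build_capture_filter(include_arp=True, include_icmp=True, include_dns=True):
    """Return a pcap-filter (BPF) expression matching the selected protocols."""
    terms = []
    if include_arp:
        terms.append("arp")
    if include_icmp:
        terms.append("icmp")
    if include_dns:
        # "port 53" matches DNS over both UDP and TCP.
        terms.append("port 53")
    # An empty expression means "capture everything" in tcpdump.
    return " or ".join(terms)


print(build_capture_filter())  # arp or icmp or port 53
```

The resulting expression can be passed to tcpdump as its filter argument on a host mirrored from the switch port facing the ISP.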

mwiater
Getting noticed

I was always suspicious that I wouldn't see everything if the interface flapped, since tcpdump stops when the interface goes down. It's also worth checking the logs on that switch.

PhilipDAth
Kind of a big deal

Check out this guide on what causes WAN failover:

https://documentation.meraki.com/MX/Firewall_and_Traffic_Shaping/Connection_Monitoring_for_WAN_Failo... 

 

Note that your connection to the ONT could be 100% stable, but if your ISP's DNS is having an issue or your ISP is having upstream connectivity issues, a failover can still occur.

I had one case where the ISP's connectivity to Google's 8.8.8.8 DNS server was unreliable and caused WAN failovers.
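
Since the event log does not say which test failed, one way to gather independent evidence is to run a timestamped DNS probe from a host behind the uplink in question and compare its gaps with the failover times. The following is a rough sketch only: it does not reproduce the MX's own connectivity tests, and the function names, the `example.com` test hostname, the 2-second timeout, and the 10-second interval are all assumptions chosen for illustration.

```python
import socket
import struct
import time


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS A-record query in RFC 1035 wire format."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def probe_dns(server, hostname="example.com", timeout=2.0):
    """Return the round-trip time in seconds, or None if no valid reply arrived."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable errors
            return None
        elapsed = time.monotonic() - start
    # Accept the reply only if its transaction id matches the query.
    return elapsed if reply[:2] == query[:2] else None


def monitor(servers, interval=10.0, rounds=6):
    """Print one timestamped result line per server per round."""
    for _ in range(rounds):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for server in servers:
            rtt = probe_dns(server)
            status = "no reply" if rtt is None else f"{rtt * 1000:.1f} ms"
            print(f"{stamp} {server} {status}")
        time.sleep(interval)
```

Calling `monitor(["8.8.8.8"])` and leaving it running across a failover would show whether DNS replies stopped at the same moment. A run of "no reply" lines that lines up with the uplink change would point at upstream reachability rather than the physical link.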

 
