Thanks in advance for any ideas - have exhausted paths with Meraki and VZ support teams:
MX85 - verizon business circuit WAN 1 / verizon residential circuit WAN 2
Daily, primary WAN 1 is failing over to WAN 2, but then reverts back instantly to WAN 1
Event log shows the failover to WAN 2 and then logs the move back to WAN 1 seconds later
Fortunately it is happening so quickly that user connectivity isn't really impacted
Does not happen at the same time of day each day, but always looks the same
Spoke with VZ support multiple times - they see nothing that would signal failure on either circuit
Changed DNS IP to see if that would help; it did not
Upgraded MX firmware to latest version - no impact
Switched WAN 1 to VZ residential and WAN 2 to VZ business - same thing happened with circuits reversed, which suggests it was not an issue on the VZ business line
At a loss and appreciate any ideas
Your last point seems like strong evidence the trouble is on the Meraki side, if the trouble stays with the WAN port and does not follow the ISP connection. Did you make that clear to Meraki support? When you swapped circuits, did you also swap the cable between the MX and the ISP handoff? That would be the only other variable there that comes to mind.
Thank you and yes - agree on the issue pointing inward - we did swap cables as well trying to eliminate any potential L1 issues
Which firmware are you on? Latest meaning stable, RC, or beta? And, what does WAN 1 physically connect to? Is it set to default auto speed/duplex? If so, have you tried manually setting it?
Latest stable release, but will definitely check the port settings. Have it physically connected through an MS first and will check all the ports - thank you
Are you using the MX SFP or copper WAN ports?
Both connected to MX SFP ports
You may also want to review this: https://documentation.meraki.com/MX/Firewall_and_Traffic_Shaping/Connection_Monitoring_for_WAN_Failo...
I don't like these tests, in particular using public DNS servers as internet probes, because everyone should know by now they are not meant for that and often don't respond to ICMP/pings even though they are otherwise 100% fine.
Agree, and leaning toward the test basically producing a false positive somehow that is triggering the failover. Tried multiple DNS services, but that did not have an impact
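One way to test the false-positive theory is to run an independent probe from a host behind the MX and compare its timestamps against the failover entries in the event log. Below is a minimal Python sketch of that idea. It is not Meraki's connection test: the targets, the TCP-connect probe on port 53, and the function names are all assumptions made up for illustration, so swap in whatever fits your environment.

```python
import socket
import time
from datetime import datetime, timezone

# Hypothetical probe targets as (host, port) pairs; replace with hosts you trust.
TARGETS = [("8.8.8.8", 53), ("1.1.1.1", 53)]


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_round(targets, probe=tcp_probe):
    """Probe every target once and return {target: reachable}."""
    return {t: probe(t[0], t[1]) for t in targets}


def classify(results):
    """'up' if every target answered, 'down' if none did, else 'partial'."""
    ok = sum(results.values())
    if ok == len(results):
        return "up"
    return "down" if ok == 0 else "partial"


def monitor(targets, interval=5.0, rounds=None, probe=tcp_probe, sleep=time.sleep):
    """Yield (UTC timestamp, state, results) once per round.

    rounds=None runs until interrupted; interval is seconds between rounds.
    """
    n = 0
    while rounds is None or n < rounds:
        results = probe_round(targets, probe)
        yield datetime.now(timezone.utc).isoformat(), classify(results), results
        n += 1
        sleep(interval)
```

To use it, loop over monitor(TARGETS) and log every round whose state is not "up". If the MX logs a failover while this probe stays "up" or only goes "partial" (one target not answering), that would be consistent with the MX's own test misfiring. If it shows "down" at the same moment, the loss is more likely real, at least from where the probe host sits.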