Hello community,
We've been having problems for a few months now, where our primary internet connection will randomly fail over to the backup. After much investigation, troubleshooting, network redesign and even a replacement of the MX, the problem persists.
Prior to Wednesday, the connection would fail over to WAN2 at random intervals for 1-2 seconds and then switch back. This dropped every VoIP call in progress each time it failed over and each time it switched back. The realtime utilisation graph in the Meraki dashboard is pretty much useless: by the time we get into it, the event has already gone off the left edge of the graph and we only see current traffic. However, on a couple of occasions we've seen high utilisation of the link immediately before a failover.
We logged cases with both Meraki and our ISP. Our ISP's router logs show that their LAN interface, which connects to our firewall, drops, but their WAN (internet) link remains up and available. To find out whether the router or the MX was at fault, we installed a switch between them so we could log which port drops. We saw that the MX's WAN1 port went down during the outage. Meraki took this information and replaced our MX.
The replacement was installed on Wednesday, but the problem has continued - mostly during the night:
2 hours Thursday - midnight to 2am (both circuits)
4 seconds Thursday - 09:46:03-09:46:07 (primary)
We have managed to get PRTG up and running to monitor the interfaces and, interestingly, it shows zero downtime on either interface for the same periods.
Could this be a false positive? Could the cloud dashboard be recording loss of connectivity when in actual fact it is fine?
These "new" outages on the replacement MX have so far only happened overnight - so I am hoping that this is just the cloud being stupid.
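One thing I can sanity-check myself is whether PRTG could simply have missed an event because it fell between polls. Below is a rough Python sketch, not anything taken from Meraki or PRTG. The `polls_inside` helper, the 60-second polling interval, the midnight start of polling and the calendar date are all assumptions for illustration; only the outage times come from the dashboard figures above.

```python
from datetime import datetime, timedelta


def polls_inside(start, end, first_poll, interval):
    """Return the poll timestamps that fall inside [start, end].

    Polls are assumed to happen at first_poll, first_poll + interval, ...
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if start <= first_poll:
        t = first_poll
    else:
        # Ceiling division: number of intervals to reach the first poll
        # at or after the start of the outage window.
        steps = -(-(start - first_poll) // interval)
        t = first_poll + steps * interval
    hits = []
    while t <= end:
        hits.append(t)
        t += interval
    return hits


# Placeholder date; the post only says "Thursday".
day = datetime(2000, 1, 6)
first_poll = day                    # assumed: polling starts at midnight
interval = timedelta(seconds=60)    # assumed polling interval

# Outage windows as reported by the dashboard.
short_outage = (day.replace(hour=9, minute=46, second=3),
                day.replace(hour=9, minute=46, second=7))
long_outage = (day, day.replace(hour=2))

print(len(polls_inside(*short_outage, first_poll, interval)))
print(len(polls_inside(*long_outage, first_poll, interval)))
```

Under those assumptions, no poll lands inside the 4-second window, so a monitor polling once a minute could miss it entirely. The two-hour window contains many polls, so a coarse polling interval on its own would not explain PRTG showing zero downtime for that one.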
Anyone have any thoughts?