Spurious Uplink failover messages from both MX appliances - ISP says everything's fine on their end

Solved
FR1978
Conversationalist

For about a month now we've been getting periodic uplink failover alerts from both our primary appliance and the spare. They arrive at random times, mostly overnight between midnight and 5 AM. The office is empty during those hours, so traffic is close to nothing. A few weeks ago the alerts were infrequent enough that we assumed they were a result of after-hours ISP maintenance. Lately, however, they've become frequent and happen throughout the work day. But here's what's weird: nothing actually fails over. There's no packet loss, nada, and the "failover" lasts a minute or less. It's like this every time. End users at the site don't notice any issues either, so I don't believe anything is failing over. Here's the troubleshooting we've done so far:

 

  • Called the ISP multiple times; they found no issue on their end and say the circuit has been up and stable for over 3 months.
  • On our initial call with Meraki, it was determined that our MX firmware was out of date, so we upgraded to the latest stable version. We saw no problems for about 20 hours, and then it started happening again.
  • On our second call with Meraki, they determined that the connectivity checks to the Meraki cloud on UDP port 7351 were failing. However, we're not blocking any outgoing traffic in our firewall settings. Could I be missing something here?

I'm not sure where to go from here. I don't think anything is failing over; I think the alerting on the Meraki end might be a little too sensitive and is generating these alerts.

 

Here's a sample of the failover messages:

 

The primary security appliance in the <CorpNetwork> network switched to using its primary uplink, configured to be uplink Internet 1, after a period in which the link was unavailable.

There have been a total of 2 failover events detected:

At 05:07 AM EST on Nov 4, the security appliance switched to using Internet 2 as its uplink.
At 05:07 AM EST on Nov 4, the security appliance switched to using Internet 1 as its uplink.
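
To quantify how short these events are across many alert emails, the event lines can be parsed and paired up. The following is only a rough sketch: it assumes the alert text always follows the wording of the sample above, it ignores the time zone label, and the function names and the `year` argument are made up for illustration (the alerts do not include a year).

```python
import re
from datetime import datetime, timedelta

# Matches event lines worded like the sample alert; other wordings are not handled.
EVENT_RE = re.compile(
    r"At (\d{1,2}:\d{2} [AP]M) (\w+) on (\w{3} \d{1,2}), "
    r"the security appliance switched to using (.+?) as its uplink\."
)

def parse_events(text, year):
    """Return (timestamp, uplink) tuples in the order they appear in the text.

    The year is supplied by the caller because the alert omits it.
    The time zone label is matched but not applied.
    """
    events = []
    for time_s, _tz, date_s, uplink in EVENT_RE.findall(text):
        ts = datetime.strptime(f"{date_s} {year} {time_s}", "%b %d %Y %I:%M %p")
        events.append((ts, uplink))
    return events

def find_flaps(events, primary="Internet 1", max_gap=timedelta(minutes=1)):
    """Find switches away from the primary that are reverted within max_gap."""
    flaps = []
    for (t_away, up_away), (t_back, up_back) in zip(events, events[1:]):
        if up_away != primary and up_back == primary and t_back - t_away <= max_gap:
            flaps.append((t_away, t_back))
    return flaps
```

Feeding the bodies of a month of alerts through `parse_events` and `find_flaps` would show how many events were sub-minute flaps and at which hours they cluster, which may be useful evidence for a support case.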

Of course any insight into this would be very much appreciated.

 

***Update***

 

This went away as mysteriously as it came. Over the weekend we made WAN 2 the primary to see whether the alerts would continue, and we received none. We then swapped back to WAN 1 and haven't had any issues for over two weeks. So I assume it was an issue on Meraki's end that was quietly fixed.

1 Accepted Solution
Nash
Kind of a big deal

In my experience, yes. The primary uplink failure alert is extremely sensitive. We have a client whose uplink often fluctuates, and we had to disable that alert for them because it generated tons of noise. The client has never experienced noticeable side effects.

FR1978
Conversationalist

What happens if the link is really down? How do you receive alerts?

Nash
Kind of a big deal

For this really noisy client, we've accepted the risk - but we can do that with them.

 

Otherwise, we're working on alerts using a third-party app called Auvik.
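
If you roll your own monitoring instead of disabling alerts outright, one common way to cut this kind of noise is to debounce: only alert after several consecutive failed checks. The class below is a minimal sketch of that idea, not a Meraki or Auvik feature; the class name, the `threshold` default, and the idea of feeding it results from your own periodic probe are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DebouncedAlert:
    """Report an outage only after `threshold` consecutive failed checks."""
    threshold: int = 3      # hypothetical default; tune to your check interval
    failures: int = 0       # current run of consecutive failures
    alerting: bool = False  # True once an outage has been reported

    def observe(self, link_up: bool) -> Optional[str]:
        """Feed one check result; return "down", "recovered", or None."""
        if link_up:
            self.failures = 0
            if self.alerting:
                self.alerting = False
                return "recovered"
            return None
        self.failures += 1
        if self.failures >= self.threshold and not self.alerting:
            self.alerting = True
            return "down"
        return None
```

With a check every 30 seconds and a threshold of 3, a flap lasting under a minute would stay silent, while a real outage would still be reported after roughly 90 seconds.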

FR1978
Conversationalist

Has Meraki confirmed that this is an issue? 

Nash
Kind of a big deal

Might be a good question for you to ask support.

cmr
Kind of a big deal

We get the same on one (busy) site, so much so that I made the other ISP the primary. The issue still persists, though perhaps a little less frequently. Having seen this, I'll log it with support, as neither ISP can see an issue and the users don't either.

 

Funniest thing is that they are both MPLS links, and the default routes are on the same stack at our main DC. From there they both go through the same internet firewall, even on the same rule, so I'm not sure how one is down when the other is up...!
