MX250 problems - MX 18.107.2

Flembot
Here to help


Is anyone using MX250s encountering any issues lately? 

 

Our spare failed to connect to the dashboard last week, but the primary was fine and there were no issues whatsoever.

However, since Saturday 9th June the entire network has been flapping. Up and down, up and down. Everything.

 

To stabilise the situation, I removed the spare entirely from the network, both at dashboard level and by physically disconnecting the power, and then power cycled the primary. This appears to have stabilised the site, allowing clients to connect and operate (so far).

However, I am noticing that the WAN periodically changes over between WAN1 and WAN2.

 

I'm wondering whether the issue is entirely firmware related, even though I've stabilised the situation.

I'm hoping it's not just me facing this. 


I think I'll call Cisco and ask them to roll the firmware back to the previous version, as our WAN uplinks should be okay, yet the primary MX keeps failing over between uplinks 1 and 2.

8 Replies
SaheedA
Meraki Employee

@Flembot Thanks for bringing this to the community.

I would recommend establishing a baseline of the issue before requesting a firmware downgrade/rollback.

Were you able to connect to the local status page of the spare MX? (This should provide info on why the device is not showing up online.) If yes, what info was the MX reporting on its connection tab?

Regarding the WAN1 and WAN2 flapping, were you able to confirm it was not related to an issue upstream? My assumption is that there is an issue upstream causing the primary WAN link to go down and fail over to the secondary. However, getting a capture or data to confirm this would be the best approach.
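As a rough illustration of the kind of data that could help confirm or rule out an upstream cause, here is a minimal Python sketch that counts uplink changes in a series of timestamped observations. The sample data, the `find_failovers` name, and the idea of recording the active uplink as a plain string are all assumptions made for illustration; nothing here comes from the Meraki dashboard or API, and the observations would have to be collected by whatever means you have available.

```python
from datetime import datetime

def find_failovers(samples):
    """Return a list of (timestamp, old_uplink, new_uplink) for each change.

    `samples` is a hypothetical, time-ordered list of
    (timestamp, active_uplink) tuples gathered by the operator.
    """
    events = []
    # Compare each observation with the one immediately before it.
    for (_, prev_uplink), (ts, uplink) in zip(samples, samples[1:]):
        if uplink != prev_uplink:
            events.append((ts, prev_uplink, uplink))
    return events

# Made-up observations, purely for illustration.
samples = [
    (datetime(2024, 1, 1, 9, 0), "WAN1"),
    (datetime(2024, 1, 1, 9, 5), "WAN1"),
    (datetime(2024, 1, 1, 9, 10), "WAN2"),
    (datetime(2024, 1, 1, 9, 15), "WAN2"),
    (datetime(2024, 1, 1, 9, 20), "WAN1"),
]

failovers = find_failovers(samples)
for ts, old, new in failovers:
    print(f"{ts.isoformat()} {old} -> {new}")
```

Lining up the timestamps of these changes against the ISP's own outage or latency records is one way to judge whether the failovers follow upstream events or happen independently of them.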

gitnetwork
Conversationalist

We have the same issue. Seen it on two HA pairs of MX250 after running 18.107.2 for a month.

The CPU of the appliances goes to values > 85% starting at the time of the issue. The issue was resolved after rebooting. We reported the issue to Cisco and downgraded one pair to 17.10.5 while leaving the other at 18.107.2.
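For anyone wanting to pin down when the high utilization starts, a small Python sketch along these lines could mark the first point where readings stay above 85% for several samples in a row. The readings, the `first_sustained_breach` name, and the choice of three consecutive samples are assumptions for illustration only; how you obtain utilization figures from the appliance is left open.

```python
def first_sustained_breach(readings, threshold=85.0, min_consecutive=3):
    """Return the index where a run of `min_consecutive` readings
    above `threshold` begins, or None if there is no such run.

    `readings` is a hypothetical, time-ordered list of utilization
    percentages collected by the operator.
    """
    run_start = None
    run_len = 0
    for i, value in enumerate(readings):
        if value > threshold:
            if run_len == 0:
                run_start = i  # a new run of high readings begins here
            run_len += 1
            if run_len >= min_consecutive:
                return run_start
        else:
            run_len = 0  # run broken, start counting again
    return None

# Made-up readings, purely for illustration: one brief spike that
# recovers, then a sustained period above the threshold.
readings = [20, 30, 90, 40, 88, 91, 95, 97]
onset = first_sustained_breach(readings)
print(onset)
```

Requiring several consecutive readings avoids flagging a single short spike, which makes it easier to compare the onset time with the moment the network started misbehaving.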

gitnetwork
Conversationalist

@Flembot have you progressed this with Meraki support? We have a similar issue and were wondering if the issue still occurs. Are you using AutoVPN, and if so, to how many sites?

JF1
Getting noticed

I have an upgrade scheduled for next week: a pair of MX250s with roughly 80 remote sites in total using AutoVPN.

 

I don't want to be upgrading if we are likely to have this same problem.

 

Can anyone issue an update?

gitnetwork
Conversationalist

I see that the latest stable release notes mention:


Bug fixes

Corrected a rare issue that could result in excessive device utilization when intrusion detection and prevention was enabled.


I wonder if this matches our case. We don't seem to use IDS, but we do see the symptom of excessive utilization while the MX250 would and should be sitting idle.

 

Our case with Cisco is still open, and our device has not yet been rebooted, pending Cisco Meraki's investigation.

JF1
Getting noticed

Thanks for the response. We use IDS. It would help if in the Meraki release notes they provided additional information on the issue, cause, affected hardware, affected topologies, detailed fix info etc.

 

If you could update the case once you receive feedback from Meraki, I'd be grateful.

I'm tempted to cancel my upgrades. I really can't risk taking the estate down.

gitnetwork
Conversationalist

It appears a lot of the reported issues were side effects of the introduction of IPv6 throughout the stack.

The excessive utilization was caused by internal tables filling up, which led to device instability. It all seems to be fixed in MX 18.205. We rolled back to MX 16 and have had no issues.

JF1
Getting noticed

We recently upgraded our MX250 HA pair to 18.107.5 and, "touch wood", we haven't had any issues.
