Help to understand uplink status change logs

dade80vr
Getting noticed

Hi, we have a customer with a simple redundant WAN design:

 

  • 2 routers
  • 2 MX85s (master and slave) with a shared public IP

 

Routers are connected to MX WANs:

  • R1 to MX1 WAN1
  • R1 to MX2 WAN1
  • R2 to MX1 WAN2
  • R2 to MX2 WAN2

 

We are getting a lot of log entries for primary uplink status change and Ethernet port carrier change (on WAN ports 3/4).

WAN1 is set as primary WAN, WAN2 is set as failover.

 

  • primary uplink status change events: WAN1 fails over to WAN2 and returns to WAN1 about 20 s later
  • Ethernet port carrier change events: carrier goes from false back to true within about 1 s

No failures are detected on the main routers.

What is happening? How can I understand this situation?

 

This KB is clear: https://documentation.meraki.com/MX/Monitoring_and_Reporting/Primary_Uplink_Status_and_Ethernet_Port...

 

RSTP is enabled on both routers and downstream switches.

Additional note: the VRRP crossover link between the MXs runs through the 2 downstream switches, as in the Fully Redundant design: https://documentation.meraki.com/MX/Deployment_Guides/MX_Warm_Spare_-_High_Availability_Pair

 

Thanks for the help.

13 Replies
alemabrahao
Kind of a big deal

Maybe the link is flapping and both MXs are trying to take over as master. You can check the warm spare design at this link:

 

https://www.willette.works/mx-warm-spare/

I am not a Cisco Meraki employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

Please, if this post was useful, leave your kudos and mark it as solved.
dade80vr
Getting noticed

We are using Fully Redundant (Multiple Switches) design: https://documentation.meraki.com/MX/Deployment_Guides/MX_Warm_Spare_-_High_Availability_Pair

The VRRP crossover link runs through the switches, not directly between the MXs.

 

This is an excerpt from the log:

[Screenshot: ms uplinks.png]

alemabrahao
Kind of a big deal

Is STP enabled on the switches?

dade80vr
Getting noticed

yes

alemabrahao
Kind of a big deal

Are the MX LAN ports configured as trunk or access? If they are configured as trunk, try changing the native VLAN to "Drop untagged traffic".

dade80vr
Getting noticed

Nice idea, let me check. 

dade80vr
Getting noticed

Access - Untagged VLAN1 management on both sides

PhilipDAth
Kind of a big deal

Ethernet port carrier change means the MX thinks the Ethernet ports are physically going down.  It could be a cable issue.
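
One way to test that idea against the event log is to check whether each uplink change is preceded by a carrier change a few seconds earlier. The following is only a rough sketch: the sample records are made up, and the field names (`time`, `type`, `detail`) and type labels are assumptions for illustration, not the dashboard's actual export format.

```python
from datetime import datetime, timedelta

# Made-up sample records, e.g. transcribed by hand from the event log.
# Field names and type labels are hypothetical.
events = [
    {"time": "2022-03-28 10:00:00", "type": "carrier_change", "detail": "port 3 false"},
    {"time": "2022-03-28 10:00:01", "type": "carrier_change", "detail": "port 3 true"},
    {"time": "2022-03-28 10:00:02", "type": "uplink_change", "detail": "WAN1 -> WAN2"},
    {"time": "2022-03-28 10:00:22", "type": "uplink_change", "detail": "WAN2 -> WAN1"},
    {"time": "2022-03-28 11:30:00", "type": "uplink_change", "detail": "WAN1 -> WAN2"},
]

def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp string."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

def failovers_with_carrier_loss(events, window_s=5):
    """For each uplink change, report whether any carrier change
    happened in the window_s seconds before it."""
    carrier_times = [parse(e["time"]) for e in events if e["type"] == "carrier_change"]
    window = timedelta(seconds=window_s)
    result = []
    for e in events:
        if e["type"] != "uplink_change":
            continue
        t = parse(e["time"])
        preceded = any(timedelta(0) <= t - c <= window for c in carrier_times)
        result.append((e["time"], e["detail"], preceded))
    return result

for when, detail, preceded in failovers_with_carrier_loss(events):
    print(when, detail, "carrier change just before" if preceded else "no carrier change nearby")
```

Uplink changes with no carrier change nearby would point away from the physical layer and toward something like connectivity tests failing upstream.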

dade80vr
Getting noticed

It could be, but on 2 different ports? Port 3 and port 4?
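
To put numbers on that, the carrier drops could be counted per port and checked for near-simultaneous drops on different ports. This is a sketch on made-up sample data, and the `(timestamp, port)` tuple layout is an assumption. Drops that line up in time on both ports would suggest a common cause rather than two bad cables.

```python
from collections import Counter
from datetime import datetime

# Made-up sample: (timestamp, port) for each "carrier false" event.
drops = [
    ("2022-03-28 10:00:00", 3),
    ("2022-03-28 10:00:00", 4),
    ("2022-03-28 12:15:30", 3),
    ("2022-03-28 12:15:31", 4),
    ("2022-03-28 14:40:10", 3),
]

def summarize(drops, tolerance_s=2):
    """Return (drops per port, number of consecutive drops that hit
    different ports within tolerance_s seconds of each other)."""
    per_port = Counter(port for _, port in drops)
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), port) for ts, port in drops
    )
    simultaneous = 0
    for (t1, p1), (t2, p2) in zip(parsed, parsed[1:]):
        if p1 != p2 and (t2 - t1).total_seconds() <= tolerance_s:
            simultaneous += 1
    return per_port, simultaneous

per_port, simultaneous = summarize(drops)
print(dict(per_port), "near-simultaneous pairs:", simultaneous)
```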

JonP
Getting noticed

We have a similar issue on our MX75, although we don't have HA. The primary WAN link drops with "Ethernet port carrier change". We've raised this with Meraki and our ISP. We placed a managed but unconfigured switch on the WAN side, between the MX and the ISP router, and we see the drops on the MX side of the link.

 

Meraki have said it is a faulty unit and they are going to replace it for us.

OVERKILL
Building a reputation

May not be germane to your issue but I figure I'd mention it: 

On a client's MX84 I was seeing frequent flaps of not only both uplinks but also the active LAN ports. Turns out the AnyConnect service was getting hammered by malicious actors and spiking the CPU to 100%, which caused the flap. Moving AnyConnect to a different port solved the problem. 

dade80vr
Getting noticed

Sure? This customer does not have an AnyConnect license and does not use it.

The summary report says that device utilization was under 20% in the last week and that CPU data is insufficient.

 

We are still seeing the uplink go down and come back up within 20 seconds. I suspect the problem is related to STP, so I could remove the loop-closing cable on the switch for a few days.

OVERKILL
Building a reputation

I had to get the CPU utilization figure from support. In my case, I had all active ports flapping, not just the WAN links, so if you are only seeing it on the WAN side I'd assume the issue isn't the same. Just figured I'd mention it.
