Hello all!
I've been having problems with LACP since update 15.21.1. I was hoping that the update to 16.9 was going to help, but it's made the situation substantially worse. It used to be that only cross-stack LACP was glitchy, but now even LACP from a single MS250/350 directly to another seems to be affected, with no stack involved.
Here's what I'm doing for most buildings.
I have a stack of two MS425s at the core. Each switch in the stack has one 10G single-mode fiber port configured in an AGG group for a building. (Let's say port 3 on each switch, for an AGG group of two ports.)
I have a single access-layer switch, call it an MS350, that has two 10G ports in an AGG group, and two single-mode fibers connect it to the core stack.
Ports on both sides have RSTP enabled, and both AGG groups have "Loop Guard" enabled. Here's an example; keep in mind that one port is disabled because of this issue, which is why it's only running at 10G.
What I'm finding is that spanning tree will randomly detect these links as a loop and disable them. Sometimes unplugging the fiber and plugging it back in brings the link back for days or months; sometimes it detects the loop again pretty fast and goes back offline 10 minutes later. The real issue is the intermittent traffic with LACP enabled, though. I will have some ports that can't communicate with certain IPs while another port communicates just fine, and then 10 minutes later the issue is gone. You can always fix any of these issues by unplugging one of the fibers or forcing one of the ports in the group to disabled status.
I've tried to contact Meraki support on this, and I struggle to replicate the issue since it's so random. If I do enable LACP though, I will start getting phone calls about the network being down/sites being down/glitchy behavior pretty quickly.
Has anyone else experienced this? Does anyone else have LACP issues?
Thank you,
James
Maybe you have already done this, but as a first step I'd check the bridge priority of all switches to make sure the MS425 core stack shows as the RSTP root, as that could be part of the issue.
Switching > Configure > Switch settings
Configuring Spanning Tree on Meraki Switches (MS)
I tend to use root guard at the MS425 downlink and STP guard set to disabled on the lower priority MS250s or MS350s uplinks.
Correction (meant to say): and STP guard set to disabled on the higher priority MS250s or MS350s uplinks.
Thanks for jumping in here RWelch! On the first issue, I have RSTP enabled globally, with the core stack (my two MS425-32s) set to priority 0. Everything else falls under the "Default" category and gets a priority of 32768. Just to confirm nothing is messed up, I checked one of the switches in the stack, and it says that its stack is the RSTP root.
On the second point, I don't have it configured that way. I have Loop Guard configured on every link between the core and the access layer, and I have BPDU Guard configured on every user access port.
Will having Loop Guard set on both sides of an LACP link cause an issue? Or is this just how you prefer to do it? Could Loop Guard treat an LACP link like a unidirectional link? Or something silly like that?
Thank you,
James
Same document referenced above:
The default priority for all Meraki switches is 32768.
It is recommended that you set the priority of your desired root bridge to 4096 to ensure its election. The root bridge should be a switch in the center of the network, near high traffic sources (such as servers), to optimize traffic flow across the network. Using priority 0 is also acceptable for the root, but leaves no room for modification when replacing a core switch in production or modifying behavior temporarily.
It is best practice to take a layered approach to the STP priorities in a network. For instance, if there is a clear Core <> Distribution <> Access Layer, priorities should be Core (4096), Distribution (16384), and Access (61440).
At no point in a production network should you leave any switch at its default configuration.
I set my MS425s at 4096 and everything else at 61440.
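To illustrate why the priorities matter, here is a minimal Python sketch of how the root is picked: the bridge with the lowest priority value wins, and the lowest MAC address breaks a tie. The switch names and MAC addresses are made up for illustration, the stack is treated as a single bridge, and this is a simplified model of the election rule, not Meraki's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str      # hypothetical label, for illustration only
    priority: int  # STP bridge priority as set in switch settings
    mac: str       # hypothetical MAC address


def check_priority(priority: int) -> None:
    # STP bridge priorities are multiples of 4096 between 0 and 61440.
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError(f"invalid STP priority: {priority}")


def elect_root(bridges):
    # Lowest priority wins; on a tie, the numerically lowest MAC wins.
    for b in bridges:
        check_priority(b.priority)
    return min(bridges, key=lambda b: (b.priority, int(b.mac.replace(":", ""), 16)))


layered = [
    Bridge("core-ms425-stack", 4096, "00:18:0a:00:00:20"),
    Bridge("access-ms350", 61440, "00:18:0a:00:00:10"),
]
all_default = [
    Bridge("core-ms425-stack", 32768, "00:18:0a:00:00:20"),
    Bridge("access-ms350", 32768, "00:18:0a:00:00:10"),
]

print(elect_root(layered).name)      # core-ms425-stack
print(elect_root(all_default).name)  # access-ms350
```

With everything left at the default, the tie is decided by MAC address alone, so an access switch can end up as root. Setting the core explicitly lower removes that dependency.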
If you type loopguard into the community search field, you will see several other posts reporting issues with it; some have shared their thoughts and feedback on using it.
The Meraki Best Practice Designs and Deployment Guides that I tend to follow or mirror in my setups/configurations show core switch downlinks with rootguard.
Meraki Campus LAN; Planning, Design Guidelines and Best Practices
Link Aggregation and Load Balancing
These might help you make more informed decisions (for your consideration).
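On the symptom where some ports can't reach certain IPs while others work fine: link aggregation normally pins each flow to a single member link by hashing packet header fields, so if one member of the bundle is silently dropping traffic, only the flows hashed onto it fail. The sketch below is a toy model of that in Python. The XOR-of-IPs hash, the port names, and the addresses are all assumptions for illustration; the actual hash the MS switches use may differ.

```python
import ipaddress


def pick_member(src_ip: str, dst_ip: str, members: list) -> str:
    # Toy hash (an assumption): XOR the two addresses, then take the
    # result modulo the number of member links.
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return members[key % len(members)]


def flow_works(src_ip: str, dst_ip: str, members: list, healthy: set) -> bool:
    # A flow only works if the link it is pinned to is passing traffic.
    return pick_member(src_ip, dst_ip, members) in healthy


# Hypothetical two-port AGG group where the second link is misbehaving.
members = ["sw1-port3", "sw2-port3"]
healthy = {"sw1-port3"}

for dst in ("10.0.1.20", "10.0.1.21"):
    link = pick_member("10.0.0.10", dst, members)
    ok = flow_works("10.0.0.10", dst, members, healthy)
    print(f"10.0.0.10 -> {dst}: {link}, works={ok}")
```

In this model the same client reaches 10.0.1.20 but not 10.0.1.21, which looks a lot like "some IPs work, some don't". It would also fit with the problem going away when one fiber is unplugged, since every flow then lands on the remaining link.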
Hi James,
I feel your pain—LACP issues can be incredibly frustrating, especially after updates that were supposed to improve things! It sounds like you’ve got a pretty complex setup with the MS425s and MS350, which can definitely introduce more variables.
I've also run into similar issues with LACP and spanning tree configurations in the past. One thing that helped was double-checking all configurations on both ends of the links to ensure they match up perfectly. Sometimes the smallest mismatch can lead to these looping problems.
Have you considered temporarily disabling Loop Guard to see if that changes the behavior? It might help clarify if that's where the issue lies. Additionally, keeping a close eye on the logs for any STP messages might give you more insight into why it’s detecting loops.
I hope you find a solution soon—these intermittent issues are the worst! If I come across anything else that might help, I’ll let you know. Good luck!