Switch stack issue when upgrading from 10.45 to 11.22 - MAC flap issue

andydaniel
Here to help

Hi, is anyone here facing the same issue on stacked switches? After upgrading from 10.45 to 11.22 we hit a MAC issue on the stack, and TAC suggested rebooting the master switch as a workaround. In our environment that does not work well: with no traffic everything is fine, but once users arrive and the load becomes heavy the issue starts again. After rolling back to 10.45 all the issues were gone. If anybody has suggestions or is facing the same issue, please let me know how to tackle this without breaking the stack, which is extra effort for an L3 switch. Thank you

Brons2
Building a reputation

The switch MAC is flapping, or the client MAC?  Which switch model(s)?

 

Yet another post that gives me pause about putting in 11.22.

andydaniel
Here to help

Not the switch MAC. It looks like a bug that causes the CAM table to fall out of sync between the master and the members of the stack on MS425-32. They mentioned issues with the CAM table that map to the software problem.
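
To illustrate what "CAM table not in sync" means in practice, here is a minimal Python sketch that compares the learned MAC addresses of each stack member and reports the entries a member is missing. It assumes you can obtain a per-member list of MAC addresses by some means, which this thread does not confirm; the function names, member names, and sample addresses are all hypothetical.

```python
def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and strip common separators so formats compare equal."""
    return mac.lower().replace(":", "").replace("-", "").replace(".", "")


def find_missing_macs(tables: dict[str, list[str]]) -> dict[str, set[str]]:
    """Return, per stack member, the MACs seen elsewhere in the stack but not on it.

    `tables` maps a member name to the MAC addresses it has learned.
    Members with nothing missing are left out of the result.
    """
    normalized = {
        member: {normalize_mac(m) for m in macs} for member, macs in tables.items()
    }
    # The union of all members' tables is what a fully synced stack would hold.
    all_macs = set().union(*normalized.values()) if normalized else set()
    result = {}
    for member, macs in normalized.items():
        gap = all_macs - macs
        if gap:
            result[member] = gap
    return result


# Hypothetical sample data: member-2 never learned the second address.
tables = {
    "master": ["00:11:22:33:44:55", "00:11:22:33:44:66"],
    "member-2": ["00-11-22-33-44-55"],
}
missing = find_missing_macs(tables)
print(missing)
```

A non-empty result only shows that the tables differ at the moment of the snapshot; entries age out and get relearned, so a difference that persists across several snapshots is the more meaningful signal.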

PhilipDAth
Kind of a big deal

>cam table not sync between master and member of stack

 

I've seen that before, but not in 11.x switch firmware.

markusg
Here to help

Had major problems with a four-switch MS350 stack after upgrading to 11.22, though only after making downstream and/or upstream changes.  It played intermittent hell with our two (active/passive) uplinks for routed traffic, though curiously not with L2 traffic.  There appears to be a known issue with MAC tables not being correctly propagated across the stack, though I have no idea how general the issue is.  We worked around our issue by moving the uplinks to the master.  A fix is in the works.  Make sure the uplink(s) are on the master.  Log a case if you have any doubts.

markusg
Here to help

Just to qualify my last comment. Our experience suggests that there is a known issue with MAC address tables across the stack. However, not all topologies will necessarily be sensitive to this issue, and we haven't seen root cause from support in black and white yet.  You may be experiencing a different issue.  Go back to support.

PhilipDAth
Kind of a big deal

Unfortunately, if you experience an issue with the MAC tables being synchronised across a switch stack (not very common these days), you need to reboot the whole stack.  Usually everything comes right after that.

 

11.23 fixed this related ARP bug:

ARP entry on L3 switch expires despite still being in use -- causing brief packet loss until re-ARP occurs

11.24 fixed another related ARP bug:

L3 interfaces on switch stacks can send ARP requests using wrong sender MAC

benny
Getting noticed

+1 seeing similar issues with MAC tables on our MS350 stack running 11.22

 

Support are currently investigating the issue. Seems to be only affecting our voice traffic whereby handsets are dropping out of the MAC table. 

 

Will wait to see what support advises, otherwise perhaps a downgrade back to 10.35

 

Regards,

Ben

andydaniel
Here to help

In our case, all video conference devices were affected, and half of the WiFi users were affected as well.

markusg
Here to help

PhilipDAth, thanks for the info.  Curiously, in our case, I could see ARP requests and replies to the default router from the stack at regular intervals, yet the entry never appeared in the MAC table.  Our routing problem seemed obvious: we had a default route entry in our routing table but no entry in the MAC table to enable forwarding.  We also had other intermittent forwarding issues with client/server relationships across the stack, one of which almost completely killed public DNS lookups for a subnet.

 

We resisted CTRL-ALT-DEL to help find root cause and worked around each issue as it arose.  After a few calls to support and plenty of troubleshooting thankfully the issue revealed itself.  Lots of pain although we got there in the end.
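
The symptom described above (a route whose next hop answers ARP but never shows up in the MAC table) can be expressed as a small consistency check. This is only a sketch over hypothetical, hand-collected data; the data shapes, the function name, and the sample addresses are assumptions, and nothing here queries a real switch.

```python
def unforwardable_routes(routes, arp, mac_table):
    """List routes whose next hop cannot be forwarded to at L2.

    routes:    iterable of (prefix, next_hop_ip) tuples
    arp:       dict mapping IP address -> MAC address
    mac_table: set of lower-case MAC addresses present in the MAC table
    """
    problems = []
    for prefix, next_hop in routes:
        mac = arp.get(next_hop)
        if mac is None:
            problems.append((prefix, next_hop, "no ARP entry"))
        elif mac.lower() not in mac_table:
            # ARP resolved the next hop, but there is no MAC table entry for it.
            problems.append(
                (prefix, next_hop, "ARP entry present but MAC not in MAC table")
            )
    return problems


# Hypothetical sample data: the default router answers ARP but is not in the MAC table.
routes = [
    ("0.0.0.0/0", "10.0.0.1"),
    ("10.1.0.0/16", "10.0.0.2"),
    ("10.2.0.0/16", "10.0.0.3"),
]
arp = {"10.0.0.1": "00:11:22:33:44:55", "10.0.0.2": "00:11:22:33:44:66"}
mac_table = {"00:11:22:33:44:66"}

problems = unforwardable_routes(routes, arp, mac_table)
for prefix, next_hop, reason in problems:
    print(f"{prefix} via {next_hop}: {reason}")
```

Separating "no ARP entry" from "ARP entry present but MAC not in MAC table" matters here, because the second case is the one that points at the MAC table rather than at ARP itself.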

Brons2
Building a reputation

Markusg, what was the ultimate fix for you in the end?  Something later in the 11.2x branch?

 

I have a 425-16 stack acting as the core internal router in my network.  I can't be having these kinds of problems.

 

Any new features that come in the 11.x firmware probably wouldn't be applicable to us anyway, I have a number of MS42P switches that have a more limited feature set.

 

But in short, 10.45 works fine.  I'm sticking with it until I'm sure these issues in the 11.x have been ironed out.

markusg
Here to help

No fix as yet. The workaround is in place (uplinks on the stack master only) and we have been stable since the change was made. I've been making changes upstream and downstream without issues so far, and such changes were the apparent trigger for the MAC table problems.

Agreed, lots of pain that can be done without. Sitting tight sounds like a wise plan.

Brons2
Building a reputation

Getting back to this,

 

Some people are saying the beta 11.25 works better.  Has anyone who's had the MAC flap issue tried 11.25?

andydaniel
Here to help

@Brons2 applying a beta version is not possible in my environment, so the other workaround, as per TAC, is to break the switch stack. Does anyone here know a good way to break an L3 switch stack? Thanks

markusg
Here to help

@Brons2 we've not had word of a fix yet, and we have chased.  We're sitting tight until we've been advised that root cause is known, and fix written.  Too much disruption last time to take any risks.

andydaniel
Here to help

@Brons2 I just got an update from the Meraki team: they can now reproduce the issue, and they may have a fix in the next release or possibly within another month. We will wait and see.

redsector
Head in the Cloud

Is it fixed in 11.28?

andydaniel
Here to help

@redsector not in 11.28

chuyendang
Getting noticed

I got this from support: firmware 11.28 is reported to fix this issue.

However, I am still holding off on the update, as we have hit too many bugs with Meraki firmware recently.

TerryMC
Here to help

I believe we're having this issue and it's causing our Mitel phones to randomly lose connection in two of our buildings. Packet captures at all points show the traffic gets lost in the stack somewhere. It makes it to the Meraki uplink, but never makes it to the phone port, so the phone finally resets the connection. I see that the 11.30 known issues still list "ARP entry on L3 switch can expire despite still being in use (predates MS 10.x)". That makes me think it has not been fixed, but maybe I'm not reading it right. We had Meraki support update the firmware from 11.22 to 11.29 on the problem stack only, but it didn't make a difference.

 

Has anyone found a fix? I tried moving the switch stack's main uplink to the master switch, but it didn't help.

Brons2
Building a reputation

Predates 10.x?  I never had the issue until going to 11.x.  I'm still holding at 10.45 and it's working great in my environment, including on the stack running L3.  However, I think at some point there will be a cost in staying on 10.45  in terms of losing access to new features.

 

One thing I've considered is moving L3 to the firewall - vendor P likes to tell their customers "we can do that".

 

Or just buying a core switch stack from another vendor, and not catalysts either.

 

I don't understand why they can't get it together on this issue.  It's been at least 5 months since this thread started.  This issue should have been fixed in days or even hours.
