MS130 switches causing UDLD errors

Solved
redsector
Head in the Cloud


I have a few MS130-8X switches that are causing UDLD errors on the port of the connected switch. Is anyone here seeing the same issue? The uplink goes to an MS120 switch, and the MS120 reports a UDLD error on the port where the MS130 is connected.

1 Accepted Solution
redsector
Head in the Cloud

I updated to MS 17.1.2. The UDLD errors are gone now.


44 Replies
cmr
Kind of a big deal

Is this happening on the SFP ports with fibre? We have an MS130-12X connected to an MS355-48X over copper, and that combination doesn't have the issue.

redsector
Head in the Cloud

It happens over both copper and fibre.

redsector
Head in the Cloud

Connected by copper.

I opened a case for this issue and will let you know as soon as I have more information.

redsector
Head in the Cloud

Meraki is still working on this issue; the case is in progress.

joopv
Getting noticed

Are you sure that the cabling is OK?  If you ruled that out then ...

In rare circumstances, there are UDLD errors and spanning tree errors caused by a physical issue at the component level, on MS120s only. It seemed to mainly affect ports 10/11, but after searching I do see some other scenarios where this happened with port 3. The issue should only impact negotiation at 10 and 100 Mbps.

redsector
Head in the Cloud

Sure, the cables are good; I tested several used and new cables. The issue occurs on all copper ports, and it also occurs when connected at 1 Gbps.

chris674
Here to help

Hey there! We're also seeing some weird UDLD issues with an MS130R switch at one of our sites. I worked extensively with Meraki support, and we troubleshot all layer 1/2 issues. I put an MS120 8-port switch in place while Meraki sent a new MS130R, and that temporary switch had no issues at all: no UDLD errors. When the new MS130R arrived and I put it in, the UDLD issues came back and the switch goes down roughly every 6 hours to a day. I engaged support again, but they are looking at layer 1 issues again. I'm starting to think the ruggedized MS130(R) switches may have a manufacturing defect or something. Our next step is to install the MS120 again and perhaps move the MS130R to our main stack and connect it via fiber there. If it drops there too, I would be extremely confident there are no layer 1/2 issues and that we're looking at faulty equipment, DOA from Meraki. Our uplinks are 1G fiber, and the SFPs appear to be on the compatible list for the switches.

 

Any luck with TAC on your case?

CoreyDavoll1
Getting noticed

This sounds like an issue I'm having with a few of my MS130-48Xs. We're using the 10GbE Twinax cable (MA-CBL-TA-1M) for the switch interconnects. Support suggested replacing the cables, so we did that this morning, and now I'm waiting to see whether we get the same behavior. I did notice today that when I bring up the port settings, UDLD shows Alert even though the port profile is set to Enforce.

 

Anyone else using Port Profiles?

AdamBillington
Here to help

Any update on your issue here, or any update from TAC?

CoreyDavoll1
Getting noticed

16.9 was just released, which supposedly fixes the control plane issue they said is the root cause. We're deploying it tonight, so hopefully it resolves the problem.

TCPIPWizard
Conversationalist

I'm also using port profiles, with no issues there. I tried using one of the 2.5 Gbps ports on my MS130-8X to connect to an SFP-to-RJ45 10 Gbps module and still got the error. Meraki support says a fix is in the works but has no ETA for its release yet. I can post screenshots from support if that gives anyone hope...

 

I am asking them to extend the license on the MS130, as it is pretty much useless. It randomly drops traffic, and that is unacceptable.


chris674
Here to help

I hope this is the fix.  I've got this upgrade for our network on the calendar after seeing your post.  Meraki TAC has been pretty useless thus far.  They keep insisting our fiber is bad, but it's not.  The MS120 I put in place to test with for two weeks didn't drop once.  Hoping this resolves it for you/us.  I am not using port profiles.

CoreyDavoll1
Getting noticed

Yeah they had me replace my cables too.  I had the same symptoms last night so it doesn't appear to have fixed my issue.  Back to the drawing board I guess.

redsector
Head in the Cloud

It's a mess. How long do my clients have to wait?

CoreyDavoll1
Getting noticed

That's a great question.  My network refresh project is pretty much on hold until this is resolved.

AdamBillington
Here to help

So I guess it didn't resolve the issues?

CoreyDavoll1
Getting noticed

We made a few settings changes, but they did not fix the issue. Support is now saying there is a fix in the MS17 code. I'm waiting on details, since the published beta still lists the control plane bug as a known issue.

redsector
Head in the Cloud

16.9 didn't help much. The issue affects spanning-tree connections and link aggregates; as a standalone switch, it's no problem.

AdamBillington
Here to help

In the changelog, it appears 17.1.1 may have the fix.

 

MS130 known issues

  • In rare circumstances MS130s may experience management plane congestion that results in UDLD alerts and STP transitions
redsector
Head in the Cloud

Not fixed; it's still listed as a known issue.

CoreyDavoll1
Getting noticed

I applied the MS17 code on Thursday night.  My SE said that there are some fixes in the beta but they haven't fixed all the triggers yet.

 

In my site that has a MS350 at the core and 2 MS130's (MS350->MS130-01->MS130-02->MS350), the firmware seems to have corrected the issue.

 

However, at another site that has a C9300 and 5 MS130s (C9300->MS130-01->MS130-02->MS130-03->MS130-04->MS130-05->C9300), it just made things worse, so I downgraded back to 16.9.

redsector
Head in the Cloud

Yes, it seems to be a problem with the interaction between Meraki-origin and Cisco Catalyst-origin switches. Maybe.

CoreyDavoll1
Getting noticed

Well, my site that has an MS350 at the core and 2 MS130s on the beta MS17 code had the UDLD issue again. It was stable for a week. I guess I'll wait for the next release.

 

redsector
Head in the Cloud

Still UDLD-errors.

The newest beta firmware still lists this issue as occurring in "rare" circumstances 😂. Here it happens at every place where MS130s are installed, across different networks and sites. Everywhere!

 

Switch firmware versions MS 17.1.1 changelog

MS130 known issues

  • In rare circumstances MS130s may experience management plane congestion that results in UDLD alerts and STP transitions
CoreyDavoll1
Getting noticed

My support tech replied back to me with "On our internal thread, our escalations team stated that they expected these issues to be resolved in the latest 17 beta firmware. The reason it's listed like this in release notes is that it's still in its verification stage as being "fixed"."

 

So I'll probably have it pushed and see what happens. 

chris674
Here to help

That's great to hear. It's also good that Meraki TAC is at least acknowledging it's a problem on their end. My TAC contact is far less engaged and is certain it's a fiber issue on our end, which it's not. I upgraded to 16.9 at the site where I'm having these issues. What's curious is that I have one switch on fiber that drops a lot (more so with 16.9), and another on fiber that drops maybe once a week. A third switch at that site has a copper uplink and is fine. All of them are MS130R switches (their new "ruggedized" switch). Fingers crossed 17.1.1 resolves this; it's driving me nuts. Though compared to other posters, one problematic switch isn't a huge deal. Having to stop a whole project over this really should open Meraki's eyes when it comes to QA testing of their firmware.

CoreyDavoll1
Getting noticed

Well, they did try to blame the cables and ports before they admitted to the issue. I'm still seeing the issue on 17.1.1, so they haven't fixed everything yet. One site that has an MS350 and MS130s seems to be better than it was on 16.9, but not 100% yet. At another site, where the MS130s connect to a C9300 running CS 16.8, the issue got worse after I updated them to 17.1.1.

AdamBillington
Here to help

Well at least it has now been identified and acknowledged. This only occurs with the MS130 model for me. 

CoreyDavoll1
Getting noticed

Yeah, I believe it's specific to the MS130 line. I just bought a bunch of them for a refresh project, and unfortunately I've paused the project until this is resolved.

chris674
Here to help

Same here, we had a refresh project earlier this year, nearly all MS130 switches.  I am seeing this problem specific to our MS130R submodel though.  I guess we lucked out.  I'm eager for the new firmware.

Mark_6F
Conversationalist

My site has approximately 30 MS130 and 2 MS425 switches. Same experience with TAC, and we're waiting for the new release candidate. This has been a mess.

However, I noticed that when I connected another uplink alongside it (10 Gb DAC, SFP+, or SFP), the port status was pretty much clear and alert-free. Has anyone had a similar experience?

CoreyDavoll1
Getting noticed

MS 17.1.2 is now released. I'm pushing it to one of my sites tonight. Hopefully this finally fixes the issue.

chris674
Here to help

Let us know if it helps! I'm glad to see Meraki fixed this in 17.1.2 according to the release notes. I asked our TAC rep about the roadmap for when the code will be moved to stable. We are the lucky ones and only have two low-traffic MS130R switches impacted, so we will likely wait until it's stable.

chris674
Here to help

@CoreyDavoll1 Any news on how the new firmware has performed?

CoreyDavoll1
Getting noticed

It's been stable since I pushed it.

AdamBillington
Here to help

How did it all go?

 

ITSDigital
Conversationalist

On Saturday, in the early hours, we had two MS130-48s in different orgs, running 16.8 and 16.9 firmware, that both stopped forwarding DNS while still allowing TCP/ICMP packets through. We're looking into 17.1.2 as a fix.

AdamBillington
Here to help

Hi all, has anyone updated to this firmware yet, and how has performance been since the upgrade?

CoreyDavoll1
Getting noticed

I've been running 17.1.2 for over a week, and it's been stable.

redsector
Head in the Cloud

I updated to MS 17.1.2. The UDLD errors are gone now.

CoreyDavoll1
Getting noticed

Same here

 

shaakir
Here to help

We just went from 16.8 to 16.9 and have seen UDLD and LACP issues.

We've also seen a bunch of other issues with CCTV infrastructure and some venerable SPA 5XX phones started being picky about which ports to run on.

Sh**-show all around, wasting days of our time.

Loads of confidence at the prospect of going to an RC when a minor, supposedly stable upgrade seems to bring down the house of cards.

JHQUEK
New here

I have a setup of Meraki-managed Catalyst 9K switches, all having this UDLD issue on copper ports. They all connect to the C9300-48UXM, and the C9300-48UXM uplinks to the MX:

  • 1 C9300-48UXM
  • 2 C9300-48U
  • 2 C9300-48P

 

Four of the switches are now running CS 16.9 and have been stable for the past week since I did the firmware upgrade, but one C9300-48P is still refusing to function properly. I may need to link it directly to the MX so the firmware can upgrade properly.
