C9300X-M experiences

rhbirkelund
Kind of a big deal


For some time now, the C9300(X) in Meraki Managed persona has been available. 

We've just had two migrations where they've been deployed in stacks as L2 core switches, and during these migrations we've seen a number of odd behaviors. 

 

I'm curious what others have experienced with them? 

 

The latest "issue" we've had with them, and which I find as a rather big issue, is that fact that when doing a change on a port, e.g. enabling a disabled port, the port would still have the "shutdown" command even after up towards 30 minutes, despite the dashboard showing the port being enabled. This really makes it difficult to do Acceptance Tests and Failover tests, by simulating link downs in an aggregated port. 

We could verify through the dashboard terminal view that the switch still had the ports in shutdown. This stack is running 17.15.3.
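
For reference, the check from the dashboard terminal view looks roughly like this (the interface and port-channel names are just placeholders for your own ports):

show running-config interface TwentyFiveGigE1/0/1 | include shutdown
show running-config interface Port-channel1 | include shutdown
show interfaces status | include disabled

If the dashboard shows a port as enabled but show run still returns "shutdown" (or the port is listed as "disabled" in the status output), the cloud change simply hasn't reached the switch yet.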

 

It is as if the 9300 has some trouble with the scheduler that is supposed to push configuration changes to the switch. 

 

To be honest, we are becoming rather hesitant about positioning the 9300 in Meraki Managed persona at the moment. And that is going to become a problem, because there isn't really any alternative, since most other Meraki-native hardware is EoL.

LinkedIn ::: https://blog.rhbirkelund.dk/

Like what you see? - Give a Kudo ## Did it answer your question? - Mark it as a Solution 🙂

All code examples are provided as is. Responsibility for code execution lies solely with you.
cmr
Kind of a big deal

@rhbirkelund what models are you using?

 

I have done quite a bit of testing with a C9300L-24UXG-4X-M and am sure I didn't see this with 17.15.2.  I'll do some testing now with 17.15.3 as it is running that version.

If my answer solves your problem please click Accept as Solution so others can benefit from it.
rhbirkelund
Kind of a big deal

C9300X-12Y and -24Y. 

cmr
Kind of a big deal

I'll test with SFP+ ports as well now...

cmr
Kind of a big deal

The SFP+ ports I have connected are in an aggregation group.  Disabling the group took both ports down, but it took at least 2 minutes.  Re-enabling the group whilst disabling a single member didn't seem to work, so I tried again about 5 minutes later and it worked within about 30 seconds.  Are your ports in aggregation groups? 

 

Edited as I made a mistake....

 

Second attempt to disable a single member went through in about 30 seconds and re-enabling also worked in a similar time.  However disabling the group did nothing...
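
If you want to watch this from the switch side while toggling things in the dashboard, these standard IOS-XE commands should do it (assuming the dashboard terminal exposes them, and with the port-channel number as a placeholder):

show etherchannel summary
show lacp neighbor
show interfaces Port-channel1 status

In the etherchannel summary output, members flagged (P) are bundled in the port-channel and members flagged (D) are down, which makes the lag between the dashboard change and the switch actually applying it fairly easy to spot.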


 

cmr
Kind of a big deal

I have tested on a single RJ45 port and the state changes about 30 seconds (15-45 seconds) after the dashboard change.  Do you only have the problem when the switches are stacked?  Unfortunately I don't have a second C9300L here.

rhbirkelund
Kind of a big deal

We haven’t tried with an unstacked switch, but I’d assume we would see the same thing there.

 

We fear that if many changes are queued, the sync sort of “stalls”.

cmr
Kind of a big deal

You might be on the right track there; I have generally only tested a few changes at a time. 

rhbirkelund
Kind of a big deal

We had disabled a member on three port channels to simulate a stack member failure.  On two of the port channels the member came back up after enabling it again. On the third it didn’t. The dashboard said it was enabled, but show run showed it still had the shutdown command.

We also had an entire port channel disabled towards a UX where switches were being replaced. Once that work was finished, we enabled the port channel, and it didn’t come up. Show run also showed that the shutdown command was still present.

 

All of this with up to 30 minutes of lag. It was as if a queue was stuck and didn’t clear until a new command was pushed to something else.

cmr
Kind of a big deal

I agree it looks like a queue. I came back just now, nearly an hour later, and the ports were still up. I disabled a separate port, and that then also caused the pending disable of the aggregate ports to be processed within 30 seconds...

cmr
Kind of a big deal

Re-enabling the aggregate went through within 30 seconds on its own.  @rhbirkelund have you logged a ticket for this behaviour?

rhbirkelund
Kind of a big deal

No, we didn’t log a case, because we didn’t really have the time to wait for Support to pick it up and do a whole lot of troubleshooting, only for it either to fix itself out of the blue or for us to be told to reload the stack.

GIdenJoe
Kind of a big deal

Have you checked the flash for a file that contains failed netconf pushes?
I once had a C9300 that would no longer accept any new configuration because a previous push got stuck.  The switch then writes a file containing the actual XML that it tried, and failed, to push to the switch.
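
If it helps as a starting point, these are standard IOS-XE commands; whether the failed-push file actually ends up in flash with an .xml extension, and whether the NETCONF counters mean much in Meraki persona, is a guess on my part:

dir flash: | include xml
show netconf-yang sessions
show netconf-yang statistics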

rhbirkelund
Kind of a big deal

What would that file be called? I was not aware that such a file would be present, but I'll definitely keep it in mind!

GIdenJoe
Kind of a big deal

That's the rub.  I once had a C9300-M switch that errored on its config push, and the Meraki log pointed it out for me so I could search for it in flash.  The switch has since been reset to defaults, and I never really noted the location.  It has to do with NETCONF, so you might want to start your search there.

thomasthomsen
Kind of a big deal

I have some "mixed" experiences with C9K in stacks with 17.15.x

Mostly involving software updates.

 

One stack does everything just fine every time; everything works as expected #happy 🙂

 

The other one ... uh oh ... 😕 strange errors in the log (ones that there are Cisco bug IDs for), and when trying to upgrade, the switch fails (at some point) and does not connect back to the dashboard after rebooting.

Luckily, the switch keeps doing its thing with the config it had.

 

The first time, after the 7200-second commit rollback timer expired, it rolled back (rebooted) and updated again, and this time it succeeded.
The second time, it just stayed kind of offline. It was strange, because the dashboard had "telemetry", but you could not make any changes, see logs, or get the cloud CLI to run, and it was clear that it had not updated correctly.
It stayed that way for about a week 🙂 (because I did not have time to look at it, and the switch "worked" with the config it had).

Then, after I created a case and was about to manually pull log files from the switch, it did a reboot and updated correctly.
This of course does not #makeyouhappy, and it makes you quite nervous about the future.
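
If it happens again and you can get to the cloud CLI or a console, a standard IOS-XE check like the one below should at least show whether the new image was activated but never committed; I'm assuming here that the command is exposed in Meraki persona:

show install summary
show version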


I mean, all the CS switches have, in my opinion, always been .... "wonky".
Don't get me wrong, things have gotten A LOT BETTER since those early MS390 models 🙂

 

The biggest problems I have had recently are the VERY long "out of box" to "ready in dashboard" times, and some SFP modules not coming up on an out-of-box C9300M. This makes it pretty hard for onsite techs to deploy a switch when they don't know what's going on.

 

cmr
Kind of a big deal

This is the problem: Meraki was always easy to deploy and reliable, and often missing some features that you might want but could get away without.  The move to being more fully featured is great, but it seems to have come at the expense of reliability and, in the case of the Catalyst switches, ease of setup.

 

I remember that with MS stacks it never really mattered whether you powered them all up first, connected them first, or stacked them physically or on the dashboard first; they usually just sorted themselves out.

 

Hopefully now that a lot of features have been added, the reliability and ease of use can be focused on again.
