MS390 loses contact with dashboard/100% CPU - About to throw in the trash

ronnieshih75
Building a reputation

We have 3 stacks of MS390 switches with 8 switches in each stack, and one stack serves as the core. The 2 access-level switch stacks uplink to the core stack via 2 redundant SFP modules. Two weeks ago, in the middle of the night, these switches suddenly lost contact with the Meraki dashboard and came back with DHCP relay broken; they were running v14.21 at the time. The tech said that the moment the stack lost contact with the dashboard, they noticed 100% CPU on it. I was advised to run v14 code when I installed these 3 months back. We ended up assigning static IP addresses to all PCs to keep the operation going. Eight hours after that incident, the core switch stack crashed completely by itself and brought down an entire call center. I have not had any really useful help from Meraki support at all. One tech asked me to upgrade to the latest v14.26 code, which I did. Then the core switch stack lost contact again, and I opened another ticket. This time, the tech asked me to downgrade to v12.28.1, which I WAS ON back when the switches were first installed. Just 2 hours ago, the core switch stack lost contact with the Meraki dashboard again for 10 minutes for no reason. I saw in the event log that 8 power supplies and 4 SFP modules got reinserted?! This is completely untrue; nobody did that onsite. This time, a third tech asked me to prune VLANs on the uplink ports, which I've never had to do on the MS350 switches. These are Cisco Meraki switches last time I looked, not some cheap Netgear or D-Link variant.

 

We run the same topology in several other call centers, but with MS350 switches, and never a problem.

 

Are these switches garbage with constant software bugs or do I need to put up with weekly reboots?!  We are now basically sourcing eight MS350 switches and junking the MS390 switches.

Inderdeep
Kind of a big deal

@ronnieshih75 : This is really surprising with the MS390. I have never used it myself and am not sure whether anyone else has the same issue. I will check within my network to see if others have had these kinds of issues.

Regards/Inder
Cisco IT Blogs awarded in 2020 & 2021
www.thenetworkdna.com
ronnieshih75
Building a reputation

I should note that I am breaking out dual WAN circuits for a warm spare MX100 router right on the core switch stack, common practice yes? I initially tried breaking out the WAN via two separate MS120-8 switches, just like I've done in 3 other call centers with MS350 switches with ZERO issues. But when I did this setup using MS390 switches, it caused a loop, so I ripped out those MS120-8 switches and broke out the WAN right on the core switch stack. The last Cisco Meraki tech I spoke to is implying that the 2 VLANs I created on the 2 core switches to break out the WAN might be sending multicast and broadcast traffic to the various trunk ports, causing a loop, so I need to prune VLANs. I've never had to do that on traditional Cisco switches; configuring VLAN pruning is nothing more than switch config anal-ness or a "best practice-i-know-these-commands" thing. So fine, I will head to our call center tonight to basically rip out the warm spare router and disable those WAN breakout ports to stop the said multicast and broadcast from VLANs 991 and 992, which I created, to the various trunk ports.

 

However, the above does not explain why the switches said 8 power supplies and 4 SFP modules re-inserted themselves or why they lost contact with Meraki dashboard for 8 minutes.

 

PLUS, please note that MS390 switches do not support loop detection. Read the Known Issues section in the release notes for v14.26 and for all previous versions.

cmr
Kind of a big deal

@ronnieshih75 I'd think you are pushing the limits with a stack of 8 switches in L3 mode.  I know it is *supposed* to work, but I'd normally only take a L3 stack to 4 switches and if the site needs more than that in one rack I'd have a L3 pair and separate access stacks.

 

However the 390 is still very much a work in progress.

 

Perhaps get a pair of 355s for the L3 core and link them to 390 access stacks using either the SFP+ or QSFP+ ports (you get 4x the former and 2x the latter per 355).

 

We have a stack of three 355s running 14.26 in L3 mode and they are stable.

If my answer solves your problem please click Accept as Solution so others can benefit from it.
ronnieshih75
Building a reputation

I am literally tearing out the warm spare MX100 router and the 6 ports for WAN breakout on the core switch stack, and dismantling the aggregations between the core and the 2 other IDF stacks tonight, because that's what Cisco is implying: that I am multicasting and broadcasting traffic meant for the 2 WAN breakout VLANs into the various trunk ports heading for the LAN and access-level switches. I have done setups like this on various traditional 2900 and 3800 series Cisco switches without any type of pruning and never had a problem. "switchport mode trunk" and "switchport trunk native vlan 1", that's it, no more commands than that.

 

Why would a stack of 8 be an issue if Cisco designed it to do so and I installed it following the guideline?  I have done stacks of eight traditional Cisco 3850 switches without any issue serving as layer 3 core, similar experience on Juniper and Adtran switches without any issue.  This is the one time that I need to babysit the switches in fear of them unexpectedly crashing.  I feel like I am literally discovering software bugs.  So if a product is not mature by all means do not push it out with a bunch of bugs.  I've got eight MS350s on order in preparation of the entire ripout of the MS390 switches.

PhilipDAth
Kind of a big deal

>Why would a stack of 8 be an issue if Cisco designed it to do so and I installed it following the guideline?  

 

Because MS390 firmware is buggy.  No comfort to you, but all the other Meraki switches are great.  Just the MS390 has issues.

PhilipDAth
Kind of a big deal

I bet your 3850's are running the older 3.x series of code (like 3.8).  The new 16.x and 17.x code is not so stable ...  don't ever upgrade the software on those 3850 unless you like switches that crash.

 

The quality of IOS-XE on Cisco Catalyst switches has really gone down hill.

Inderdeep
Kind of a big deal

@ronnieshih75 : Why can't we go with a stack of 8? I would say there is no such limitation mentioned for a stack of 8. The issue here is the firmware for sure!

Regards/Inder
ronnieshih75
Building a reputation

Last night, I tore out the warm spare MX100, removed the 2 WAN breakout VLANs on 6 ports of the core stack, limited the uplink ports to the single MX100 to VLAN 1 only, and limited the uplink ports to all APs to 3 specific VLANs. Pruned VLANs, check. Never did that on traditional Cisco switches. Now I will monitor for 2 weeks and get ready to rip out all 8 MS390 switches as core.
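
For anyone who wants to reason about the pruning described above before touching the dashboard, here is a minimal Python sketch of the idea. It is not Meraki tooling. The role names and the data VLAN IDs are hypothetical; only VLAN 1 and the 991/992 breakout VLANs come from this thread.

```python
# Sketch only: role names and most VLAN IDs are hypothetical placeholders.
WAN_BREAKOUT_VLANS = {991, 992}  # the two WAN breakout VLANs named in this thread


def allowed_vlans(port_role, all_vlans, ap_vlans):
    """Return the sorted VLANs a trunk should carry for a given (hypothetical) role."""
    if port_role == "mx_uplink":
        return [1]  # uplink to the single MX100 carries VLAN 1 only
    if port_role == "ap":
        # AP ports carry only the AP VLANs that actually exist on site.
        return sorted(set(ap_vlans) & set(all_vlans))
    if port_role == "stack_uplink":
        # Everything except the WAN breakout VLANs, so their broadcast and
        # multicast traffic never reaches the access stacks.
        return sorted(set(all_vlans) - WAN_BREAKOUT_VLANS)
    raise ValueError(f"unknown port role: {port_role}")


def format_vlan_list(vlans):
    """Collapse VLAN IDs into a compact string such as '1,10-12,20'."""
    parts = []
    start = prev = None
    for vlan in sorted(set(vlans)):
        if start is None:
            start = prev = vlan
        elif vlan == prev + 1:
            prev = vlan  # extend the current run
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = vlan
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


if __name__ == "__main__":
    site_vlans = [1, 10, 11, 12, 20, 991, 992]  # hypothetical site VLANs
    print(format_vlan_list(allowed_vlans("stack_uplink", site_vlans, [])))  # prints 1,10-12,20
```

The point of writing it down this way is only to make the intended allowed-VLAN list per trunk explicit, so it can be compared against what is actually configured.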

 

Unfortunately, when this call center was built back in May, the MS350s (what we use as standard) were all on backorder, so management ordered all MS390 switches. What I am hearing is that I should just eat it and put up with it even though I installed them according to Cisco's spec. Such is life.

misterguitar
Getting noticed

MS390 - hahahahahahahahahahahahahahahahah

 

OK sorry. I have been living the nightmare of trying to get these switches to work reliably for over a year now.

 

I'm so frustrated with these things I could throw them in the trash too.

 

As I sit here now, one of mine is flashing yellow claiming a VLAN mismatch. If you check the port, it has two CDP entries on it. I used a packet sniffer, and the port is only getting CDP from one device. And the VLANs match.

 

All the support engineers who have dealt with my tickets are ready to throw these switches in the trash too. 

 

If I could load IOS on them I would do that in a  heartbeat.

ronnieshih75
Building a reputation

@misterguitar 

I'm wondering if yours sporadically crashed?  The set I have would sporadically lose connection with dashboard, most of the time in the middle of the night.

 

I don't know if Cisco tech support is watching here, but when will these switches act like switches?

cmr
Kind of a big deal

@ronnieshih75 do you have issues with the access stacks?  If not, why not change the 8 member L3 stack to a 2 member L3 stack with a 6 member access stack?  I know that you are supposed to be able to stack 8, but remember with a stack:

 

  • 1 member is master
  • All other members have to listen to master and each other in case master fails
  • Every feature adds complexity
  • Every extra port adds overhead

Therefore the most reliable stacks have the fewest features and the fewest members.  I'd happily stack 4+ switches as L2, but am wary of more than 3 as an L3 core.  And that is with Cisco IOS switches or Force10 switches as well as Meraki ones.

misterguitar
Getting noticed

Oh god. What have ours NOT done?! LOL. Random stack member reordering out of nowhere, random issues with stacking if you have a Catalyst network with RPVST+ enabled (we had to shut off BPDUs between the Meraki network and the Catalyst network and put them in separate STP domains isolated from each other). Random links in the stack would get blocked and not connect if STP was left on with the Catalyst networks. No QoS support, L3 changes requiring reboots of entire stacks. Even with the updated beta firmware, reboot times are insane. Having to plan the order of changes because you can only configure layer 3 while cloud connected. I could ramble on for HOURS about the nightmare these switches have been. And the worst part is that, being Meraki, it's a BLACK BOX. You have almost ZERO ability to debug these things like you could with IOS. If I had a CLI and debug, I could figure out what is wrong in 10 minutes. And if I had access to the IOS, fix it in 5 minutes. But these switches are sold as the idiot boxes with idiot lights and an idiot GUI to CEO types. Just plug them in and they just work! Get rid of highly trained, highly paid network staff and hire a high school kid to run your network! And it would work that way if they actually worked. Unlike the MS390. I'm probably being hyperbolic and letting my frustration with these things show. The Meraki system was designed for medium to branch office stuff, not enterprise-level switching. Unfortunately a lot of us engineers get stuck with technical decisions CFO types make these days. And any time you dangle cutting expensive IT head count, the CFO will bite. So we get to try to work around half-baked Meraki products.

ronnieshih75
Building a reputation

@cmr 

Access level switch stacks are "mostly" ok.  They have randomly dropped off the meraki dashboard for no reason also, several times in the 3 months that I have deployed these.  Switch dropout from dashboard most of the time occurred randomly in the middle of the night when no one is working on them.  Like I said, I have 8 MS350s coming to rip out the stack of 8 MS390 right now.  These switches need to go to the junk pile, it defies everything I know about Cisco switches.  I started my career using Cisco 2900 switches more than 20 years ago and I literally let those collect dust and they never even tried to die.  Rather disappointed in the Meraki MS390 switches.

ronnieshih75
Building a reputation

An entire stack of eight MS390 access layer switches lost contact with Meraki at 4am Saturday night, while nobody was present onsite or working remotely on them. I am now being called out by management as if I don't know what the hell I'm doing at this point. These switches will literally get me fired if this keeps up. Yep, rebooted them remotely via smart UPS power outlets and they still don't re-establish connection with the Meraki cloud. I literally need to go there and reboot them the old-fashioned way by pulling the plugs. Nothing but a string of mysterious issues recently, including the non-responsive DHCP issue on nearly all MX84 and some MX100 routers.

 

My IT director will be talking to our Cisco rep and taking this to the top. The entire line of MS390 switches needs to get SCRAPPED IMMEDIATELY. This reminds me of the first generation of 3COM hubs I worked on more than 20 years ago.

ronnieshih75
Building a reputation

BUMP.  I'm going to keep this thread at the top until someone from Cisco backend engineering answers this.

ronnieshih75
Building a reputation

An old-fashioned power cycle, pulling the power plugs and then plugging them back in, brought them back into the Meraki dashboard; then they dropped again for another 10 minutes, then they came back again. Normal MS390 flakiness. I was told yet another different thing to do, and this time they had me dismantle the aggregation group between the MDF stack and the IDF stack. So basically I cannot run any advanced functions on these switches, it seems.

 

Configuring a pair of MS350 as the new core, going in on Friday night.

ronnieshih75
Building a reputation

Problem confirmed by Meraki support as of this morning as yet another bug, causing MS390 to lose contact with the dashboard.

misterguitar
Getting noticed

Has anybody considered a class action lawsuit on these switches? They were essentially sold under false pretenses before they were fully functional. Meanwhile my license time is ticking away while the switch is not working. Has anybody at Meraki considered extending license periods WHEN they become more functional? Or some kind of replacement trade-in program? These switches have been a disaster for my customers and left a very bad taste in their mouths regarding future Cisco purchases.

ronnieshih75
Building a reputation

Have not considered that.  I have a bunch of angry people behind me for that large call center with these switches installed.  I had to describe to the director at the call center that "you are driving a Tesla using beta autopilot software to drive" then she understood what I was getting at.  The MS390 switches need to be pulled from the website and any market for sale until there is a solid stable version of the code PERIOD.  I don't know any managed switches without the loop detection feature, but these $7000 switches don't support it. 

 

I was damn near ordering twenty-five Netgear 48-port gigabit switches last week via Amazon to toss out these expensive pieces of tarnished gold.

ronnieshih75
Building a reputation

3 units of MS350 switches swapped in as the layer 3 core, the original eight unit MS390 core is now an access-layer stack.  Not only is it super stable, I even gained 200+ Mbps in internet bandwidth, 950Mbps total out of 1Gbps at any given time.

 

I am proceeding with swapping out the rest of the twenty-five MS390 switches in the coming days.  They are being returned to Cisco as beta junk.  That's about $180,000 in unstable hardware/beta Meraki switch code.

misterguitar
Getting noticed

Unfortunately my management is going to continue with this believing the code will get straightened out and it will future proof the customers for "new features" that the existing Meraki hardware won't support. meanwhile over the weekend one client had an entire stack go offline Saturday and they had to reboot them to get them back online this AM.

ronnieshih75
Building a reputation

@misterguitar , that was EXACTLY what I was dealing with!  It's ridiculous also that old fashioned "power-plug-pull" is required to reboot those damn things.  Also when you speak to Meraki support, they go, "oh you rebooted the switches so the logs are lost".  I said "I have to reboot the switches for you to see them in your meraki cloud yeah?!  and local status page wasn't even responding with my laptop straight into a switch port, in person?"  It boggles my mind that logs aren't kept for a specific amount of time, they are GONE when switches are rebooted. 

 

I don't know how big your customer's deployment is, but basically these switches took down our call center filled with over 200 people, 3 separate times, and I had to explain to the IT director what happened, 3 separate times.

misterguitar
Getting noticed

These clients are very large deployments. Think hundreds of MS390 switches. With thousands-tens of thousands of host connections. And these are large public institutions in the public eye. I can't say anymore than that.

ronnieshih75
Building a reputation

I am at the tail end of swapping out all 26 switches or 3 switch stacks. The entire infrastructure stabilized as soon as I swapped out the core stack with three MS350 switches. The last MS390 stack continues to lose contact with the Meraki dashboard every 2 weeks like clockwork, and I pretty much got numb to constantly doing firmware upgrades on those. NO firmware can fix MS390 quirkiness. Every time I call support when the MS390s get stoned, they ask me if I can get into the local status page of a specific switch. NO, I cannot. The whole stack is unresponsive management access-wise, and requires a manual pull of the power plugs to reboot before I can access them again. Then Meraki support would say "well you rebooted them so the logs are lost".

 

Last stack to be swapped out this Friday night.  I won't miss the MS390 switches.

ronnieshih75
Building a reputation

Finished removing all MS390 switches last Friday night, and the MS350s are rock solid. I sometimes think Cisco's Meraki division is simply performing code experiments using customers. Literally right after this entire swap, I noticed Cisco had updated the most recent stable switch code with the following known issues remaining, which they literally experimented on me with. So how does one expect to manage the MS390, a switch that needs to talk to the cloud for configuration changes, when the control plane resets and causes loss of connectivity to the dashboard? It is STILL NOT FIXED IN CODE V14.32. It doesn't matter that it doesn't affect data plane traffic; the switches CANNOT BE MANAGED once the control plane crashes. Cisco -> you still don't understand. I experienced all items highlighted in RED below, which are critical basic switch functions. Once the control plane crashes and loses connection with the Meraki dashboard, the switches do not report back until you manually power cycle by pulling the power plugs. People are left in the dark regarding the state of the switches.

 

My final announcement in this thread to the public out there:  DO NOT DEPLOY OR INSTALL THE MS390 SWITCHES.  They are not reliable and they are missing vital basic switch functions, and Cisco needs to pull this product off the shelf and stop experimenting with their customers' networks.

Switch firmware versions MS 14.32 changelog

Known issues

  • MS390s may experience control plane resets which could impact dashboard connectivity. This does not affect data plane traffic.
  • In rare instances, DAI inspection may fail to snoop DHCP transactions on stacks leading to those clients being in a blocked state
  • If a combined network has Umbrella integration, changes cannot be made to the group policy page (present since MS 14.5)
  • MS390 ports are limited to the lowest link speed since boot if QoS is enabled
  • MS120 in rare instances will not be able to perform packet captures until rebooted (predates MS 12.28)
  • In rare instances, a stack member may go offline until rebooted (present since MS 12)
  • MS390s may experience a brief 1-2 minute control plane outage. The data plane will not experience issues during this time.
  • In rare instances, non-390 stack members will reboot (present since 12.29+)
  • Packet loss is observed when pinging the MS390 management IP
  • MS120s on rare occasions will reboot (present since MS 11)
  • Stackpower is not enabled on MS390s by default
  • Links being established on a MS120 can result in neighboring ports to flap (present since MS 11)
  • MS390 - Port Up/Down Events Shown Across All Members
  • Enabling Combined Power on MS350/355 switches results in events being logged once per minute (present since MS 11)
  • Networks containing a large number of switches may encounter issues saving changes on the Switch Settings page
  • Stack members are not being marked to update their configuration when changes are made on other members
  • mGig switches will have an amber light for all physical ports that do not negotiate to the highest supported speed. Dashboard will continue showing a light green status for all ports above 100Mbps. Example, MS355 switchports will incorrectly show an amber light for 1G, 2.5G, and 5G, but will show a green light for 10G.
  • The list of switches to clone from fails to load when cloning a switch in an organization with a large number of switches and networks
  • Broadcast types of traffic can leak into the Guest VLAN if a port that fails authentication has a Voice VLAN configured, and dashboard has a Guest VLAN defined (present since MS 11)
  • MS120s switchports with MAB authentication may randomly deauthenticate clients. In order to resume client authentication on that port, a switch reboot is required (present since MS 12)
  • MS390 series switches do not support SM sentry
  • MS390 series switches do not support Meraki authentication
  • MS390 series switches do not support URL redirection
  • MS390 series switches do not support MAC whitelists
  • MS390 series switches do not support loop detection
  • MS390 series switches do not support warm spare/VRRP
  • MS390 series switches do not support UDLD
  • MS350-24X and MS355 series switches do not negotiate UPoE over LLDP correctly (predates MS 10)
  • Rebooting any MS390 stack member via the UI will result in the entire stack rebooting
misterguitar
Getting noticed

I agree 100%. I would love to have all these features in a 390, but don't risk my career by making me have to explain to my customer why this turd doesn't work. especially for the ridiculous price of these switches compared to Aruba or others. Only sell it when it actually works!

StevenEarl
Here to help

Do not buy these things. We are having major issues with a stack of 4 MS390s connecting to Cisco 2960x switches. They will just stop routing traffic. CDP neighbor on the 2960x sees the MS390 stack and the MS390s see the 2960x, but they will not route traffic. From the 2960x we can't even ping the gateway. If you have a working switch and unplug it, good luck getting it to talk again.

 

We have another location - pure Meraki (no MS390s) and do not have any issues - APs no issue no matter whether in a standard Cisco switch or in a Meraki switch.

 

Have over 1/2 dozen Meraki engineers looking at this without any resolution. Over 1/2 dozen Cisco TAC engineers looking at our problems without resolution.

 

They suggested configuring spanning tree in MST mode, but that is causing more issues. If you could take off the Meraki OS and run them as a standard 9200 stack, it would probably work.

misterguitar
Getting noticed

We had similar issues, except it was with a Catalyst 4506E chassis running IOS XE.

 

If you want the 390's to work reliably, do this.

 

1. Allow no spanning tree interactions between the 390 stack and any Catalyst switches. We also tried MSTP and it just did not work. This means RSTP off on any ports that connect to Catalyst on the Meraki side and BPDU filters on that connection on the Cat side.

2. If the MS390 stack is doing the routing, then any time you make an L3 change on that stack, like adding an SVI, reboot the whole stack. We found doing this makes the switches work correctly. Usually an L3 change would take ours down and have them acting in a very bizarre fashion.

3. Use the latest firmware. They just pushed 14 to stable and already have v15 in beta with regular updates. We have found the stability and functionality get a lot better with each update. But we also discovered that sometimes you will need to reboot the switch more than once after a firmware upgrade. We found that sometimes after an update the switch would reboot multiple times on its own, as if it were doing microcode and IOS XE updates under the hood but deferring other things to later reboots.

4. Hard code the spanning tree topology using priorities in switch settings. Set roots, then the next layer down as 2nd tier, 3rd tier, etc. Don't leave this up to chance.

5. Don't use VLAN 1 as a data-bearing VLAN. Also prune VLAN 1 from any trunks.
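
As a rough illustration of item 4, the tiering can be planned on paper or with a few lines of Python before entering it in switch settings. This is only a sketch: the switch names and tier layout are hypothetical, and stepping the priority by 4096 per tier is one common convention, not something Meraki prescribes.

```python
# Sketch only: switch names and the tier layout are hypothetical.
STP_PRIORITY_STEP = 4096   # bridge priorities are multiples of 4096
STP_PRIORITY_MAX = 61440   # highest configurable bridge priority


def priority_for_tier(tier):
    """Map a tier (0 = intended root) to a bridge priority; the lowest value wins."""
    if tier < 0:
        raise ValueError("tier must be >= 0")
    priority = (tier + 1) * STP_PRIORITY_STEP
    if priority > STP_PRIORITY_MAX:
        raise ValueError("too many tiers for the STP priority range")
    return priority


def plan_priorities(tiers_by_switch):
    """Return {switch: priority}, refusing a plan with zero or several roots."""
    roots = [name for name, tier in tiers_by_switch.items() if tier == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one tier-0 root, got {roots}")
    return {name: priority_for_tier(tier) for name, tier in tiers_by_switch.items()}


if __name__ == "__main__":
    # Hypothetical layout: one core stack, two IDF stacks, one edge switch.
    for name, prio in plan_priorities(
        {"core-stack": 0, "idf-1": 1, "idf-2": 1, "edge-sw": 2}
    ).items():
        print(f"{name}: {prio}")
```

Refusing a plan without exactly one tier-0 switch is deliberate: it forces the root to be a decision rather than an election result.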

bmarms
Getting noticed

I had a stack of 3 of these I was going to deploy in a new office. Thankfully I unpackaged them and attempted provisioning a month or so before I was ready to deploy, and noticed many, many issues with them versus the ease and success I've had with the MS250 models. I reached out to my hardware vendor and Cisco rep immediately and threatened a lawsuit based on false advertising, as the switches weren't actually capable of what the datasheets said. I was able to get them RMA'd and replaced with MS355s without issue. If Cisco is going to attempt to move Meraki to a Frankenstein Meraki/Catalyst monster, I'll consider a move to Juniper.

cjdavis74
Conversationalist

I've been on the MS390's for almost a year now. All I can say is I'm EXTREMELY disappointed in Cisco for not trying to make this right. I've been a loyal customer for 15 years and this is the most issues I've had with switches in that time.  I've brought many of these same concerns to my Sales Manager and I always get the same answer.  That they are expecting these issues to be fixed in the next firmware upgrade.  I've completely lost faith in Meraki. I want to throw all these in the trash and purchase the MS350, but part of me thinks why should I give them more business. I'm tempted to just purchase all new HP switches. SO FRUSTRATED and Cisco doesn't care.  

misterguitar
Getting noticed

I suspect a lot of folks are having resume moments for buying into these.

Steviespitfire
Here to help

We are having same issues, we bought into Meraki last year and have 3 stacks of MS390's.

VLAN mismatch errors, losing connectivity every night, now a whole stack has gone dormant on dashboard.

These are not ready for the Enterprise - come on Cisco sort it out.

ronnieshih75
Building a reputation

You guys need to speak to your Cisco rep and deal with them directly. I was able to make a deal and swap out all 25 MS390 switches for 25 MS350 switches. I have not had to do a single unexpected reboot since. I don't think I need further proof that the MS390 switches are garbage; there is plenty of evidence here. It did take me several nights to swap out all 25, with over 1000 connections.

 

BTW, I am not doing any VLAN pruning on any trunk ports as I was doing on the MS390.  I believe VLAN pruning did nothing whatsoever on the MS390 and was a "flail" attempt from Meraki support asking me to do that.  For those who think I was "pushing the limit" by stacking 8 switches, I have no issue doing so with MS350 switches and why would I be pushing the limit if it's described in the data sheet as an approved setup?  I have stacked 8 Cisco 3850 switches and 8 Adtran switches in the past without issues.  Same scenario here, except that MS390 just can't do it properly.

Steviespitfire
Here to help

Lost another MS390 stack off dashboard last night - that's 12 switches now that have become unmanageable.

We run a 24/7 operation so will be looking to replace these with something else ASAP.

misterguitar
Getting noticed

One thing we did to fix this issue: we noticed some things going on under the hood with the latest firmware. If you are running 14.32, make sure to reboot the stack TWICE from the Meraki console. It HAS to be done from the Meraki console. You can't just yank the power plugs. On the second boot after the firmware update is applied, the reboot takes MUCH longer and it appears as if some kind of microcode update is applied. We noticed that if you do this, the switches quit dropping off the cloud console.

Bur20
Here to help

Anyone try the beta 15 firmware on these 390's at all yet? Any noticeable stability?

 

We have a stack of 4 in our core that have had random 3-5 min outages since updating to firmware 14 from 12.28. I did convince our Cisco rep to send us trial 355's until these issues are worked out on the 390's.

ronnieshih75
Building a reputation

I suggest you just dump the MS390s for something else.

 

When v12.28 was the stable code back in May, I upgraded to the latest v14.2x just like what you are asking now for code v15.x.  Meraki support will keep asking you to upgrade to the latest code until you can't take it anymore.  They are using you as an experiment to stabilize the codes.  Has it stabilized?  Absolutely not.

misterguitar
Getting noticed

These are the things we did that finally brought stability to our MS390's. We are running 14.32

 

1. Completely isolate spanning tree on the Meraki network from any Catalyst PVST spanning tree, using BPDU filters on the Catalyst side and turning off RSTP on interfaces connecting to Catalyst from the Meraki side. Catalyst and Meraki STP do not play nice together on the MS390.

2. After any L3 SVI changes (adding, removing, or moving VLAN subnets), reboot the entire stack. Other MS390 stacks on the network may need to be rebooted too. We just wait till the evening and reboot ALL of them. This is a known issue.

3. After updating to 14.32, make sure to reboot the MS390 stack TWICE from the Meraki console. (Not just pulling the plug. FROM THE CONSOLE.) The switch will take a long time and some strange lights will flash during one of the reboots. We suspect some type of firmware/ROMMON update is being applied quietly. But the switch will not apply this if you just pull the plugs or cycle power. It has to be a graceful reboot from the console to trigger it.

 

After all of this, they quit dropping off the console and have been relatively stable.

 

We also found in some cases that clients' application-layer firewall filtering on Palo Alto, Check Point, or FortiGate was blocking management plane traffic to the cloud. Make sure the switch IPs are exempt from any IPS/IDS or L7 firewall filtering, or expect problems.
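
A quick way to sanity-check the exemption point above is to compare the switch management IPs against whatever exempt ranges the firewall team gives you. The sketch below uses only Python's standard `ipaddress` module; the addresses and ranges are made up, and exporting the real exemption list from a given firewall is left out entirely.

```python
import ipaddress


def switches_not_exempt(switch_ips, exempt_cidrs):
    """Return the switch management IPs that no exempt CIDR range covers."""
    networks = [ipaddress.ip_network(cidr) for cidr in exempt_cidrs]
    missing = []
    for ip in switch_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in networks):
            missing.append(ip)
    return missing


if __name__ == "__main__":
    # Hypothetical management IPs and exemption list, for illustration only.
    mgmt_ips = ["10.0.10.5", "10.0.10.6", "10.0.20.7"]
    exempt = ["10.0.10.0/24"]
    print(switches_not_exempt(mgmt_ips, exempt))  # prints ['10.0.20.7']
```

Anything this returns is a switch whose cloud management traffic may still be passing through L7 inspection.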

 

If you are using these as data center switches in a hosting environment where constant layer 3 changes are being made, you may want to consider Ronnie's advice and get them swapped for native Meraki switches, as the constant rebooting may be unpalatable.

Bur20
Here to help

Thanks. We are going to try your suggestions.

Bur20
Here to help

Hopefully I don't jinx myself, but after doing the 2 clean reboots from the cloud dashboard the switch stack hasn't had an outage yet. We were having network outages every day. Switches are on 14.32.

 

I also noticed after the reboots that the management port interface is now very responsive whereas prior it was sluggish, kept getting errors, or just wouldn't load at all.

misterguitar
Getting noticed

Great news! Ours are still stable and not dropping off the console, weeks later after those reboots. I wish they had mentioned this in the firmware readme. We figured it out by blind luck.

Bur20
Here to help

Thanks for your suggestions! I wish support would have mentioned something 2 months ago when we initially opened a ticket. I feel like they didn't even know how to approach troubleshooting the issue.

StevenEarl
Here to help

We now have MS425 in place for our distribution stack. All other IDFs connect to the MS425. We have RSTP running on all Cisco 2960x switches (MST is supposed to work, but only if ALL are running MST). We still have the MS390 stack in place, but they now only have servers (access ports) and AP trunks configured. This has been working for about a month (fingers crossed). The MS425 seems to be able to handle RSTP better than the MS390s.

 

If I could start over with Meraki as the final solution, I would put in Meraki APs first, then Meraki MS in the IDFs, and THEN replace the core.

JacekJ
Building a reputation

MS425 and MS225 are very stable, on numerous occasions while I was chit-chatting with the support they told me that these are the most reliable platforms and to avoid MS390 😉

ronnieshih75
Building a reputation

@Bur20 , wait 2 weeks then come back and report.  The MS390 stacks I had would self destroy every 2 weeks.  They were like babies with fresh sunday sleeps every time after I rebooted but would just fall off the cliff approximately 10 days to 2 weeks, starting with 100% pegged CPU taking down the control plane module which causes it to lose contact with Meraki cloud.  I was literally doing firmware update every 2 weeks to the latest beta (yup, as recommended by Meraki support), after every crash.

 

It is only taking an entire year to fix this model, with experiments on live customers.

Bur20
Here to help

Ahh don't say that lol. We do have some MS355's en route.

misterguitar
Getting noticed

Did you do the reboot-from-the-CONSOLE-TWICE trick (not by pulling the power plug)? There is a microcode update that gets applied ONLY if you do a graceful shutdown from the console. It never gets applied if you reboot by just power cycling. We verified this through experiments.

Bur20
Here to help

Guess I spoke too soon. Was good for almost a week and the outages started again. Had a big 30 min outage today where the whole stack did a complete reboot. We just got our MS355X's yesterday. These will be going in ASAP.

ronnieshih75
Building a reputation

Told you so!  Every 10 days to 2 weeks exactly, they self-destroy, even on the newest beta firmware.  Well, 7 days for you this time.  Rid yourself of all stress before the holidays and get those 355's in.

misterguitar
Getting noticed

Interesting Bur20. Ours have not. I'm trying to nail down exactly what is causing so much instability.

 

Are you making a lot of L3 changes to the switch on a regular basis, like in a hosting environment?

 

Are you connecting the Meraki to other vendors or Cisco Catalyst STP topologies?

 

I am highly suspicious the instability comes from bugs in STP and SVI creation/deletion/changes.

Bur20
Here to help

@misterguitar We very rarely make any L3 changes on the switch. We haven't since rebooting these switches twice.

 

No Cisco Catalyst switches. We are all Meraki.

 

I do see some loop inconsistent syslog messages from a couple of our MS250's right before we start losing connectivity.

 

Support told us from the logs we sent them a few months back that the switch hit panic and rebooted the whole stack. Can't confirm for sure, but I believe this 30 min outage had to be the same. 

ronnieshih75
Building a reputation

@misterguitar 

 

My 2 cents:

- I wasn't making frequent L3 changes and it's not a hosting environment.  The L3 changes were done after first 2 days of install and configuration

- No other brands or types of switches or routers were connected to the MS390 switches, pure Meraki equipment.  No Cisco Catalyst equipment either, which I'm also familiar with.

 

No problem for 2 months, then all 3 MS390 stacks started blowing up every 1 to 2 weeks, during which time, no extra equipment was added.

 

By the way, I was on switch firmware v14.26 when I ripped out all those MS390 switches.  I was doing firmware upgrade every week on those, up until the "throw them out" week.

Steviespitfire
Here to help

So rebooting stack has brought the switches back into dashboard, but totally lost faith in these now. Meraki support do not seem aware of these issues (or not owning up to them). Sooo wish we had bought Catalysts instead.

Bur20
Here to help

Took 2 months and unleashing my frustration with our Cisco rep for them to finally acknowledge there are major issues with the MS390's that they are aware of. I was able to get them to send me some trial MS355X's to use until they fix the 390's. I'm not going to feel comfortable putting these 390's back online until I see some major progress with the firmware.

StevenEarl
Here to help

Same here. We are getting the MS355s as well. The MS390s might be fine in a distribution cabinet with only access ports but I don't have any confidence in them. The rest of the Meraki line - great.

Steviespitfire
Here to help

Think we will have a chat with our Meraki Rep - see if we can get them replaced with MS355's, seems like the way to go.

bmarms
Getting noticed

the 390's are a frankenstein switch and almost useless.  thankfully we had ours RMA'd for 355s.  the 355s would reboot every 4-6 weeks on 12.32 code.  we upgraded to 14.32, which broke 802.1x on non-STP ports, but they haven't rebooted now in 6 weeks.  had to make sure all of our 802.1x ports had STP enabled.  considering a switch to juniper since their acquisition of Mist and their full AI integration at the access layer

misterguitar
Getting noticed

So are the MS355X a newer version of the MS355? Or are they the current MS355-48X switches? Just trying to figure out if a new MS355 is on the horizon and you have a beta switch.

Bur20
Here to help

The ones we have are the MS355-48X.

cmr
Kind of a big deal

@misterguitar the available models are all MS355-nnX or MS355-nnX2.  The only difference is that the X2 have more mGig ports per switch and a corresponding fewer number of 1Gb ports.

If my answer solves your problem please click Accept as Solution so others can benefit from it.
misterguitar
Getting noticed

I can say they are definitely aware of the issue. And progress is being made. It's just happening very slowly.

bmarms
Getting noticed

cisco may have done irreparable damage to the loyal meraki customer base by releasing this model 

ronnieshih75
Building a reputation

The truth is, Cisco Meraki is using customers as test beds to improve the MS390, which I think is totally the wrong way to bring this product up to par.  It's like releasing a tire that might blow up (Yep, we remember Firestone).  After about 2000 sets of tires blow up then they true up the product??  

 

I handled and repacked those 25 MS390 switches to ship back.  They remind me of the Catalyst 3850.  In my opinion, Cisco should not take Catalyst switches and try to phase Meraki firmware into them.  If they do, they better make sure it's tested to max.  Otherwise, this product should be pulled completely.

bmarms
Getting noticed

agreed.  as soon as i unpacked ours and saw the cisco bag with the catalyst ears etc. as well as the catalyst stack cables i had a bad feeling.

Steviespitfire
Here to help

We have finally lost patience with these now, support have had months to sort this - and nothing.

We have asked our Account manager to get the 390's swapped out for Catalysts.

bmarms
Getting noticed

I cannot believe they are still selling these devices

misterguitar
Getting noticed

I have some news I can report. We had a come-to-Jesus meeting with the Cisco folks in Meraki development and sales. Relief is on the way and should arrive soon. I have no clue if what they told me was confidential or not, so I won't go into too much detail. I can say they are aware this was a huge problem and have made a lot of changes on the back end to deal with it. A patch to 14.x is coming soon that will fix all of this. Going forward, new feature additions for the 15 beta will not be done at the cost of regression of old bugs either. I can also say future Meraki hardware is going to be on the Catalyst switch infrastructure. So you may want to hold off on swapping out MS390's for older Meraki hardware. A fix for these issues will be here in weeks. We walked away thinking that we will continue to deploy this platform as long as these patches pan out and the changes to QA are implemented and stability is kept. They seem to realize how damaging this was to their reputation and are serious about fixing it. At this point I would not trade a MS390 for a 7 year old Meraki design switch. Swapping for Catalyst I would do as long as I didn't have to run a mixed environment.

bmarms
Getting noticed

i get it but, if we wanted cloud managed catalyst we wouldn't purchase meraki.  rather than trying to integrate catalyst and meraki, cisco should be focused on developing additional UI analytics and AI for the meraki platform as juniper has done with mist. i'm installing MS355's for now and, when it's time for a lifecycle refresh, i'll be looking at moving to juniper/mist assuming my POC works out.  meraki has yet to announce/release an access point with a 6 GHz radio.  meraki, like many of cisco's acquisitions, is falling behind their competition.

misterguitar
Getting noticed

To be fair, they will be Meraki. They don't run Catalyst code but Meraki code ported to the Catalyst hardware ASICs. It makes sense to me for Cisco to do this. Frankly I think the Catalyst hardware is much more robust. It also makes sense they would only want their developers on one hardware platform, even if they sell two different products to two different markets.

 

As an old hand (who is just looking to eke out a living at this till I can retire), I can see the writing is on the wall for the old Catalyst CLI. Everything is moving in the direction of APIs and Python or other scripting languages for configuration. All the college kids we just hired are Python whizzes. And to them everything is an API. A command line for them is only for running the scripts they just wrote.
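
For anyone who hasn't seen that scripting style, here is a minimal Python sketch of the kind of check those scripts tend to do: given a list of device status records, flag the switches that are not reporting as online. The record shape (`name`, `model`, `status`), the function name, and the sample data are all assumptions made up for illustration. This is not output from, or a call into, any real Dashboard API.

```python
from typing import Dict, List


def offline_switches(statuses: List[Dict[str, str]], model_prefix: str = "MS390") -> List[str]:
    """Return names of devices of the given model family that are not 'online'.

    `statuses` is assumed to be a list of dicts with 'name', 'model' and
    'status' keys (a hypothetical shape chosen for this example).
    """
    return [
        device["name"]
        for device in statuses
        if device.get("model", "").startswith(model_prefix)
        and device.get("status") != "online"
    ]


# Hypothetical sample records, standing in for whatever a monitoring script fetched.
sample = [
    {"name": "core-1", "model": "MS390-48", "status": "offline"},
    {"name": "core-2", "model": "MS390-48", "status": "online"},
    {"name": "idf-1", "model": "MS355-48X", "status": "offline"},
]

print(offline_switches(sample))  # prints ['core-1']
```

In practice a script like this would be run on a schedule and would fetch the records from whatever monitoring source is available, so a stack dropping off gets noticed without someone watching the console.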

ronnieshih75
Building a reputation

You bet they are still trying!

 

Check out the latest v15.8 Beta code, still a crap ton of MS390 known issues not resolved.  "loop detection not supported"??  STILL?  For a Cisco made switch?!

 

@misterguitar I heard that before from Cisco and ran v14.x beta codes for a couple months.  Then I said I'm done with this.  I was driving to our call center every other day to literally pull the power cords and plug them back in, just to reset them.

 

Ste_Eth
Comes here often

Hello,

 

We have a 4-member stack of MS390s and it loses contact with the dashboard.

 

We rebooted one stack member to check whether it would contact the dashboard again after the reboot, but we didn't have any luck.

 

Is there a chance that rebooting one stack member at a time will resolve it, or do we need to reboot all stack members at once?

 

I've also opened a case, and support says:

 

The fix is expected to be in an upcoming MS 14 GA release, along with a future MS 15 release. However, there is no ETA for it yet.

 

That's incredible for Cisco. I hope the fix is on its way...

ronnieshih75
Building a reputation

If you originally rebooted the stack using the dashboard, it doesn't really work.  You need to be right by those switches to pull the power cords physically then plug them back in for dashboard contact to re-establish.

Steviespitfire
Here to help

Meraki support have now acknowledged that we need to update the Firmware to 15.9.

Will be doing this tonight - will let you know outcome.

ronnieshih75
Building a reputation

You are being strung along for Cisco's grand experiment on MS390 switches. I suggest you get off them by speaking to your Cisco rep.

 

I was upgrading firmware on those switches nearly every week before I got rid of them.  Look at the "known issues" for v15.9 beta code, some of those issues have existed since 9 months ago on firmware v14.x when I installed those switches.

misterguitar
Getting noticed

To some extent true. But the alternative is to trade them for MS-350 or 355, both of which are aging switches in the product line. Meraki has also indicated they will not be developing on the Broadcom platforms for new switches in the future, but on Cisco hardware instead. Swap them for older switches at your own risk. They will be stable but they are old right out of the box.

PhilipDAth
Kind of a big deal

I would never sell an MS390 to a customer now.  Even if they did claim to fix them.  Imagine if they did a Google on them.

 

I'm sticking to selling the existing line up all day long.

misterguitar
Getting noticed

I don't want to rain on the parade, but 15.9 will not fix the dashboard connectivity problems or the L3 changes making stacks unstable. It is our understanding after talking with development that these items are going to be fixed in a forthcoming 14.x patch release in the very near future. For the time being, until we get this patch, we have modified our deployments. We don't let stacks get too big, and we don't deploy in a situation where an MS-390 is going to be constantly having L3 changes made (SVIs created and deleted), for example a hosting environment. Meraki has given us assurances that future firmware will not make these bugs reappear, as they will not be allowing any updates out that cause them to reappear. They are very aware of how much of a black eye this has given them.

StevenEarl
Here to help

We finally replaced the stack of 4 MS390s with 2 MS425 as the core and 2 MS355s for the servers (each pair as a stack)  - No problems since - crossing my fingers. Traffic issues and other strange happenings have gone away.

ronnieshih75
Building a reputation

I would take an old, stable switch any day over some new tech with buggy firmware/software that crashes and burns.

 

Frequent L3 changes are not the issue, as I stated above.  My setup was purely MS390, without any other types or brands of equipment.  I don't know what the root cause is, but Cisco needs to spend time to fix it first. Until then, they need to pull these RIGHT NOW.

misterguitar
Getting noticed

We have found once the switches are configured, if you don't make L3 changes, they are stable. The issue only affects the management plane. The data plane just keeps switching as it should and client end users never notice anything. We also found that the problem crops up more on "busy" switches, i.e. ones with all the ports filled and in use, or very large stacks. Because we want newly purchased hardware to have the longest possible service life, we decided to be patient. Going to 355's may bring stability in the management console now, but the switch lifespan will be shorter as it will be EOL before a MS390. As long as it is not impacting the customer's clients, they are ok with it not appearing in the dashboard for a short time. Obviously everyone else may have a different use case or different customers, so you may choose a different path. In talking to the head of development, they are working nonstop on this and it has been given total priority over all else. As in, new feature development has been halted until these bugs are squashed for good. If they deliver on this, we will continue to use them. If it drags on for months on end or never happens, obviously it may all get rethought by my management.

cmr
Kind of a big deal

@misterguitar I don't have any inside information, but why do you think the MS390s will outlast the MS355s (as a model, not a concept)?  The Cisco 9300 hardware was launched in June 2017 and that is what the MS390 is based on, whereas the MS355 was launched in November 2018.

 

You may well be right, but I wouldn't base a purchase decision on the possible longevity of a particular model without it having a prescribed EOL already.

misterguitar
Getting noticed

We heard that future hardware will all be the Cisco platform stuff, not the old Broadcom-based stuff Meraki used to use. But you make a good point. The 390 on 9300 hardware may go EOL sooner. I am making assumptions based on the best information I have.

ronnieshih75
Building a reputation

For the record, once again, I configured these stacks within 2 days then configuration was static from then on with no more L3 changes.  Problem started about a month after the finished installation.  Then it was non-stop weekly firmware upgrade, it was a complete joke trying to explain to management why I was at the call center every Friday night, it basically ruined my weekends for the summer months in 2021.

 

I don't see a point in defending a product that brought down our call center about 5 times in the middle of the day.  Even the $300 Cisco Small Business switches off ebay are better than these.

SPieroni
Here to help

Has this been resolved by upgrading the firmware to 14.33.1?  Meraki is advising us that this is our next step.  We have 12 MS390's that are used as core switches for several of our sites.  Every couple of months they all start dropping off.  Once one drops it is guaranteed that the rest are not far behind.  I pull the plugs and in another couple of months the cycle starts all over again.  Now it has happened again which is how I found this thread.

misterguitar
Getting noticed

Yes. It does fix the dropping off the dashboard part. It has not however fixed the issue of layer 3 changes requiring switch reboots for routing to work correctly.

SPieroni
Here to help

That's good to hear.  It was getting pretty old.  Thanks for the heads up on L3 changes!

ronnieshih75
Building a reputation

Wow, still not fully fixed, after almost a year!!  

 

You keep us updated here.  Read what I wrote above, I upgraded firmware almost weekly for 3 months straight until the day I finished dismounting all MS390 switches.

Korey
Meraki Employee

Hello everyone, the MS Product Management team wanted to provide a direct update regarding stability issues some of you have reported on the MS390 platform. 

 

We want to start by acknowledging that a higher than usual number of customers have experienced software quality issues with their MS390 switches recently. These issues revolve around instability in the Meraki management plane that can result in sporadic or complete loss of connectivity to Dashboard, until the switch or stack is rebooted. While this typically does not impact traffic, know that  at Meraki, any product quality issue of this kind is not acceptable; we always strive to delight our customers through reliable and dependable technology. 

 

The entire MS product and engineering organization understands the challenges customers impacted by this problem are facing, and have made it our top priority to execute on a plan that restores high quality, reliable software on this platform. While we work towards that goal, it’s important to be transparent with our customers. Our teams have been hard at work addressing issues in recent beta firmware releases (MS15). We have already seen measurable improvements in both stability and a significant reduction in case load, but not all issues have been resolved, and this remains our top priority until every customer affected by the problem is satisfied.

 

Our current recommended firmware version for customers experiencing stability issues on MS390 is MS 15.14.1 and subsequent beta versions will continue to further stability improvements. We will continue to provide updates as we have them. Should you encounter any issues with your Meraki switches we recommend contacting Meraki support. We genuinely appreciate your patience, and will continue to strive for the excellence and stability that our customers have come to expect from Cisco Meraki.

Rycherd
Comes here often

I am an IT Director at a school which has been Cisco for many years. Two years ago, we decided to replace aging Cisco switches and WiFi with Meraki equivalents. We trialed MS390, but at the time it was missing functionality we needed, so we deployed MS250/350. We needed to keep traditional Cisco switches at the core, to give flexibility around IPv6, complex ACLs, etc, so opted to migrate these to Cisco 9500 in our roadmap (as there are not Meraki equivalents).

 

We really like Cisco's current direction of the travel, as we will now be able to integrate our Cisco 9500 core switches into the Meraki console for single-pane-of-glass network visibility / monitoring. Converged Cisco hardware, able to run IOS or Meraki, really makes sense at lots of levels. It gives customers choice and flexibility and simplifies supply chains, hopefully lowering costs.

 

However, the bottom line here is that things will need to work far better than what has happened with MS390 which has been a "pilot" for converged Cisco/Meraki hardware. From what I can ascertain, MS390 is a rebadged 9300 running a Meraki management layer, and from discussion above doesn't seem like it has ever been properly stable.... I don't know what the problem is/was. Maybe the Meraki layer was simply "retrofitted" - so perhaps designing "ground up" in future with both usage cases in mind can yield better results. I want this to work, and certainly hope it will!

Steviespitfire
Here to help

We have been using MS390's for 2 years now, and despite many promises from Meraki they have never worked properly. They are unfit for purpose, and it's a disgrace that they are still on the market. Cisco must be mortified that their logo is on the front of them.

bmarms
Getting noticed

I understand Cisco’s reasoning for wanting to manage the Catalyst platform via the Meraki dashboard. It seems the MS390 was their first attempt and, IMHO, it failed miserably and should never have been offered to customers. Hopefully they can get this right.

daveclarkco
Conversationalist

Korey- I read this entire set of posts and I'm terrified that I bought into the MS390 switches without even blinking that any Cisco-branded product would be so problematic for so LONG.

Six years ago I implemented six 3850s in a stack (and with two powerstacks) and they've been flawless ever since. Their uptime is now over 4 years, impressive by any measure. They're just doing access layer switching, no routing, truly nothing heavy going on, and I selected the MS390 series because they gave me the 10G expandability and power stacking capabilities I believe are necessary in my environment.

I just received my four MS390s one business day ago and am tempted to not even open the box, wondering if implementing these is going to be a reputation (if not career) killer.

It's been four months since your last post here (or anyone else's, for that matter). Please update us all as to Meraki's current position on how progress towards stabilizing this platform is going, and please update us on what firmware version is recommended at the moment.

Thanks in advance.

misterguitar
Getting noticed

We have a lot of these. And to be honest, they still have problems. I have a current customer still having problems. The promised fixes to the firmware fixed some issues, but not all. We have some folks using this who are not having problems though. But they are temperamental and you have to be careful what you do or you may destabilize them.

 

I would still categorize these switches as beta. But your mileage may vary.

 

If you are looking for rock solid get the older MS350's if you need the gui console. Or if you are well versed in IOS get the CAT 9000 replacements running IOS. That's my humble opinion.

 

 

daveclarkco
Conversationalist

Thanks @misterguitar that's somewhat encouraging. Interesting that I had ordered these back in June/July and was initially told they weren't going to ship until March 2023. I only found out two weeks ago that they were in fact shipping. Makes me wonder why that initial expectation was set to March, not say 2 or three months (e.g., supply chain problem).

Still awaiting some official word from anyone at Meraki.

Korey?

PhilipDAth
Kind of a big deal

I think that is part of the problem - IOS-XE on the Cat 9000s is not "rock solid".

 

I know some large shops that have to reboot their Catalyst 9000 switches every 3 months to prevent them from crashing.

 

So when the underlying OS is not stable, adding a Meraki shim on top just makes it even less stable.

bmarms
Getting noticed

I’d save yourself the headache and RMA them if you can. If you want cloud managed native Meraki, go with the MS355 series if you need mgig and upoe. I have both ms355 and ms250 in all of our 13 offices and they’ve both been solid platforms. 

MRCUR
Kind of a big deal

Run away and don't look back. 

MRCUR | CMNO #12
StevenEarl
Here to help

I know they are wanting to do more integrations between platforms which is good in theory, but they had better test the heck out of it first. I have been using all of the "standard" Meraki equipment and really like it. Almost every company has had a device/software with an issue -  the key is do they learn from it.

ronnieshih75
Building a reputation

This is unbelievable, more than a year after I made this post, people are still discussing MS390's instability even on the newest beta firmware.

Steviespitfire
Here to help

Have to be honest, since applying the latest "stable" release firmware we have had no further drop outs, been about 2 months now - fingers crossed.

misterguitar
Getting noticed

Our experience has varied. I have one customer still having stability issues, especially with large stacks. The issues aren't about dropping off the management console anymore, but more around L3 changes. In one case they had to RMA a stack member for hardware failure, and following Meraki tech support's instructions to replace the RMA'd unit blew up the whole stack. It was a stack with 6 members. In areas with smaller stacks of 2-3 390's they seem much more stable. I do know we were told the new 15 code would be out months and months ago, and that has yet to happen. It seems to be taking Cisco a lot longer to fix this than they thought it would. Not sure if it is because of continuing stability problems or what. They are pretty tight-lipped.

Korey
Meraki Employee

Hello everyone, thank you all for your contributions to this open thread. We appreciate your continued feedback, and this remains our #1 priority.

 

While we have made steady and promising progress, our work is not complete.  We plan to have a Switch-15 Release Candidate version available within the next few months and this version will build on the improvements we made in 15.14.2 and additional fixes and enhancements. We highly recommend that MS390 users upgrade to this version as soon as it is available. It will be the highest-quality release yet for the product and we expect it will get promoted to GA soon after.

 

As always, should you encounter any issues with your Meraki MS390s, please reach out to Meraki support for assistance. We appreciate your patience as we work towards the Meraki standard of excellence our customers expect.

Bur20
Here to help

Those of you still running MS390's how are they running on the latest 15.18 RC firmware? Ours have been decommissioned for over a year now and we are hesitant to throw them back into our core until we hear from others that they are stable. Are they functioning as advertised? Any random drops/reboots? Any routing issues? Dropping from the dashboard?

misterguitar
Getting noticed

Some of the ones I deal with are still having issues. STP issues, stack management issues, general instability issues. Until 15 hits and has some "skin" on it, I would wait. I have a few people ready to bail on the 390's for Catalyst again because you can manage Catalyst switches with the Meraki Dashboard now. Some are discussing writing them off as losses.

Steviespitfire
Here to help

We upgraded to 14.33.1 back in September, since then (touch wood) we have had no issues.

