MS 11.22 killed my call center

Solved
Brons2
Building a reputation

So some of you may have read the prior thread where I was grousing about the MS120s not passing OSPF.

https://community.meraki.com/t5/Switching/MS-120-won-t-even-pass-OSPF-traffic-unmolested/m-p/66401#M...

 

While troubleshooting the MS120s not passing OSPF on Saturday morning, I thought I would do a staged upgrade so that the MS120s would take the latest available firmware, and maybe that would fix my problem of OSPF not passing between two adjacent ports on the same VLAN.

 

However, when I scheduled the staged upgrade, it upgraded all my switches at the site I was working at.  I had been holding off on the 11.x branch and was still on 10.45.  But I let it go, because I just wanted to get my routing changes done and go enjoy my weekend.

 

BIG MISTAKE.

 

Got to the office before 7:30 Monday to see how things were running.  I got to my desk to find the PC techs scrambling around to re-plug IP phones that weren't registering.  I just chalked it up to the firmware upgrade and figured maybe the phones came up before the route to the phone server did.

 

Then my session border controller went offline.  We use a cloud IVR and contact center, so this is a huge deal.  People started calling downstairs from the call center.  I could not even ping the session border controller.  I thought, well, maybe I bumped the cable when I was working in there on Saturday, so I went in the IDF and replugged it.  I was then able to ping the SBC.

 

Then the SBC went offline again.  People started calling again from the call center saying they were in the middle of a call and all of a sudden they could not hear the customer any longer.  I tried to ping the SBC, dead again.  I replaced the cable.

 

The SBC only stayed up for 10-15 minutes at a time before I had to cycle the port again.  And the IP phones kept dropping off, too.  The PC techs were frantically running around replugging them.

 

At this point I started to surmise that it was the Meraki firmware upgrade that was causing it.  Went and talked to executive management, said I wanted to roll back the firmware at lunchtime.  Told them everyone would be down for 15-20 minutes.  They agreed to it.

 

Went back to my desk to schedule the rollback.  Got an error.  The portal would not let me roll back; it said a rollback was already scheduled.  !@#$%#%@^$&.

 

Exact error message was

Error rolling back. No changes have been made to your staged upgrades.
Network already scheduled for rollback

 

At this point I called Meraki support.  The engineer didn't seem to understand at first what was happening, but after starting a remote session he could clearly see the error.  Turns out, I needed to bring ALL my Meraki switches online and up to 11.22 before I could roll back.  I had some older MS42Ps that I had replaced with MS250-48FPs last fiscal year.  Well, I had to go find them and plug them back in so they could upgrade to 11.22.  Then I still could not roll back.  So the Meraki engineer took over and rolled them back from his end.  But the whole process took way longer than the lunch hour, so there was an outage for the business after 1pm of about 20 minutes.

 

After the rollback to 10.45 everything worked.

 

I asked Meraki Support to please clarify why my staged upgrade did not work.  They said they'd ask the engineers 🙄

 

So now I'm debriefing on the outage, and looking at release notes for various versions.  I came across this for 11.24:

 

Bug fixes

  • ARP entry on L3 switch expires despite still being in use -- causing brief packet loss until re-ARP occurs

 

If this is the case, WHY DID THE 11.22 FIRMWARE GET SCHEDULED AS THE NEXT VERSION?!?!?!?  FOR ANY CUSTOMER?!?!?!?

 

While we're at it, I need a manual reversion process.  The web upgrades are great and all, but if I need to roll back, I can't be having to go through support to do it for me.

 

In closing,

Shame on you Meraki.  You have really shaken my confidence.  I have other choices for network equipment.  The budget has not had a final decision for this fiscal year yet (Sept 1-Aug 31).  I would like to be able to continue this relationship, but, you really need to change some things.

1 Accepted Solution
RodrigoC
Meraki Employee

Yikes... I'm so sorry this happened to your network, @Brons2. This definitely requires some attention.

 

Could you please DM me the details of your case; I would like to personally follow up on the matter. 

2 Replies
PhilipDAth
Kind of a big deal

That is a crap experience!  I'd be pretty upset too.

 

None of this helps you, but just some of my personal experience.

I used to use staged upgrades all of the time.  Now I don't use them at all.  Everything seems to work fine upgrading all of the switches at once.

I had a customer with MS125s on 11.22.  Just one of their MS125s (they have 7) rebooted every 4 hours (closer to 4 hours and 10 minutes).  I don't know why.  Support didn't know why.  I was on the verge of doing an RMA but decided to try 11.28.  In my case, there have been no more reboots since then.

 

You've had a bad experience with 11.22.  So have I.  I "down vote" 11.22 as a stable image.  I think it has serious issues.
