MR33s going offline

jameshottinger
Here to help


Hi,

 

Has anyone ever experienced a particular model of AP, mixed in with other MRs, going offline?

 

We have a situation where 15x MR33s, 9x MR42s & 1x MR74 are deployed, mixed, across 2 buildings, with MS225s as their PoE source. Only the APs are patched into the MS225s, which have fibre uplinks to the edge switches.

 

It's completely random when it happens, but all 15 MR33s & the 1 MR74 in both buildings go offline, even though PoE is still being provided. Packet captures show no traffic at all. The MR42s remain online & functional.

 

Firmware is fully up to date on the APs & switches.

 

Interested to know if anyone else has had this before?

 

thanks

James

21 REPLIES
MangiaNYC
New here

Hi,

I had an MR18 that went offline for no reason.

Replacing the power adapter with PoE didn't help.

My last attempt 2 weeks ago seemed to be the solution.

Check the Ethernet cable for loose wires (http://www.homedepot.com/p/Klein-Tools-VDV-LAN-Scout-Jr-Tester-VDV526-052/202520420) and push the cable in firmly on both the switch and the MR to ensure a good connection.

This may not be the solution for you, but it's worth a try.

PacketHauler
Here to help

What version of firmware are you running? There was a recent beta release that I thought addressed random reboots of that particular model.

 

Release notes for 25.8:

Bug fixes
  • Provisioning service in some cases would not be able to bring up BLE radio (MR33/MR30H/MR74)
  • Some clients and APs were reported with high RSSIs (255dB) which were inaccurate (MR42/MR52/MR53/MR84)
  • Port 0 was being assigned to an Aggregation Port which violated 802.1ax-2014 section 6.3.4 (MR52/MR53/MR84)
  • Tuned MCS EVM to improve rate vs range (MR42)
  • Provisioning service would cause scanning radio to get stuck (MR30H/MR33/MR42/MR52/MR53/MR84/MR74)
  • Stale information fix ups for Client Balancing
  • Ethernet phy configuration issue caused CRCs on upstream switch ports (MR33/MR30H/MR74)
  • Stability issue caused MR to reboot under load
  • Low level stability issue caused AP to reboot
________________________________________________
[root@allevil ~]$

@PacketHauler - I'm actually currently on 25.6; I'll upgrade them to 25.8 right now. Thanks for pointing that out, hopefully that resolves it!

Is there a way to check when an AP has rebooted? I have checked the event logs in the dashboard for an AP and it shows no power off/on or reboot entry when I power off or reboot an AP, nor can I find any 'uptime' value to signify a reboot.
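
For anyone who wants to script a check rather than dig through the dashboard, here is a minimal sketch, assuming the Dashboard API v1 events endpoint and a read-only API key; the exact event types an AP logs around a silent reboot may vary, so treat the keyword filter as illustrative:

# Sketch only: pull recent wireless events for a network from the Meraki
# Dashboard API v1 and print anything that looks reboot/connectivity related.
import os
import requests

API_KEY = os.environ["MERAKI_API_KEY"]          # read-only key is enough
NETWORK_ID = os.environ["MERAKI_NETWORK_ID"]    # network containing the APs

resp = requests.get(
    f"https://api.meraki.com/api/v1/networks/{NETWORK_ID}/events",
    headers={"X-Cisco-Meraki-API-Key": API_KEY},
    params={"productType": "wireless", "perPage": 100},
    timeout=30,
)
resp.raise_for_status()

for event in resp.json().get("events", []):
    # Crude keyword match; adjust to whatever event types your dashboard shows.
    if any(word in event.get("type", "").lower() for word in ("boot", "connect")):
        print(event.get("occurredAt"), event.get("deviceName"), event.get("type"))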

 

Thanks

rsweet
Conversationalist

You can try checking the event logs for the switch port the AP connects to, but it isn't very useful. I would recommend calling Support. They can log in and actually view the device uptime and reboot reason/code.

MerakiDave
Meraki Employee

Correct on calling Support to provide AP reboot timestamps and reason codes, which may not be in the event log. @jameshottinger I'd also work with Support to help investigate. While the first thing to troubleshoot is possible layer 1 issues, this certainly doesn't sound like your typical L1 issue, because you have a mix of 33s, 42s and a 74 and only the 42s stay online consistently. The chances are pretty much zero that those exact 16 switch ports would share a common L1 issue while the 9 ports connecting the 42s do not. One possible exception would be a PoE overload situation, where a switch can methodically shut down ports for load shedding. But that's also not happening here: as you indicated, they are not losing power, just falling offline. And it wouldn't seem to be an upstream issue because 1) the MR42s stay solidly online and 2) it occurs randomly when nothing upstream in the environment has changed. Support can also scrub your firmware versions on the APs and the MS225, assist in running pcaps, and see some lower-level troubleshooting info that might shed light on the root cause.

jlschear
Here to help

I have the same issue happening in my environment with my MR33s. Most of the time it seems to only affect ~70 MR33s in my environment (two different networks which are in two separate buildings); sometimes it is all of them (about 100 across 5 different networks). None of the other Meraki AP models in my network are affected by whatever is going on. My largest grouping of MR33s is mixed with MR32s as well, and those have not been affected. I am running 25.7 on some of the APs and have upgraded some of them to 25.8. Meraki and my Cisco team have finally admitted to me that they believe this is a "bug" but have no idea what is causing it, and therefore have no fix at this moment. They have further confirmed that while not all customers are having this issue, there are enough that the Product team and their engineers are working to identify the cause and fix the issue. The only thing I can do at this point is reboot the APs by power cycling them when they go offline.
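
If your APs hang off Meraki MS switches (as in the original post), that power-cycle step can at least be scripted rather than done by hand. A minimal sketch, assuming the Dashboard API v1 switch-port cycle call and placeholder serial/port values:

# Sketch only: bounce PoE on the switch ports feeding stuck APs via the
# Meraki Dashboard API v1. Serial and port IDs below are placeholders, and
# this only applies where the APs are powered by Meraki MS switches.
import os
import requests

API_KEY = os.environ["MERAKI_API_KEY"]
SWITCH_SERIAL = "Q2XX-XXXX-XXXX"       # hypothetical MS switch serial
AP_PORTS = ["1", "2", "3"]             # ports the offline MR33s are patched into

resp = requests.post(
    f"https://api.meraki.com/api/v1/devices/{SWITCH_SERIAL}/switch/ports/cycle",
    headers={"X-Cisco-Meraki-API-Key": API_KEY},
    json={"ports": AP_PORTS},
    timeout=30,
)
resp.raise_for_status()
print("Cycled ports:", resp.json())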

Same problem here - 23 MR33s all going offline at the same time.  Our MR42s sharing the same subnet and PoE switches are unaffected.

 

Hoping that support sorts this quickly - all of our teachers rely on the wireless.

Back up and running - Just waiting for Meraki Support to let me know how they fixed it.

TomasN
Conversationalist

Hi,

Me too ..


I have created a case with Meraki; my symptoms are:

 

There are 21 MR18s, mostly powered by AC adapter or PoE injector.
The access points are connected to the customer's LAN (Cisco Catalyst switches).
The switch ports for the access points are configured as 802.1Q trunks, with the native VLAN being the management VLAN for the APs.


Since 11/2017 all the access points have randomly (not all of them at the same time) been getting stuck - not serving clients (although SSIDs are still broadcast), not bridging traffic, and not replying to traffic directed at them.
From the Dashboard point of view, the access point appears as "unreachable" (red dot in the GUI).
I have checked all the interconnections between switches, the logs, and the configuration on the customer's LAN switches and gateway, but I cannot find any problem.
The only way to wake up these faulty APs is to shut/no shut the switch port they are connected to, or to reboot the AP by disconnecting the power adapter.

 

I have one access point which remains in this failure state.

On the gateway, I can correctly see the ARP entry for this AP.
On the switch where the access point is connected to the LAN, the MAC address of the AP is correctly learned.

But the AP is not responding to ping from the GW.

In the GUI, the AP is marked RED and the status is Unreachable.
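
As an illustration only (not something from Meraki), a quick probe like the one below, assuming scapy on a Linux host in the same management VLAN and a hypothetical AP IP, shows whether a suspect AP still answers ARP while ignoring ICMP:

# Sketch only: check whether a suspect AP answers ARP but not ICMP.
# AP_IP is a placeholder; run as root on a host in the AP's management VLAN.
from scapy.all import ARP, Ether, ICMP, IP, sr1, srp

AP_IP = "10.0.10.21"    # hypothetical management IP of the stuck AP

# L2 check: broadcast an ARP who-has for the AP's IP.
arp_ans, _ = srp(Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=AP_IP),
                 timeout=2, verbose=False)
print("ARP reply: ", "yes" if arp_ans else "no")

# L3 check: single ICMP echo request.
icmp_ans = sr1(IP(dst=AP_IP) / ICMP(), timeout=2, verbose=False)
print("ICMP reply:", "yes" if icmp_ans else "no")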

 

Tomas

MAG
Here to help

Hi all,

I recently had that same issue: every AP and switch suffered a sudden cut, like a reboot, all at the same time...

Long story short... I have been told that there was an issue on the Meraki backend server that was "controlling" our network... no further explanation from Meraki support... after 3 weeks the issue just became less frequent and then went away.

 

Best

Miguel

We are still experiencing the failures in our environment. This still only pertains to MR33 wireless access points for me. No other devices are being affected. Meraki now wants me to RMA one of my devices as well as give them a copy of my switch configuration. I will let everyone know if there is any change on our side.
Thanks,
Justin

Hi Justin,

Those "cuts" are  on all your MR33 at the same time and on all of them ? at least on all of them on a specific network under the dashboard?

best

Miguel Angel

 

MAG,

I have 103 MR33s across 6 networks in my dashboard. Not all of them experience the issue every time.  Most of the time it seems as though Network 1 and Network 2 will fail and all of the others will stay up.  Sometimes it is all of them.  Network 1 currently has a mix of devices (MR32, MR33) and the only ones that always fail are the MR33s. 

Hope that answers your question.

Thanks,

Justin

Hi Justin,

 

It does. Actually, I also had a similar issue last week with two MR52s... intermittent cuts, not at the same time but identical behaviour.

After checking the cable and PoE and doing a factory reset, I went for an RMA.

Since your case seems to be model-based and much more critical given the number of APs affected, I would quickly contact Support and ask for an RMA after confirming that cabling and PoE are OK (have you tried plugging any of those faulty APs into another port/switch, or using an external power adaptor?) and that a factory reset does not solve the problem.

To me it seems like something out of your control, since it is quite random - very likely related to some Meraki backend issue.

Hope it helps.

 

TomasN
Conversationalist

Just my actual state:

I provided my TAC engineer with a mirror of the traffic from the AP's switch port during the failure. You can see there that the AP sends ARP requests for the MAC address of its own gateway again and again, and always gets a response, so on the GW there is a correct record in the ARP table. At the same time, when I tried to ping from the gateway, you can see that the ICMP request is delivered to the AP but not answered by it. On the GW there is no NAT translation for this AP, so the AP is not communicating outside the LAN. This behaviour persists until I shut/no shut the AP's switch port (the AP is powered externally by a power adapter, so power to the AP is still there). The whole time it is in the failure state, the AP is not serving clients on the other VLANs (SSIDs) and the status LED is solid blue.

 

I think that it is a failure of the AP, not a failure of the dashboard (Meraki cloud controller), because the AP should keep working even when the dashboard is unreachable.

 

Unfortunately my TAC engineer is quite a newbie, because everything he has done amounts to this statement: "During any type of fault the AP will behave in an unpredictable way".

I sent him this link and a link to another forum where this problem is discussed, but you can guess what happened...

 

He asked me if I still have some APs in the failure state (the last one I had was the one I used to capture the traffic during the failure and during reconnect), and then he closed my case...

 

Really useful and valuable support provided by Meraki 😞 - I hope this is only my TAC engineer's personal mistake...

 

 

 

Hi all,

 

Sorry, I haven't kept you posted on our developments... that's mainly because we haven't had any developments!

 

We managed to capture some Wireshark data via a port mirror on one of the affected APs, which Meraki are now in possession of. From what I could see, everything was running absolutely fine and then it stops dead. I am told this is with the Engineering Team. That's the most I have been told, and keep being told.

 

I've been off for a few days so I haven't applied any pressure; let's see what I can squeeze out of them today!

 

Cheers

J

MPMHelpdesk
Conversationalist

Hi,

We have the same situation here, but with a mix of MR32s, MR72s and MR74s; the only model presenting the offline issue is the MR74. Same as in the other posts, they become unreachable from the portal and from clients, and the only workaround for now is turning the switch port off and on so they work again.

We currently have 17 units with the issue. Meraki RMA'd us 3 units to test, but as soon as we connected them, they went offline after 1 day of operation together with the rest. We now have tickets open with Cisco (we use Catalyst 6800 switches); we just finished a support call this morning, and they went through our configuration and everything is OK according to them. There is also a Meraki support ticket for reviewing the Meraki configuration, which they also say is correct, so at this point we are bouncing back and forth while our production is being affected and IT is getting yelled at for not fixing the issue 😞
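
For what it's worth, bouncing the Catalyst ports can also be scripted so nobody has to do it by hand every time. A minimal sketch, assuming netmiko, SSH access to the switch, and placeholder credentials and interface names:

# Sketch only: shut/no shut the ports feeding stuck APs on a Catalyst switch
# using netmiko. Host, credentials and interface names are all placeholders.
import time
from netmiko import ConnectHandler

switch = {
    "device_type": "cisco_ios",
    "host": "192.0.2.10",          # hypothetical switch management IP
    "username": "admin",
    "password": "changeme",
}
ap_interfaces = ["GigabitEthernet1/0/11", "GigabitEthernet1/0/12"]

conn = ConnectHandler(**switch)
for intf in ap_interfaces:
    conn.send_config_set([f"interface {intf}", "shutdown"])
    time.sleep(5)                  # give PoE a moment to drop
    conn.send_config_set([f"interface {intf}", "no shutdown"])
conn.disconnect()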

Please correct me if I am wrong, didn't the MR74 model just replace the MR72 series?  The MR33 series is relatively new on the market replacing the MR32.  I am just wondering if there is something in these "New" models that does not play nice.


That is correct; actually, the MR72 is no longer advertised on the Meraki page. We have been wondering the same thing, especially after Meraki support told us that this was a known issue and that the engineering team was trying to figure out how to correct it. We have updated the firmware version as suggested and they are still going offline. We wanted to share our current experience so anyone considering upgrading now thinks twice about it, or waits until a fix comes out - and to see if anybody has been able to fix it...
