We have a new deployment of Meraki switches (MS350) and APs (MR33). For the most part things are working very well. However, we have had instances where the APs stop passing traffic for one or more clients. Sometimes the issue is isolated and only one client experiences it, but we have also had widespread instances where a dozen or so clients across multiple APs (on different floors) all lose network connectivity at the same time. When this happens the client can't even ping the DG (which is on the Meraki switch). The client usually has to disconnect and then reconnect to the WLAN to get traffic going again; otherwise the client will usually correct itself after a minute or two and be able to pass traffic again. When the issue occurs the logs only state that the client "disassociated" for "Unspecified reason" and don't offer any more granular detail as to why this is happening. Has anyone ever seen this before?
@craddockc Are the APs running the latest firmware?
When the clients lose connectivity, are they still connected to the AP but unable to ping their DG? Is this correct, or do the clients get disconnected from the AP too?
Also, did you check for any interference in that area? What does the RF spectrum say about utilization?
The client appears to still be connected (associated) to the AP; it just loses its network connectivity, to the point where it cannot even ping the DG, which is the switch directly connected to the AP. It's like the AP just stops passing traffic for the affected client. I considered channel utilization as a possible cause, but to me that would make more sense if it was just one AP. The issue has occurred on multiple clients across multiple different APs (on different floors of the building), all at the same time.
We use the band steering option, so most clients use the 5 GHz band, where the Dashboard shows "Low" and "very low" channel utilization.
You mentioned interference. How can I find out if that is what's contributing to the issue? We have two separate SSIDs: one for corp devices that uses 802.1X and another for BYOD devices that uses WPA2 PSK. I am noticing that on each AP, both SSIDs are using the same 5 GHz channel. I wonder if the BYOD SSID is interfering with the Corp SSID, as it is the corp devices that we are noticing the issue with.
You might take a look at the doc below to help you see if there are any issues with interference.
Thank you everyone for your valuable input. I am going to move our BYOD WLAN to 2.4 GHz only and see if that helps alleviate the situation. Your suggestions have me thinking that it's possible the BYOD SSID is occasionally interfering with the Corp SSID, causing Corp clients to be affected. I will make this change and follow up in a few weeks with results!
>The client appears to still be connected (associated) to the AP, it just loses its network connectivity
I have seen issues like this before, especially with Microsoft Surface Pros. The Marvell WiFi chipset (which is used in Surface Pros) is particularly prone to it.
The Surface Pros are a particular pain. They have a bug where they incorrectly cache the pairwise secret key when they momentarily go out of and back into coverage. Even when the AP sends them a request to clear the key, they don't. Microsoft have never released a fix.
So what happens is they say they are connected, but they encrypt the packets with the wrong key. So no data can be exchanged. Often you have to disable/enable the WiFi adaptor to make them work again.
When it happens to you does disabling/enabling the WiFi adaptor make it work again?
So if you have a network with Surface Pros or machines using the Marvell chipset, usually the only thing you can do is increase the density of APs (or replace the machines - my preferred option).
So I would suggest you start by looking at the clients' drivers. See if there are any updates. This issue is so often related to client-side drivers.
If you have the coverage overlap on the 5 GHz band, I would move the Corp SSID to the 5 GHz band only. Keeping the BYOD/Guest SSID on 2.4 GHz only will ensure that there is no channel congestion from non-work-related clients.
You can also cap the bandwidth for each client on the Corp network (leave the SSID limit at unlimited), say 40 Mbit/s or similar per client. This ensures that if one client is doing, for example, a Windows update, it will not impact other clients on the same channel/AP too much. Do this under Firewall & traffic shaping for the SSID.
craddockc, there is currently a bug in Apple devices like iPhones and iPads when using WPA2 PSK that may be related to your problem. Please check whether your affected clients are iOS devices.
The event log displays the dropout like this:
Jun 17 18:11:40 MR33 IPA iPad Jorge 802.11 association channel: 11, rssi: 23
Jun 17 18:08:27 MR33 IPA iPad Jorge 802.11 disassociation client has left AP
Jun 17 18:08:27 MR33 IPA iPad Jorge WPA deauthentication radio: 0, vap: 0, client_mac: 48:3B:38:9F:A8:3B
Jun 17 18:08:23 MR33 IPA iPad Jorge WPA authentication
Jun 17 18:08:23 MR33 IPA iPad Jorge 802.11 association channel: 11, rssi: 27
Jun 17 18:03:13 MR33 IPA iPad Jorge 802.11 disassociation client association expired
Jun 17 18:03:13 MR33 IPA iPad Jorge WPA deauthentication radio: 0, vap: 0, client_mac: 48:3B:3
The issue we identified in this case that can cause intermittent failures to connect to a PSK Wi-Fi network has been resolved in the iOS 13 beta.
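For anyone wanting to see how short-lived these sessions are, here is a minimal Python sketch that pairs each 802.11 association in the excerpt above with the next disassociation. The column layout is guessed from the pasted lines, and the `year` argument is only a placeholder because the log omits the year; the function names are made up for illustration and are not part of any Meraki tooling.

```python
from datetime import datetime

# Lines copied from the event log excerpt above (newest first, as pasted).
LOG = """\
Jun 17 18:11:40 MR33 IPA iPad Jorge 802.11 association channel: 11, rssi: 23
Jun 17 18:08:27 MR33 IPA iPad Jorge 802.11 disassociation client has left AP
Jun 17 18:08:23 MR33 IPA iPad Jorge 802.11 association channel: 11, rssi: 27
Jun 17 18:03:13 MR33 IPA iPad Jorge 802.11 disassociation client association expired
"""


def parse_events(text, year=2000):
    """Return (timestamp, kind) tuples sorted oldest first.

    `year` is an assumed placeholder: the log lines carry no year.
    """
    events = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue
        try:
            stamp = f"{year} {' '.join(parts[:3])}"
            ts = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped log line
        if "802.11 association" in line:
            kind = "assoc"
        elif "802.11 disassociation" in line:
            kind = "disassoc"
        else:
            continue  # ignore WPA auth/deauth and anything else
        events.append((ts, kind))
    return sorted(events)


def session_lengths(events):
    """Pair each association with the next disassociation.

    Returns a list of (association_time, seconds_connected).
    """
    sessions = []
    start = None
    for ts, kind in events:
        if kind == "assoc":
            start = ts
        elif kind == "disassoc" and start is not None:
            sessions.append((start, (ts - start).total_seconds()))
            start = None
    return sessions


if __name__ == "__main__":
    for start, seconds in session_lengths(parse_events(LOG)):
        print(f"{start:%b %d %H:%M:%S} session lasted {seconds:.0f}s")
```

Sessions that last only a few seconds right after a successful association would fit the pattern described in this post, though this only summarises the log and does not identify a cause.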
That's very interesting.
Have you any links to further information around this issue with the Marvell Chipset/Surface Pro and the pairwise (master?) key (PMK)?
I have run into this issue more with Cisco Enterprise WLC controllers. I have spent countless hours working with Cisco TAC going through packet debugs to confirm the issue. It affects all AP vendors.
However, if you Google subjects like "surface pro wifi issues" or "marvel wifi issues" you should get a lot of posts. It is a well-known issue.
Please tell me you don't have Surface Pros or machines with the Marvell chipset. Please.
@PhilipDAth Thanks, Philip, for the extra info.
I'm very aware, and have been for at least a couple of years, of the number of reported WiFi issues generally with the Marvell chipset in the SPs. It's certainly not very well regarded, to say the least (putting it politely).
Sadly we do still have a large number of SPs, with, of course, the Marvell chipset, and we do experience some issues, as you describe, when clients roam between APs (although we do have good overlap of coverage).
Thanks for the advice
I have moved the corporate WiFi that was having an issue to the 5 GHz band only. This seemed to help for a while, but just recently we had users once again complain of their traffic no longer being passed. For one user it turned out to be a driver issue. For the other users we still have not figured it out. The issue only happens intermittently. The RF Spectrum in the 5 GHz band on all of our APs shows low to very low channel utilization and almost no interference, so I am not convinced it is channel utilization or interference at this point.
Out of interest, are you using the 26.x firmware? We had been using it, but about once a week we had a whole heap of clients stop talking for a few minutes, and some did not come back until the AP was rebooted. Downgrading to 25.x fixed it.
This issue occurred again today. After troubleshooting I found that the APs were not processing/forwarding the ARP packets correctly for the affected clients. Long story short, ARP was failing for the affected clients. I still can't explain exactly why this is happening, but my suspicion is that it's the AP firmware. I am going to upgrade the firmware from MR 25.13 to MR 25.14 and see if that "corrects" the issue.
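If it helps anyone catch the next occurrence, below is a rough Python sketch of a client-side monitor that pings the default gateway once a second and records when the outages start and end, so the windows can be lined up against the Dashboard event log. It is only an illustration: the function names are invented here, the gateway address in the usage note is a placeholder, and `ping` exit codes are not fully consistent across operating systems, so treat the results as a hint rather than proof.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2.0):
    """Send a single ping; return True if the ping command reports success."""
    # Windows uses -n for the packet count, Linux/macOS use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def outage_windows(samples):
    """Collapse (timestamp, ok) samples, in time order, into outage windows.

    Returns a list of (first_failure, first_success_after); the end is
    None when the outage was still ongoing at the last sample.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def monitor(gateway, interval_s=1.0, duration_s=60.0):
    """Ping `gateway` repeatedly for `duration_s` seconds and return the outages."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((datetime.now(), ping_once(gateway)))
        time.sleep(interval_s)
    return outage_windows(samples)
```

Usage would be something like `monitor("192.0.2.1", duration_s=3600)`, with your own DG address substituted. A failed ping on its own does not show that ARP is the culprit; checking the client's ARP table (for example with `arp -a`) during an outage window would help confirm whether the gateway entry is missing or incomplete.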
Also having a similar-sounding issue here, and have been all year at least. MR18s stuck on 25.13 (not sure if there will be a newer F/W?). We have 14 APs, and upgrading the hardware would be a hard sell if it's a bug they won't fix - MR18s are older, but I wouldn't say "obsolete".
Lowering power on all APs didn't help. It seems like when it happens, one laptop will maybe cause it and then random other clients on that AP will also drop... but not all of them. This is so frustrating!