Hi. We're having issues with clients disconnecting from our SSIDs. It's affecting all sorts of clients: Windows, Macs, Chromebooks and iOS devices. 802.11r is disabled, band steering is enabled, and both utilisation and interference are low.
Cisco suggested changing the power setting from 100% to variable. We have also upgraded the firmware to the latest version. Nothing major stands out from the DFS events. I have also checked the DHCP and DNS event logs on our domain controller, but again, nothing of any significance there.
The APs are MR32s and they have been in place for over 3 years and haven't been a problem up until now. We haven't made any major network changes and there haven't been any environmental changes in the building either.
We have increased the minimum bit rate to 12 Mbps, but we're having no joy. This happens with clients connected to several APs, but we have had issues in particular with clients connected to one AP.
I have checked the Meraki logs, and the disassociation reason is given as 'unknown reason'.
Some users have to manually reconnect to the network, whereas for others the device reconnects on its own after a few seconds.
I don't believe it's anything physical but I'm lost as to what else it can be. We have had an external contact look over our config and they can't see anything wrong with it. I'm just out of ideas so any help would be appreciated!
Hi nealgs, thanks for the response. We've had complaints about 2 SSIDs (which were set up before my time). Fewer about our guest SSID, because it's not used as much.
Have you restarted these APs yet? If not, I would. Did you have a firmware change recently on the APs?
I'm grimacing at the idea of cranking all of your radios up to 100% broadcasting. For reasons of politeness and better performance, you typically want the radios negotiating their broadcast volumes on their own. More signal is very rarely the correct answer, especially in a system that had been functioning up until now.
Agreeing with @Nash: please reboot all APs when there's downtime, or do them one at a time, starting with the affected APs. See if this fixes the issue. It might be something wacky after the update. 👾
Hi Nash. We did a firmware upgrade last Thursday, and the AP power settings were set to negotiate their broadcast volumes. I thought it might be a client-side issue, but I've ruled that out because of the variety of devices affected. I also checked the access switches the APs are connected to for interface issues, but found nothing there either.
Was this happening before the firmware upgrade? If not, try a rollback.
If it was, is it happening on both the 2.4 GHz and 5 GHz bands on the same SSID? Check the RF stats (utilisation, number of clients, etc.) on the worst-affected AP. Have these increased in the last 30 days?
Any changes to DHCP scopes? Are these bridged or NAT clients (or both)?
Any potential new sources of interference? Check Air Marshal for spoofs, rogue APs, etc.
Check channel utilisation.
Any upgrades to the client devices, NIC updates, etc.?
@GJ1 wrote " We did a firmware upgrade last Thursday "
ROLL BACK FIRMWARE
The bulk of our APs are MR32s, and 25.13 is solid for us.
Hi pjc. It was happening before the firmware upgrade. The clients are bridged to the LAN and there haven't been any changes to the DHCP scopes. There are 2 rogue SSIDs, both contained, and channel utilisation is low. There haven't been any upgrades to the clients, and there's been no increase in the number of clients over the last 30 days.
Interference levels aren't very high; I will need to go back and check which band this is happening on most. It's been suggested to me that we use our ISP's DNS servers rather than Google's, which we use now, although they've always been set to Google's. I will restart each AP this evening and see how I go...
I can't see a reason why Google's DNS would be an issue, and it's a bad idea to start making a lot of changes at once.
Start with the AP restart and some reconnaissance.
If you look at the logs, do you see a lot of client disassociations/reassociations for the specific clients reporting issues? Can you run any packet captures when a client is seeing issues, from AP to client and, ideally, on the client itself? I know that's a pain with an intermittent issue.
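To answer the log question above without scrolling through the dashboard, the event log can be exported and tallied. This is a minimal Python sketch assuming a CSV export with hypothetical `client` and `event_type` columns and hypothetical event names `association` and `disassociation`; the real export's column names and wording will likely differ, so adjust them first.

```python
import csv
from collections import Counter
from typing import Iterable

# Hypothetical event names; change them to match your exported log.
FLAP_EVENTS = {"association", "disassociation"}


def flapping_clients(lines: Iterable[str], threshold: int = 5) -> list[tuple[str, int]]:
    """Return (client, event_count) for clients with at least `threshold`
    association/disassociation events, most frequent first.

    Assumes a CSV export with hypothetical 'client' and 'event_type' columns.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        # Only count the association/disassociation churn, not other events.
        if row["event_type"].strip().lower() in FLAP_EVENTS:
            counts[row["client"].strip()] += 1
    return [(client, n) for client, n in counts.most_common() if n >= threshold]
```

For example, `with open("events.csv", newline="") as f: print(flapping_clients(f))` would list the noisiest clients; `events.csv` is a placeholder filename.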
Yes, take your time, get back to the same config prior to the issues, don't wildly make changes like DNS etc.
Take a breath, and as @Nash says, gather as much reconnaissance as you can to see if there are any similarities (client, access point, radio band, time of day, client type, etc.) between those that are working and those having issues.
Once you have narrowed it down it'll be easier to diagnose. Even though you said the issue started before you upgraded the firmware, if you had been running 25.13 in your environment for many months without problems (that version is nearly 12 months old), going back to it at least ensures there are no new variables to worry about.
Hi pjc. I've been looking at the core switch at our office and the core switch at one of our other sites, which is having no issues. On our office core switch, the interface connecting to the firewall shows a high number of output errors, whereas the other site has none at all. I don't want to say I've found the issue, but it seems plausible that this is what's causing it.
@GJ1 Good work. If that's the case, then it's not Meraki related. Get back to basics: get some pings going to the IPs of the affected access points from both sides of your firewall (internal LAN and external sources). If you are seeing drop-outs, then it's not a wireless/radio/client issue. Repeat the pings to hardwired devices at the site, such as printers and servers. If you narrow it down to a general network/firewall issue, check things like switch port duplex/speed mismatches, broadcast storms (loops on the network), routing, etc.
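One low-tech way to spot the similarities mentioned above is to log each reported drop-out with a few attributes and tally them. A rough sketch, assuming each report is a dict with hypothetical `ap`, `band`, `client_type` and `hour` keys:

```python
from collections import Counter
from typing import Any, Iterable, Mapping

# Hypothetical attributes recorded for each reported drop-out.
DEFAULT_FIELDS = ("ap", "band", "client_type", "hour")


def commonalities(incidents: Iterable[Mapping[str, Any]],
                  fields: tuple[str, ...] = DEFAULT_FIELDS) -> dict[str, list[tuple[Any, int]]]:
    """For each field, list how often each value occurs across the incidents,
    most common first. Incidents missing a field are skipped for that field."""
    incidents = list(incidents)  # allow generators; we iterate once per field
    return {
        field: Counter(i[field] for i in incidents if field in i).most_common()
        for field in fields
    }
```

Running the same tally over clients that are working fine gives the comparison point; a value that dominates the affected list but not the working list is worth chasing.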
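Raw counters can mislead when comparing two switches, since they accumulate from the last clear or reboot. A more telling check is whether output errors are still climbing relative to traffic. A minimal sketch, assuming you read the interface's output packet and output error counters twice, some minutes apart (the `IfSnapshot` type is hypothetical, not from any switch API):

```python
from dataclasses import dataclass


@dataclass
class IfSnapshot:
    """Interface counters read at one point in time (hypothetical structure)."""
    output_packets: int
    output_errors: int


def output_error_rate(before: IfSnapshot, after: IfSnapshot) -> float:
    """Fraction of packets sent between two snapshots that hit an output error."""
    pkts = after.output_packets - before.output_packets
    errs = after.output_errors - before.output_errors
    if pkts < 0 or errs < 0:
        # Counters were cleared or wrapped between readings; the delta is meaningless.
        raise ValueError("counters went backwards (cleared or wrapped?)")
    return errs / pkts if pkts else 0.0
```

A rate that stays at zero between readings suggests the errors are historical; a non-zero rate that rises with load is worth investigating at the physical/link level.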
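The ping test above can be scripted so that both sides of the firewall collect comparable loss figures. A sketch assuming Linux `iputils` ping (`-c 1` sends one echo, `-W 1` waits one second; macOS and Windows use different flags); the host addresses in any call are placeholders for your AP, printer and server IPs:

```python
import subprocess
import time
from typing import Callable, Iterable


def ping_once(host: str) -> bool:
    """Send one ICMP echo and return True if a reply came back.
    Assumes Linux iputils ping flags."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def loss_report(hosts: Iterable[str], rounds: int = 60, interval: float = 1.0,
                probe: Callable[[str], bool] = ping_once) -> dict[str, float]:
    """Probe every host `rounds` times and return percentage loss per host."""
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    hosts = list(hosts)
    lost = {h: 0 for h in hosts}
    for _ in range(rounds):
        for h in hosts:
            if not probe(h):
                lost[h] += 1
        time.sleep(interval)  # pause between rounds
    return {h: 100.0 * lost[h] / rounds for h in hosts}
```

For example, `loss_report(["10.0.0.10", "10.0.0.20"], rounds=300)` samples each (hypothetical) address 300 times. Loss on the AP addresses that also shows up on wired printers and servers points away from the wireless side.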
We're having the same issues. This started for us a couple of months ago, after we moved to the 25.14 SRC firmware. We reverted to 25.13, but unfortunately that didn't bring things back to normal. I've just installed 25.14 again now that it's a stable release, and things are just as bad. I've also just put a ticket in with support about the issue, but haven't heard back from them yet. We've had MR42s here for a couple of years as well, and they worked great until we installed this firmware update.
Hi EricEJ. For us I think the issue is related to the fact that we're seeing a lot of packet output errors on our core switch interface that connects to the firewall. We've done WiFi troubleshooting with support and there was nothing that stood out as being wrong with our config. We also had the issue before the firmware upgrade (I thought upgrading the FW would have helped, but no).
Just thought I'd share this pointer. Good luck!
Have you created a case with support?
I am happy to take a look.
Also, try to take a packet capture and check whether you see a lot of deauths. Also check Air Marshal; you might be being contained.
Let me know
I have a case that has been open for around 2 months with a similar problem: 04247935
I tried firmware 25.13, 25.14, 26.5 and 26.6, but no luck.
Also, I just discovered something interesting:
- Client A (VLAN x) connects to AP-1: everything works well.
- Client A roams to AP-2: traffic drops, and it cannot even ping the VLAN gateway.
- Client A turns Wi-Fi off and on again, OR a new Client B (on the same VLAN x as Client A) connects to AP-2: Client A starts working again.
Our SSID is configured in bridge mode, with the VLAN overridden by RADIUS.
Layer 2 and Layer 3 are a Meraki switch stack.
>Client A roams to AP-2: Traffic drop, cannot ping to vlan gateway either.
I've seen behaviour like this before when minimum requirements are specified for the WiFi SSID or network.
For example, let's say you have set a minimum connection speed of 24 Mb/s. Client 'A' decides to roam to AP2, but only manages a 12 Mb/s connection, so AP2 rejects it. The client does not treat the rejection as a hint to move to another AP, and erroneously reports that it is connected to AP2 when it is not. The client loses all ability to send traffic, because it is in fact not connected.
Another time this can happen is if someone has configured RX-SOP too aggressively. Client can see the AP but because of RX-SOP the AP ignores the client. No traffic passes.
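RX-SOP (receiver start-of-packet detection threshold) makes the AP ignore frames that arrive below a configured signal level. A simplified sketch of that effect, assuming per-client average RSSI values and a threshold in dBm (both hypothetical numbers here); in reality the decision is per frame, so a client hovering near the threshold may be heard only intermittently:

```python
def ignored_by_rxsop(client_rssi: dict[str, float], threshold_dbm: float) -> list[str]:
    """Return clients whose signal at the AP is below the RX-SOP threshold,
    i.e. clients that can hear the AP but whose frames the AP would not decode.
    Uses a single average RSSI per client as a simplification."""
    return sorted(client for client, rssi in client_rssi.items() if rssi < threshold_dbm)
```

With a threshold of -75 dBm, a client at -79 dBm would still see the AP's beacons yet have its own frames dropped, which matches the "can see the AP, no traffic passes" symptom.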
Another common problem is with machines using the Avastar chipset, which had a bug handling the encryption key. If the client briefly goes out of and back into coverage and the AP rotates the security key, the AP sends a message telling the client to delete the existing key and negotiate a new one. Alas, the Avastar chipsets fail to process these messages, so the client reports it is connected but encrypts the WPA2 data with the wrong key. When the data reaches the AP, the AP can't decrypt it.
Consequently the client cannot pass any traffic. On clients with this chipset you have to physically disable and re-enable the NIC to get them passing data again.
Thanks for your ideas.
The problem happens even when roaming between 2 nearby APs. I set the minimum connection speed to 18 Mb/s, and the client gets 75+ Mbps in a speed test to that AP (after it reconnects). We have not configured RX-SOP.
The clients are MacBook Airs (2017 and 2018) and some Windows computers.
It's strange, as the client starts working automatically if a new client joins the AP.
I was trying to find a reference, but I can't at the moment. 12 Mb/s and 24 Mb/s (along with 6 Mb/s) are mandatory rates that all WiFi NICs are required to support; 18 Mb/s is not. So when you use a non-mandatory rate as your minimum speed, you are relying on the manufacturer supporting more than the minimum requirements.
I did put a ticket in. Meraki had me adjust the radio power from 30 dBm to a 10-30 dBm range, and the minimum bitrate from 1 Mbps to 12 Mbps. This seems to have helped with the majority of our computers, but we still have a few that are having issues. Still, we didn't have any issues whatsoever until the firmware update.
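The mandatory-rate point can be made concrete. In 802.11a/g (OFDM), 6, 12 and 24 Mb/s are the mandatory rates and 9, 18, 36, 48 and 54 Mb/s are optional. A small sketch that classifies a configured minimum bitrate, covering OFDM rates only (DSSS rates such as 1 or 11 Mb/s are out of scope and rejected):

```python
# 802.11a/g OFDM data rates in Mb/s.
OFDM_RATES = (6, 9, 12, 18, 24, 36, 48, 54)
# Rates every OFDM radio must support.
MANDATORY_OFDM_RATES = frozenset({6, 12, 24})


def check_min_bitrate(min_rate: float) -> str:
    """Classify a minimum-bitrate setting as 'mandatory' or 'optional'."""
    if min_rate not in OFDM_RATES:
        raise ValueError(f"{min_rate} Mb/s is not an 802.11a/g OFDM rate")
    return "mandatory" if min_rate in MANDATORY_OFDM_RATES else "optional"
```

By this check, the 12 Mbps setting suggested by Meraki is a mandatory rate, while the 18 Mb/s minimum discussed above is optional.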
Have you tried testing SSID with PSK or OPEN?
Please look at your Wireless Health and check whether you are seeing more authentication issues.
I have not tried PSK or Open. I'm testing enforcing the VLAN via a group policy to see if it helps. In our network, the authentication issues are still the same (mostly affecting BYOD devices).
What are the end user devices? How are you verifying the device roams from one AP to another?
Did you resolve the output errors on your switch?
I opened a case with Meraki support at the beginning, before I focused on our core switch. Then I used our third party support who have been unable to find anything wrong with the configuration. Trouble is none of the support people (again third party) can pinpoint the issue either at the Meraki side or on the core switch, or on the firewall side. My next thought is to swap out the AP near to where most of the complaints come from and see whether that makes any difference.
Speaking to a colleague earlier about connecting with their iPhone: they were able to connect to the network, but not long after they were disconnected, and they then manually connected successfully to the other SSID.
Hi rowell. Nobody can tell me why we're getting so many output errors. The end user devices are a mixture of Chromebooks, Android/iPhones, Windows machines and Macbooks. The DHCP scope is big enough, the link isn't saturated. The roaming was verified by our third party support when looking at the event logs. At the moment we've hit a brick wall. We've adjusted power settings, changed the minimum bit rate, implemented band steering, upgraded the firmware. Naturally it's very frustrating!
@GJ1 Don't mean to sound annoying, but just an FYI: iPhones had a Wi-Fi issue with an update where, when they leave an AP's coverage area, "Auto Login" for that connection is disabled, so once they leave an area they get disconnected.
@kYutobi Yeah we know about that issue. One possible reason we were told was it could be that because our APs are quite close to each other, there might be contention between them which is causing the devices to drop off. How plausible that is I don't know.
Just wondering, do you have "client balancing" enabled?
This can be found in the RF profile.
In the event logs, look for something like "rejected due to load balancing", and let us know what happens with the new AP.
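If the event log is exported as text, the load-balancing rejections can be pulled out with a pattern match. The exact wording Meraki uses isn't confirmed here, so the regex below is a deliberately loose guess (matching "load balance", "load-balancing", etc., on lines that also mention a rejection); check it against a real log entry first.

```python
import re
from typing import Iterable

# Loose guess at the wording: "load balance", "load-balancing", "loadbalancing"...
LOAD_BALANCE = re.compile(r"load[\s_-]?balanc", re.IGNORECASE)


def load_balance_rejections(lines: Iterable[str]) -> list[str]:
    """Return log lines that mention both load balancing and a rejection."""
    return [
        line.rstrip("\n")
        for line in lines
        if LOAD_BALANCE.search(line) and "reject" in line.lower()
    ]
```

If this returns many hits for the AP near the complaints, client balancing is a strong candidate to switch off on that RF profile as a test.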