I'm trying to help administer an office that has a single MR53E in it providing its Wi-Fi network. The various SSIDs use WPA2, are dual band 2.4 / 5 GHz, are set up to allow 802.11g connections or later, and are configured to be in bridge mode so that our MX84 can manage the network.
Most of the time client devices are able to connect to this network without issue. However, every now and then a client will lose their connectivity, and when checking available Wi-Fi networks will see that our Wi-Fi network is not available. Then some minutes later it will come back and their device will reconnect. There doesn't seem to be any rhyme or reason as to which devices experience this problem: it has happened to smart phones of various makes and models as well as several different laptops. I've also found no pattern as to when it happens -- it seems to be fairly random.
There are also a few devices that cannot see the Wi-Fi network at all. I'm fairly certain that these devices possess the necessary capabilities to connect to our network (802.11g or newer, WPA2, etc.) and yet they can't connect.
While there are a lot of competing Wi-Fi networks in the area, the MR53E is set to automatically select a channel to avoid interference so I'm not sure what I can do there to improve things. In case it's a clue, whenever I turn Wi-Fi off and back on again on any of my devices, these other Wi-Fi networks are displayed almost immediately, while the ones created by our MR53E take 5-10 seconds to show up.
I'm quite new to all of this so I'm not sure what I should do to try and get to the bottom of this issue. Any suggestions would be much appreciated!
Does Wireless Health say anything interesting?
When you say a client can't connect - do you mean a single client can't connect but others still can (and are) connected - or that no device can connect?
@NolanHerring to answer your questions:
The MR53E is connected directly to a port on the MX84. I'll check the logs about port flapping when I have the chance, but that's probably not it since when this problem occurs it only affects one device at a time rather than all of them.
The antennas are Cisco-brand dipole antennas (MA-ANT-3-B6). I believe the AP is in a good location. All the devices report strong to very strong signal strength.
I don't believe any of the devices that can't connect are 802.11b only. Not sure about DFS -- will look into that.
I haven't had a chance to really dig into the AP logs so far, but I agree that's a good place to look for issues.
Thanks for all the suggestions so far.
The issue is that a single client won't be able to connect (temporarily) but the others still can.
Wireless Health does mention some connection failures. I'm not sure how to interpret the logs, though. Most of them seem to be Association and Authentication failures. Here's a small selection of them:
Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='1'
Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='3'
Association type='Association attempts' num='3' associated='false' radio='1' vap='3'
Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='2'
Association type='Association attempts' num='4' associated='false' radio='1' vap='2'
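For anyone wanting to summarize entries like these rather than read them one by one, here is a minimal Python sketch. It assumes each entry is an event name followed by `key='value'` pairs, exactly as pasted above; the function names are made up for illustration, and reading `vap` as the SSID (virtual AP) index is an assumption, not something confirmed in this thread.

```python
import re
from collections import Counter

# Sample lines copied from the post above.
SAMPLE = """\
Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='1'
Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='3'
Association type='Association attempts' num='3' associated='false' radio='1' vap='3'
Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='2'
Association type='Association attempts' num='4' associated='false' radio='1' vap='2'
"""

# Matches one key='value' pair.
ATTR = re.compile(r"(\w+)='([^']*)'")


def parse_line(line):
    """Split one entry into its event name and a dict of its attributes."""
    event, _, rest = line.strip().partition(" ")
    return event, dict(ATTR.findall(rest))


def tally(text):
    """Count entries by failure type and by vap value."""
    by_type, by_vap = Counter(), Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        _event, attrs = parse_line(line)
        by_type[attrs.get("type", "unknown")] += 1
        by_vap[attrs.get("vap", "unknown")] += 1
    return by_type, by_vap


by_type, by_vap = tally(SAMPLE)
print(by_type)
print(by_vap)
```

If the failures pile up on one `vap` value, that would point at one SSID's settings; if they are spread evenly, the cause is more likely something common to the whole radio.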
The fact that only individual clients at a time are being affected is a big one.
The log entry "WPA-PSK auth fail" indicates that the RADIUS server is refusing to allow the client to connect.
You need to check the logs on your RADIUS server to determine why.
If you have a non-standard password lockout policy, what can sometimes happen is that a user changes their AD password but doesn't change it on their device, which triggers the account to get locked for a short period of time, resulting in the RADIUS server refusing the connection.
The RADIUS server log will hold the key to the issue.
@PhilipDAth I don't believe we're using a RADIUS server or WPA2 Enterprise. We've just got the SSIDs' association requirements set to "Pre-shared key with WPA2".
Interesting. So the question is why is the auth failure happening.
What firmware version are you running on the AP?
You are not running any other brands of AP as well at the same site using the same SSID?
On the MR53E it says its firmware is up-to-date, version MR 25.13.
The MX84 has a firmware update available that I haven't installed yet.
There are no other APs running in this office. However we used to have a consumer grade AP (not sure what model) that was providing a Wi-Fi network using the same SSID as the MR53E is now. I've since created a few other SSIDs on the MR53E just in case that was causing issues, but on any device that experiences this problem, all of the SSIDs provided by the MR53E disappear, not just the one that used to come from the older AP.
@PhilipDAth Another bit of info: the two devices that couldn't connect at all were a cheap $40 Android prepaid phone and an old iPod Touch 3. When I created an SSID that used either no encryption or WEP they were both able to connect to that, so I'm guessing that they don't use a modern enough version of WPA2 to be able to connect to our other SSIDs. Does that sound right?
(For those that might be concerned about the security of my wireless network, don't worry, I'm not going to keep an unencrypted or WEP SSID up. That was just for testing.)
After a few days of no incidents, today we had several devices suddenly become unable to see any of our SSIDs. After about half an hour one of the devices started to be able to see one of the SSIDs again, but the others still couldn't. Someone at the office rebooted both the MR53E and the MX84 and it cleared up the problem.
I'm starting to get quite concerned about this. So far this equipment is working worse than the consumer grade wireless router we were using previously! Does anyone have any other ideas as to what could be going wrong?
While this was happening, there were more "Failed connections" entries in the wireless health page for the MR53E:
Thu Jan 10, 2019, 16:28:33 client1 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 16:21:03 desktop-client2 Wireless Access Point SSID2 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='2'
Thu Jan 10, 2019, 16:20:23 desktop-client3 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 16:20:13 android-client4 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 16:16:43 client5 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 16:11:23 desktop-client6 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 09:29:42 client5 Wireless Access Point SSID1 Association type='Association attempts' num='3' associated='true' radio='1' vap='0'
Thu Jan 10, 2019, 09:29:22 client5 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 09:28:52 client5 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
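One way to see whether these failures are isolated or arrive in bursts that hit several clients at once is to group them by time. The sketch below is only an illustration: it assumes the entry layout shown above (timestamp, client name, "Wireless Access Point", SSID, event, attributes), the 15-minute gap is an arbitrary choice, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Entries as posted above (names already anonymized by the poster).
SAMPLE = """\
Thu Jan 10, 2019, 16:28:33 client1 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 16:21:03 desktop-client2 Wireless Access Point SSID2 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='2'
Thu Jan 10, 2019, 16:20:23 desktop-client3 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 16:20:13 android-client4 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 16:16:43 client5 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 16:11:23 desktop-client6 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 09:29:42 client5 Wireless Access Point SSID1 Association type='Association attempts' num='3' associated='true' radio='1' vap='0'
Thu Jan 10, 2019, 09:29:22 client5 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
Thu Jan 10, 2019, 09:28:52 client5 Wireless Access Point SSID1 Authentication type='WPA-PSK auth fail' associated='false' radio='1' vap='0'
"""

# finditer is used so this works whether entries are one per line or run together.
ENTRY = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2}, \d{4}, \d{2}:\d{2}:\d{2}) "
    r"(?P<client>\S+) Wireless Access Point (?P<ssid>\S+) "
    r"(?P<event>\w+) type='(?P<type>[^']*)'"
)


def auth_failures(text):
    """Return (timestamp, client) pairs for 'WPA-PSK auth fail' entries, oldest first."""
    rows = []
    for m in ENTRY.finditer(text):
        if m["type"] == "WPA-PSK auth fail":
            ts = datetime.strptime(m["ts"], "%a %b %d, %Y, %H:%M:%S")
            rows.append((ts, m["client"]))
    return sorted(rows)


def bursts(rows, max_gap=timedelta(minutes=15)):
    """Group failures into bursts; a new burst starts when the gap exceeds max_gap."""
    groups = []
    for ts, client in rows:
        if groups and ts - groups[-1][-1][0] <= max_gap:
            groups[-1].append((ts, client))
        else:
            groups.append([(ts, client)])
    return groups


for group in bursts(auth_failures(SAMPLE)):
    clients = sorted({client for _, client in group})
    print(group[0][0], "to", group[-1][0], "|", len(group), "failures |", ", ".join(clients))
```

On the entries above this gives two bursts: two failures from a single client around 09:29, and six failures from six different clients between 16:11 and 16:28, which lines up with when the outage was reported.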
(I've changed the device names, MAC addresses, and SSID names for privacy reasons.)
I'm afraid I do not understand what those error entries mean, aside from there being some kind of authentication issue.
Is there anything else we can do to address this?
@NolanHerring I've attached two screen shots. The problem was reported to me around 16:00 on January 10, and the screen shot I took of the event log is from around that time.
So I think I figured out the issue: I had set up my SSIDs to only accept 802.11g connections and later. Apparently that causes a lot of devices to become sporadically unable to connect, even though they're not 802.11b devices.
Once our SSIDs accepted all connections including 802.11b the issue magically disappeared. I thought I'd post about it here for posterity.
Every single one of our clients is configured to use a minimum bit rate of 12Mb/s - and none of them are reporting an issue.
Hello. Has there been any kind of solution to this issue? I am seeing similar issues with authentication on all of our wireless networks. Two of them are for guest users and just use a WPA2 pre-shared key. The other is for our corporate devices and uses RADIUS to authenticate using Cisco ISE. The ISE does not report any issues.
The errors I see in the Meraki Dashboard under Wireless Health are the same ones that were reported in the original post.
Any further information would be greatly appreciated.
To answer some of the questions that other people posted after I posted my solution, our SSIDs were configured only to accept 802.11g and later connections, meaning the minimum bitrate was 54 Mbps, not 12. It could be that setting it to 12 Mbps would work fine for us, but I haven't changed it since I set the minimum back to 11. We have no reason to forbid 802.11b devices (and no one is using one anyway) so I'm not inclined to change it.
Both our router and AP are running the latest firmware and were at the time we were having a problem. The AP's channel width is set to auto. We are not using meshing.
So keep in mind, that slider is strictly for management frames (beacons/probe response/probe request etc.).
If you had the mandatory data rate set to 54Mbps then I could see how that would cause issues. I wish they would remove the option to go higher than 24Mbps personally. And even then, that should only be used for true VHD designs.
The 802.11-2012 (Section 220.127.116.11) standard mandates that devices (clients/access points) need to be able to support the following management frame data rates (6 / 12 / 24). So if you're ever going to change your mandatory data rate (the rate at which your beacon frames will be sent, for example), you will want to choose one of those 3 options. You could set it to, say, 36, but you're potentially going to run into situations where clients do not like that and have issues connecting.
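To make that rule of thumb concrete, here is a small Python sketch that classifies a candidate minimum bitrate. The mandatory set (6 / 12 / 24) comes from the paragraph above; the other rate lists are the usual 802.11b and 802.11a/g rate sets, and the function name and return labels are made up for illustration.

```python
# Rates in Mbps.
DSSS_RATES = (1, 2, 5.5, 11)         # 802.11b rates
OFDM_MANDATORY = (6, 12, 24)         # every 802.11a/g device must support these
OFDM_OPTIONAL = (9, 18, 36, 48, 54)  # support is not guaranteed on every client


def classify_min_rate(rate_mbps):
    """Label a minimum-bitrate choice by how safe it is for client compatibility."""
    if rate_mbps in OFDM_MANDATORY:
        return "ofdm-mandatory"  # safe choice when 802.11b is not needed
    if rate_mbps in DSSS_RATES:
        return "dsss"            # keeps 802.11b clients working, costs airtime
    if rate_mbps in OFDM_OPTIONAL:
        return "ofdm-optional"   # some clients may refuse to connect
    raise ValueError(f"{rate_mbps} Mbps is not an 802.11b/g/a legacy rate")


for rate in (11, 12, 36, 54):
    print(rate, classify_min_rate(rate))
```

By this classification, 54 Mbps (the setting that caused trouble earlier in the thread) lands in the optional group, while 12 Mbps lands in the mandatory one.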
Within Meraki access control settings, the slider limits you somewhat. If your SSID is 5GHz only, then you don't need to worry about 802.11b rates. If your SSID is dual-band, and you have no need to support 802.11b, then you'll want to set it to 12Mbps. Leaving 11Mbps or below will add more overhead, decreasing your potential airtime.
If you're using RF Profiles then you can change it per band, but basically if you have no need for 802.11b support, then kill it to improve overall performance by removing that useless airtime overhead. The 11Mbps frames take 'longer', thus reducing airtime for everyone else.
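As a rough, back-of-the-envelope illustration of why lower beacon rates cost airtime, here is a Python sketch. Everything numeric in it is an assumption for illustration: a 300-byte beacon, a 102.4 ms beacon interval, the long DSSS preamble (192 µs) for 802.11b rates, and a 20 µs OFDM preamble with 4 µs symbols. It ignores contention, interframe spacing, and the 2.4 GHz signal extension, so treat the results as ballpark figures only.

```python
import math

DSSS_RATES = (1, 2, 5.5, 11)   # 802.11b rates in Mbps
DSSS_LONG_PREAMBLE_US = 192    # long preamble + PLCP header
OFDM_PREAMBLE_US = 20          # preamble + SIGNAL field
OFDM_SYMBOL_US = 4


def beacon_airtime_us(frame_bytes, rate_mbps):
    """Approximate time on air for one frame, in microseconds."""
    if rate_mbps in DSSS_RATES:
        return DSSS_LONG_PREAMBLE_US + frame_bytes * 8 / rate_mbps
    # OFDM: 16 service bits + payload + 6 tail bits, rounded up to whole symbols.
    bits = 16 + frame_bytes * 8 + 6
    bits_per_symbol = rate_mbps * OFDM_SYMBOL_US
    return OFDM_PREAMBLE_US + OFDM_SYMBOL_US * math.ceil(bits / bits_per_symbol)


def beacon_overhead_fraction(frame_bytes, rate_mbps, beacons_per_interval,
                             interval_us=102_400):
    """Fraction of channel time spent on beacons alone.

    beacons_per_interval = SSIDs x APs audible on the same channel (assumed value).
    """
    return beacons_per_interval * beacon_airtime_us(frame_bytes, rate_mbps) / interval_us


# Hypothetical example: 10 beacons per interval, 300 bytes each.
for rate in (1, 11, 12, 24):
    share = beacon_overhead_fraction(300, rate, beacons_per_interval=10)
    print(f"{rate:>4} Mbps: {beacon_airtime_us(300, rate):7.1f} us per beacon, {share:.1%} of airtime")
```

Under these assumptions the same beacon takes well over two milliseconds at 1 Mbps and a fraction of that at 12 Mbps, which is the overhead being described.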
You can use this airtime calculator to see first hand how much airtime is consumed by management frames:
If your slider was set to 54Mbps, then you may have run into one of two things:
Increasing this threshold will in theory shrink the 'management cell size', but not the actual cell size. Your 'real' cell size is always the same, so changing the mandatory data rate doesn't actually decrease CCI/CCC since PLCP preamble and headers are always sent at lowest modulation data rate (6Mbps for 802.11g and 802.11a).
Safe rule of thumb is to not go higher than 12Mbps and you should be fine.
PS - I typed the word 'management' about a dozen times above, and I had to use auto-correct every single time. Am I the only one that has trouble typing that word lol
So, I'd like to add some comments/questions to this discussion since I finally feel like I'm not alone in the world.
Starting several months ago (mid December, I believe?) we started experiencing issues with our Meraki APs where clients couldn't connect to the WiFi... but they could. They would connect just fine, but their devices would fail to get DHCP leases. However, the experience for the clients is exactly as described: Everything is working fine, some clients sporadically drop connection and when attempting to reconnect the WiFi 'doesn't work'. We figured out early on that there were two things that could resolve this issue: Rebooting the AP that they were connecting to, AND/OR under "Client IP assignment" changing between 'Layer 3 Roaming' (what we've used since setting up Meraki years ago) and 'Bridge Mode'. It doesn't matter where it was set: changing the setting to the other, or changing it and changing it back, would allow devices to reconnect. And I do mean 'AND/OR'; sometimes just one would work, sometimes the other, sometimes we would have to do both. I haven't tested them individually in a while though; I just do both by default now.
We have multiple Networks with APs, but only one is affected, and all the SSIDs of that Network seem to be affected. Because of the structure of our organization, most of the APs in our networks are all on the same management subnet, and assign addresses to the same DHCP ranges for our 'private' networks and our 'public' networks, but again only one 'network' is seeing these issues.
I opened a support ticket with Meraki a few months back but they refused to move past the Event Log having entries of 'Multiple DHCP Servers Detected', but those 'errors' existed long before we had these issues and also all have the identical MAC address and are just pointing from the gateway to the IP Helper address, so I don't believe there are any issues there.
Sorry if this is hijacking this thread, we've just been dealing with this for months and I finally found this thread after checking out the network health page and searching for one of the auth errors it gives. We're also using WPA2, and some SSIDs use RADIUS and some use PSK, so nothing solid there either. Also, the 'minimum bandwidth' on our SSIDs is all 11 or lower, and the majority of the devices having issues are brand new Surface Go tablets or Samsung S7/iPhone 8 (or newer) phones. We also had made no changes to our network structure, SSIDs, DHCP, etc. around this time, so there was no trigger that we were aware of... it just started happening one week.
Has anyone found a solution to this? This sounds exactly the same as the issues we have been experiencing for months now, mainly with MacBooks but also some PCs. Clients boot up their MacBooks and connect to our APs just fine, then randomly throughout the day some users will lose their wifi connection.
Also, if a user is connected to wifi with a MacBook, then closes the lid and moves to another area serviced by another AP, this will also happen. Their MacBook looks to be connected for a minute or so, and then they get disconnected and prompted for credentials, even though their credentials are saved. Below is what shows up in the event logs. I've noticed this only happens on certain APs that are configured identically to the rest.
Strange thing is, when clients don't close the lid and roam normally, they DON'T have this issue. They just roam seamlessly between APs with maybe 2 or 3 ping drops.
We currently have a mix of MR53s and 42s with the latest firmware.
We are using RF profiles:
I've also found this thread https://community.meraki.com/t5/Wireless-LAN/802-11-disassociation-client-not-authenticated/td-p/552... with much of the same reported issues. This issue was resolved by disabling all HP printers with direct wifi enabled. Is there any merit in this? I don't want to go down another rabbit hole.
Just to provide some follow-up to my issue, somehow SOME of our APs were on a beta firmware.... which apparently had 'known issues with disconnects'.... somehow it took three tickets with Meraki support before anyone mentioned that.
Reverted back to stable/released firmware and no longer had the issue, magically. I forget where it was exactly, but it was a simple checkbox within the network settings I believe, but I forget how it was only applying itself to some of the APs within that network.... and not the whole network, or multiple networks. We also weren't experiencing issues specifically with Macs, although we don't run Macs so they might have also had the same issues.
You mentioned 'custom target power to avoid signal overlap'. With Meraki APs is signal overlap a bad thing? I thought overlap was good (to some extent) so that APs could hand off clients smoothly?
So unfortunately the firmware isn't the issue we are having. Our environment is a high-density deployment, so there were 4-5 signals overlapping, which is why I changed to custom power.