For several months now, our customer has reported poor Wi-Fi network performance from their Meraki Wi-Fi network.
One of our immediate observations is that Wi-Fi clients on the network - using Meraki MR46 APs running firmware MR 29.4.1 - are frequently being disconnected from the Wi-Fi network (several SSIDs) through 802.11 Disassociation; in each case this is caused by one of several different factors, the most significant of which is reported in the Event Log for Access Points with the Details: “excess frame loss”.
Many thanks,
Ian
What sort of signal strength are the clients getting? You may just need to add more access points.
Are you trying to use 5GHz only (with 2.4GHz disabled, if possible)? 2.4GHz experiences so much interference these days.
Hi Philip,
Thank you for your response.
For background, I have 20 years' experience as a leading Wi-Fi network consultant and engineer, so please feel free to ask questions as deeply as you like 🙂
i). The Clients were Associated (connected) to the 5GHz AP radios.
ii). The clients are getting signal strengths of around -65dBm to -70dBm or better, which is more than adequate for a reliable 802.11 Association.
iii). Poor signal strength (dBm) or poor Signal-to-Noise Ratio (SNR)(dB) at the Client would tend to:
Increase frame loss at the Client (causing Downlink frame loss: AP-to-client frames) until the Adaptive Rate Shifting (ARS) algorithm at the AP reduced the AP's transmit bit rate (Modulation and Coding Scheme - MCS).
So a key question in this respect is:
Do the Meraki logged messages "excess frame loss" refer to:
a. Frame loss at the Client - recognised by missing Uplink (Client-to-AP) ACK frames at the AP? .. leading to Downlink frame re-transmissions by the AP, and a counter or rate for those re-transmissions that would be recorded by the AP? or
b. Frame loss at the AP - recognised by CRC errors in Uplink frames received from the Client at the AP?
iv). The SNR reported at the Meraki APs is mostly in the region of 39 to 41 dB - which is again more than adequate for a reliable 802.11 Association.
Poor SNR or poor signal strength at the APs (Uplink SNR (dB) or Uplink signal strength (dBm)) would tend to:
Increase frame loss at the AP (client-to-AP, that is Uplink frame loss) until the Adaptive Rate Shifting (ARS) algorithm at the Client reduced the Client's transmit bit rate (Modulation and Coding Scheme - MCS).
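For anyone following along, the signal strength and SNR figures above are related by a simple subtraction: SNR (dB) is the received signal (dBm) minus the noise floor (dBm). The sketch below applies that to the client-side signal strengths quoted above. The noise floor value is an assumption for illustration only and was not measured on this network.

```python
def snr_db(signal_dbm: float, noise_floor_dbm: float) -> float:
    """SNR in dB is the received signal power minus the noise floor, both in dBm."""
    return signal_dbm - noise_floor_dbm


# ASSUMPTION: illustrative noise floor, not a value reported in this thread.
ASSUMED_NOISE_FLOOR_DBM = -95.0

# Client-side signal strengths quoted in the post above.
for rssi_dbm in (-65.0, -70.0):
    snr = snr_db(rssi_dbm, ASSUMED_NOISE_FLOOR_DBM)
    print(f"RSSI {rssi_dbm} dBm with noise floor {ASSUMED_NOISE_FLOOR_DBM} dBm gives SNR {snr} dB")
```

The real noise floor has to come from a site survey or from the AP's own radio statistics, so treat the printed values only as an illustration of the arithmetic.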
-oOo-
Can someone answer my original questions please?
My fourth (new) question is:
4. Do the Meraki logged messages "excess frame loss" refer to:
a. Frame loss at the Client - recognised by missing Uplink (Client-to-AP) ACK frames at the AP? .. leading to Downlink frame re-transmissions by the AP, and a counter or rate for those re-transmissions that would be recorded by the AP? or
b. Frame loss at the AP - recognised by CRC errors in Uplink frames received from the Client at the AP?
Many thanks,
Ian
I have the same problem, with the same Meraki MR46 and the APs running the MR 29.4.1 firmware, I hope for some help or guidance.
Upgraded to 29.4.1 mid-December and experiencing the issue described in a high-density (50 AP) MR56 network; however, in addition to the forceful client disassociations (excess frame loss), clients seem to be aggressively roaming for no apparent reason. Quite frustrating.
I'm really glad to see somebody else is having the same problem. When we logged the exact same thing with Meraki support we were told "well nobody else is having that problem". Not really much help.
We ended up going back to an older version. We still see excess frame loss errors but the aggressive roaming has stopped. Funny thing is that we use 29.4.1 in some other sites and don't see the roaming aggressiveness.
The issue is definitely compounded at locations with high-density coverage.
@KrisM What version did you roll back to? We were previously on 28.7.1 so there were several releases in between.
I'm on MR 29.4.1 and having the same issues. Disassociation with "excess frame loss" and I can see clients roaming when they are stationary desktop PCs.
Updating Intel Wifi drivers made the "excess frame loss" problem disappear for the few clients where we were having problems. Ours is a small, low density network.
But I still see client roaming issues. We're not having any negative impact (at least none I've seen or had users report) so will probably just wait and see what happens with next firmware release.
Also experiencing forced disassociation due to "excess frame loss".
Using 2 MR30Hs on firmware ver. 29.4.1 as well. Have been running these for about a year and have not had these kinds of problems until recently (and although the total number of clients in my building fluctuates some from week to week, it doesn't generally fluctuate by more than 10% up or down).
It's beginning to look (to me anyway) like this might be an issue with the latest release. I look forward to Cisco's response here.
Similar issue, MR44 29.4.1.
Reports are coming in now after the holidays.
Lots of "802.11 disassociation, excessive frame loss"
Experiencing the same MR46 MR56 29.4.1
Same issues for us after upgrade to 29.4.1. 5GHz only, eight MR46s, clients disconnect with problems to reconnect. Lots of "802.11 disassociation excess frame loss" in the event log. Both iOS and Windows 10 clients.
As further evidence - the building is empty right now and I have some desktops and thermostats (devices that remain there 100% of the time) that have been dropping and re-associating all night long due to excess frame loss. Total # of clients on WiFi right now is <30.
I am also seeing something similar after upgrading to 29.4.1. Anyone know if an upgrade to 29.5 is worth a shot?
fwiw - I tried this last night on a couple of MR42's and it didn't help
Thanks for sharing. We have two SSIDs with iOS devices on one and Windows devices on the other. Getting excessive frame loss on both networks. Both networks are configured with 802.11w (enabled not required). Can anyone confirm if the setting of 802.11w to enabled affects the issue of frame loss?
The intermittent disconnects affecting clients on our networks was probably caused by a bug in 29.4.1 (link to another thread below) and the workaround was deactivating 802.11w until there is a fix available. We still see great numbers of 'excess frame loss' disassociations, especially on our SSID for phones (lots of roaming going on). According to Meraki support the specific description 'excess frame loss' is pretty new and would be expected in dense/complex radio environments and, in some cases, when clients roam.
If possible, try to disable 802.11w and/or the new client balancing.
29.4.1 Radius Issues some clients unable to connect to MR-53 AP's - The Meraki Community
I've got 802.11w disabled but the disconnects are happening.
I'll take a look at the client balancing thanks for the link.
Hi everyone,
Thank you for joining in and sharing your experiences too. Hopefully Meraki will sort the software quickly.
Maybe Meraki will even answer my questions please - in my Original Post and Post #2 ?! 😉
For further information, ours is a 10x MR46 AP deployment running firmware MR 29.4.1, and is also high-density in nature: some 200x Wi-Fi users and a floor area of around 550 m sq.
Ian
We are also seeing issues with this on MR44, MR46 in various deployments and running 29.4.1
Same - we have issues, 9 x MR46's all running 29.4.1.
We just started getting reports of devices disconnecting at multiple sites. When I checked the logs I also found the `excess frame loss` error. We updated to MR 29.4.1 back in mid-late Nov (mr53 APs). I called Meraki support this morning and they wanted me to run a monitor mode packet capture and send it to them (which I am in the process of doing at the moment) but this does seem suspiciously like a bug with the new firmware.
I checked some other sites running 28.6 and am seeing excess frame loss for some clients that are disassociating. I did a little more digging and it seems to mostly affect 802.11ax and 802.11ac clients; I haven't seen it affecting 802.11n clients.
You may want to check if Client Balancing is enabled as well, looks like the algorithm has changed for that on FW 29.x from a passive event (only during association time) to an active event (during and post association):
https://documentation.meraki.com/MR/Other_Topics/Client_Balancing
We have disabled client balancing but the issue remains. Only ac/ax clients in the network.
We have the same user experience on iPhones since migrating from MR53's to MR56's. Upgraded the firmware per another recommendation from ver. 18 to 19, but the issue persists. I noticed the disconnect happens when roaming around the building. Then it never reconnects. A manual reconnection works with no issues, and auto-join is enabled on the device side. Very strange issue. Meraki support case opened to see what they say.
I just performed a new deployment so the ap's came up on 29.4.1 initially. I'm seeing the same excess frame loss errors and clients dropping left and right. I submitted a case and this was the response:
Hi Team, This looks like it's a known issue that we are working on. I will update you as soon as I have more info. in the meantime are you able to down grade the firmware of the APs? Please let me know if you have any questions or need anything clarified. If you require immediate assistance, you can also call into our 24/7 support line to continue troubleshooting. See Help > Get help for a list of valid numbers to call us.
Seems the "enhancements" are doing more harm than good.
Did they suggest what firmware to downgrade to? We seem to be encountering the same issues but have been on MR28.5 for quite a long time. Lots of high-density locations as well.
No, they didn't. They just rolled me back to my previous firmware, which was 28.7.
We are going to try setting problematic clients to 802.11n in their device settings for now.
We've gotten one response from Meraki Support regarding the 'excess frame loss' message. They said that when a client that is connected to AP-1(using this as an example) roams over to AP-2, AP-1 still thinks it's connected to client, but it never receives a response from the client. This is when AP-1 disassociates the client due to 'excess frame loss'.
They told me the same thing. I have a monitor mode packet cap they asked for and I found one of the devices that was disassociated. It was deauth'd right after a flood of frames hit it. The device's antenna signal maintained a -28dBm to -30dBm signal strength the entire time so I'm not convinced this is entirely about roaming. Either way, devices are being disconnected in our network and some are having a hard time getting back on the network requiring extensive help from our help desk team to the users affected. Whatever Meraki's explanation is, it shouldn't be happening.
To add on to this: We are seeing an issue where clients are being disassociated by the AP but the client itself still thinks it's connected to the AP. Hoping that Meraki Support can give us the specific parameters that trigger the 'excess frame loss' message like OP had asked.
Same situation in two different locations, one of them with high density and the other with just one AP (with no possibility of roaming). In both cases, the 'excess frame loss' appears but we didn't get a clear answer from support after a few calls with them.
I am also seeing similar issues however I am running MR 29.5. I upgraded because of issues with layer 3 roaming. Support suggested that we go to 29.4.1 but around the time we opened the ticket they told us to skip that version and go directly to 29.5. This was because the same damn layer 3 roaming issue persisted on 29.4.1. I checked my logs and was seeing these disassociations during the time I was busy trying to fix layer 3 roaming. We got sick of the layer 3 roaming issues and moved to one broadcast domain.
Now that we are layer 2 roaming it has become apparent that there are other issues with roaming, and that led me to check the logs going back 2 months to when we moved to this version. We upgraded December 1, 2022. I now see they have 29.5.1 available, but there is no mention of this issue being resolved and no mention of the layer 3 roaming anchor "stickiness" being fixed. I am at a loss and getting really pissed.
Also experiencing this issue and have been told what other users already reported:
When a client that is connected to AP-1(using this as an example) roams over to AP-2, AP-1 still thinks it's connected to client, but it never receives a response from the client. This is when AP-1 disassociates the client due to 'excess frame loss'.
One engineer spent HOURS working on this with me, collecting tons of data points but eventually said there wasn't much that he could do and asked me to do further testing on my own...
Meraki support - any help would be highly appreciated as this ongoing issue is destroying my staff's trust in your wireless solution.
Support told me they are aware of a roaming issue where a device will get hit with a flood of block ack requests and then get deauth’d (confirmed in our pcap). Will be addressed in a firmware update (eta tbd).
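If you want to look for the same pattern in your own capture, here is a minimal sketch. It assumes you have already exported the frames for one client from the pcap into `(timestamp, type, subtype)` tuples, which is a hypothetical layout chosen for this example. The type/subtype numbers are the standard 802.11 values (Block Ack Request is control subtype 8, Deauthentication is management subtype 12). The window and burst size are arbitrary guesses, not thresholds Meraki has published.

```python
from typing import List, Tuple

# 802.11 frame control values: type 0 = management, type 1 = control.
TYPE_MGMT, TYPE_CTRL = 0, 1
SUBTYPE_BLOCK_ACK_REQ = 8  # control frame subtype
SUBTYPE_DEAUTH = 12        # management frame subtype

# Hypothetical record layout: (timestamp in seconds, frame type, frame subtype).
Frame = Tuple[float, int, int]


def deauths_after_bar_flood(
    frames: List[Frame], window_s: float = 1.0, min_bars: int = 20
) -> List[float]:
    """Return timestamps of deauth frames that were preceded by at least
    `min_bars` Block Ack Requests within the previous `window_s` seconds.

    Both thresholds are illustrative assumptions.
    """
    hits: List[float] = []
    bar_times: List[float] = []
    for ts, ftype, subtype in sorted(frames):
        if ftype == TYPE_CTRL and subtype == SUBTYPE_BLOCK_ACK_REQ:
            bar_times.append(ts)
        elif ftype == TYPE_MGMT and subtype == SUBTYPE_DEAUTH:
            recent = [t for t in bar_times if ts - window_s <= t <= ts]
            if len(recent) >= min_bars:
                hits.append(ts)
    return hits
```

Feed it the frames addressed to a single client MAC; an empty result only means this particular pattern was not found in that capture, not that the client was healthy.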
On another note, I also noticed a bunch of clients being disconnected due to client load balancing being unnecessarily aggressive. I turned it off on all our RF profiles and that seemed to stop the incoming tickets for wireless disconnect problems for us. Might be worth investigating for some of the others out there having trouble with client disconnects.
Hi All,
Same problem here.
Running MR 27.7.1 and MR33/MR76 APs with Android clients on 5GHz.
See those messages a lot, not sure when btw.
Client balancing is off.
802.11w is off
802.11r is off
Min bitrate 6
Cheers!
Hi All - We had exactly the same issues and they have now been resolved. See below:
- Issues were for wireless clients having NIC card/adaptor as "Intel(R) Wi-Fi 6 AX201" as those are compatible with 802.11ax.
- I am anticipating that there may be issues with AX411, AX211, AX210, AX203, AX200, AX101, 9560, 9462, 9461, 9260 as well but I am not 100% sure. So, you guys can test and let me know how it goes.
Resolution: Update the drivers to "22.190.0.4".
I would recommend testing with few clients with different NIC card/adaptor from below and see if that works. If yes, then we know what to do 🙂
List:-
The 22.190.0 package installs the Windows® 10 and Windows 11* Wi-Fi drivers for the following Intel® Wireless Adapters:
Windows® 10 64-bit and Windows 11*
22.190.0.4 for AX411, AX211, AX210, AX203, AX201, AX200, AX101, 9560, 9462, 9461, 9260
20.70.32.1 for 8265, 8260
19.51.42.2 for 7265(Rev. D), 3168, 3165
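To check a fleet against the list above, something like the following sketch may help. The version table is copied from the package list in this post; how you collect each machine's adapter model and installed driver version is left out, and the function names are made up for this example. Versions are compared numerically, because a plain string comparison would rank "22.90" above "22.190".

```python
from typing import Optional, Tuple

# Driver versions per adapter, copied from the 22.190.0 package list above.
# "7265" stands for the "7265(Rev. D)" entry in that list.
_PACKAGE_GROUPS = {
    "22.190.0.4": ["AX411", "AX211", "AX210", "AX203", "AX201", "AX200",
                   "AX101", "9560", "9462", "9461", "9260"],
    "20.70.32.1": ["8265", "8260"],
    "19.51.42.2": ["7265", "3168", "3165"],
}
PACKAGE_DRIVER = {
    adapter: version
    for version, adapters in _PACKAGE_GROUPS.items()
    for adapter in adapters
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '22.190.0.4' into (22, 190, 0, 4) so parts compare as numbers."""
    return tuple(int(part) for part in version.split("."))


def needs_update(adapter: str, installed: str) -> Optional[bool]:
    """True if the installed driver is older than the package version,
    False if it is the same or newer, None if the adapter is not in the list."""
    target = PACKAGE_DRIVER.get(adapter)
    if target is None:
        return None
    return parse_version(installed) < parse_version(target)


print(needs_update("AX201", "22.150.0.3"))
```

This only tells you whether a machine is behind the package version listed here; whether that version actually fixes the disconnects for a given adapter still needs testing, as the post says.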
What about the Android clients?
I'm glad you were able to resolve the issue on your Windows devices however, I see this issue across all my client devices and it seems to be mostly affecting 802.11ax, 802.11ac as reported by @DBlum
I'm going to try this Intel driver upgrade to see if it helps. This is currently only showing on Windows clients for us.
@Hwarraich, can you please share the document/link that says it's fixed in 22.190.0.4 for AX201 and 20.70.32.1 for AC8265?
Because I see Microsoft mentioning it's fixed in 22.220.0.4
Same problem here.
Running MR 29.4.1 and MR42/MR53 APs with ios clients on 5GHz.
I see "802.11 disassociation excess frame loss" messages in the AP event log, and they correlate with clients getting kicked off.
Client balancing is off.
Just opened a ticket.
I think I experienced this issue first hand:
This happened as I came back to the office today from working from home since 1/25/23. My laptop was connected to an AP and had an IP address but could not ping the default gateway or traceroute out. I tried to turn off my wifi adapter but could not. I was able to put my adapter in sniffer mode as per the monitor mode packet capture instructions. After I ended the capture my laptop seemed to have even more issues. At one point all the adapters listed in System Preferences>Network disappeared and I had to reboot to recover. I was able to recover the .pcap and forwarded it to Meraki.
I am running macOS 12.6.2.
Has there been any update? We are still seeing issues and just put a new building up, and of course half the clients are not working properly. Another weird issue we are experiencing with layer 3 is that these devices are also getting APIPA addresses, and DHCP (going through the switch) is not assigning proper IPs some of the time.
We've talked to two different techs about the issue and while they acknowledge they have gotten an influx in calls about the issue since Christmas, I haven't heard an official position from Meraki, only "it's been raised to engineering to review". We have rolled back both of our sites to 28.6 as having 50% or more of our devices bouncing regularly is not an option.
We've been hesitant to roll back (+150 networks). Did the rollback to 28.6 restore stability for yours?
Not ideal, however, some improvement with Windows clients has been seen by disabling Meraki-side client balancing, reducing client-side roaming aggressiveness, and updating the NIC drivers.
Where is the roaming option you speak of?
Client Balancing can be found in the radio settings/profile within the Meraki console;
Wireless/Radio Settings/Profile/Client balancing(ON/OFF)
Windows client-side roaming aggressiveness can be found in the advanced settings of the wireless NIC;
Adapter properties/Configure/Advanced/Roaming Aggressiveness(Lowest<->Highest)
Thank you.
We're going to try rolling back to 28.6 today.
Good luck!
It did fix all stability issues for us. We were seeing the issue more in our higher concentrated AP location first, but it eventually did show up in our lower concentrated AP location. We were unable to roll back ourselves as it was past the 14-day threshold to do so. But support was easy to work with and scheduled it for the time we chose. Only caveat is they will only schedule on the hour. Rollback of 35 AP's took about 15 minutes.
Were you running 29.4.1 and rolled back to 28.6? Was this the advice of Meraki techs? I was on 29.5 and seemed to be getting the same negative effects, but they suggested I go to 29.5.1. I finally upgraded over the weekend and I am monitoring results. I do not think just seeing the excess frame loss message indicates an issue. There seems to be another factor that I have yet to find. To me the new client load balancing is more suspicious, given that it is on by default and there seems to be no way to disable it. Today is usually a crowded day, so the firmware should get tested.
We are on 29.5.1 atm. I am trying what I read in this thread to see if it helps by rolling back. Meraki didn't suggest or push back.
One tech recommended this, the other didn't. I saw others try to go up in firmware with the same issue still happening. I have not tried 29.5.1, but have read the release notes and they don't mention fixing this issue specifically, just "general stability and performance improvements". Most say that, so I'm not going to roll forward until I see more from Meraki. We have a lot of iPads on the network for maintenance staff and I can't adjust the NIC settings on those, and they are mission critical.
We first noticed the issue on IoT devices monitoring temp/humidity throughout the plant as well as iPads/tablets getting disconnected randomly. Devices would show as disconnected due to load balancing in the error logs and the APs would show the excess frame loss error. It was happening in our plant with a high number of APs, as also mentioned earlier in this thread. It eventually showed up in our smaller plant, so AP concentration may or may not be another factor.
I'm seeing this on an MX-67W as well. Mostly phones on the network and it's only the guest network with issues. Seeing the excess frame loss errors.
MX-67W running MX 17.10.2
Seeing very similar issues and are currently running 29.4.1. We have one site on 28.7 still with no issues. The tech recommended downgrading to that version since it is known to be working. I will give an update tomorrow if it helped us or not.
Last night I downgraded a couple of schools to 28.7 + disabled Client balancing.
High density networks, more than 100 MR56 each.
So far no complaints coming from the users and the logs are clean.
Will keep an eye on them and update everyone.
having same issues on MR56 running 29.4.1
From what I can gather, when clients roam they reauthenticate to the RADIUS server, and this takes a long time; if the user is on a Teams call, the call drops.
We are having the same issue running 29.4.1 on MR56.
As above mainly noticed by customers whilst on Teams calls that drop and take a long time to reconnect. This seems to correspond with a roam and re-authentication.
We have seen improvements in customer experience when changing the SSID from layer 3 to Bridge mode and enabling 802.11r. The excess frame loss messages are still seen in the logs but the customer experience seems to be improved.
Given the number of people posting here I can't imagine that everyone is in layer 3 mode, so this might be coincidental, but it feels like it may have been beneficial for us so far. But it is still only a week or so since we started moving sites to this setup.
All,
We rolled back on 2/7/23 to 28.6 and seem to have no complaints from the site. I want to reiterate that this option may not be for everyone as you're going back a major revision and sacrificing newer features. Please weigh these options before deciding to rollback.
We have noticed a few things.
1. Upgrading to 29.x introduces a lot of aggressive roaming due to the load balancing (802.11v).
2. If you look, a lot of the frame loss is actually against the "old AP": the client roams from AP 1 to AP 2, and then a short while later AP 1 reports frame loss and disconnects the client. So it is where the client has not gracefully disconnected from the original AP.
This is not all of the frame loss, but if you filter on a single client you will often see the association to the new AP come before the frame loss on the old one.
Whatever is causing the increased roaming, it is a nightmare. We see clients with good strength and SNR roaming while the user is sitting still, interrupting calls.
And visibility into the timeouts/thresholds, and visibility to troubleshoot, is not good.
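The per-client filtering described above can be roughly automated. The sketch below takes the events for one client and marks each excess frame loss event that follows an association to a different AP within a time window. The event layout (`ts`, `ap`, `type`) and the 60-second window are assumptions made for this example; they are not the dashboard's export format or a documented Meraki timeout.

```python
from typing import Dict, List, Optional


def classify_frame_loss(events: List[Dict], window_s: float = 60.0) -> List[Dict]:
    """For ONE client's events, mark each 'excess_frame_loss' event as
    after_roam=True when the client's most recent association was to a
    different AP no more than `window_s` seconds earlier.

    Each event is a dict with hypothetical keys:
      'ts'   - timestamp in seconds
      'ap'   - name of the AP that logged the event
      'type' - 'association' or 'excess_frame_loss'
    """
    results: List[Dict] = []
    last_assoc: Optional[Dict] = None
    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["type"] == "association":
            last_assoc = ev
        elif ev["type"] == "excess_frame_loss":
            after_roam = (
                last_assoc is not None
                and last_assoc["ap"] != ev["ap"]
                and ev["ts"] - last_assoc["ts"] <= window_s
            )
            results.append({"ts": ev["ts"], "ap": ev["ap"], "after_roam": after_roam})
    return results
```

Events flagged `after_roam=False` are the interesting ones to raise with support, since a stale entry on the old AP does not explain them.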
I completely agree with @DevilWAH
The issue was originally reported from Chromebook devices disconnecting or reporting poor client experience, eventually all devices with no exceptions were reporting the same anomalies.
I also noticed AP performance was poor ranging from 90 to 17%
I rolled back to 28.7 with the help of TAC last night and can confirm that all my client devices are back to normal this morning.
As stated by @Schwa87 this rollback may not be for everyone.
We have the same, we've started noticing it a lot due to change of our zero trust solution, when users 'lose' connectivity they see the disconnection pop up. I say lose, I've had it myself around 5 times today but we aren't seeing timeline showing disassociations in client timeline view, and the client itself doesn't show a disconnection (checked using netsh wlan show wlanreport.)
It would appear it's a loss of Layer2 and above, the radio is up but no data across it for a brief moment which is tearing things down. Here's my own laptop for past 24 hours where I've had multiple drops but as far as Meraki eventlog and timeline is concerned all is ok
We're around 99% confident rolling back to 28.X fixes it. On two smaller test networks, one with about 20 devices on firmware 29.5 had around 60 excess frame loss messages on Monday, but also a lot of "unspecified reason". After rolling back to 28.5.1 (for some reason 28.7 isn't available to us) that went to 3 for the entire day, which I guess will happen at times if users move far away from an AP with nothing to roam to.
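To put the before/after counts on a common footing when comparing several sites, a small helper like this is enough. The two numbers are the approximate daily counts quoted above for one small network, so the result is indicative only.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage drop in event count from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be a positive count")
    return 100.0 * (before - after) / before


# Approximate daily excess frame loss counts quoted above (29.5 vs 28.5.1).
print(f"{percent_reduction(60, 3):.0f}% fewer excess frame loss events")
```

A single day on about 20 devices is a small sample, so repeat the comparison over a full working week before drawing conclusions.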
We've tried client balancing on and off, with it on we see very erratic roaming, my boss was on 4 different APs in about 30 minutes, bouncing between two of them constantly. At one point, physically his laptop had 4 APs in a line from him, and it was associating with the furthest away at around 150 foot, but there was one 20 feet from him. Around 250 clients in building with 24 APs, 16 of them in the office areas in that example
We have a mixture of models and site sizes, MR33, MR42, MR52 and MR44 (MR44 being the bulk) - 15 users, 30 users, 250 above, doesn't make a difference
We see the issue across all sites and sizes. We (the internal IT team) have the issue personally on all Windows devices, specifically Dell Latitude 7300/10/20/30 and 5420. But we also see it on macOS devices (Intel and Apple models), iOS, Android and some static devices such as HDMI sticks running display boards
Planning to revert back Monday evening globally. Will update on what we experience in the few days afterwards, it will be pretty apparent to us if it fixes the loss of connectivity. Shame, the client balancing change in 29.X sounds nifty
I'd love to know how your rollback goes. How large a global deployment do you have? Rough number of networks and APs?
Unfortunately, we don't have the option to roll back other than at one or maybe two very small test sites. One of those would only have 1 client at best, the other maybe 4-5 clients but only 1 access point. Any of you that have worked with Meraki TAC, do you feel like they are addressing the issue? Have you escalated to any Meraki SE contact?
Aye no problem Colt, I'll put in here over the course of next week how it goes. We have 91 MR devices in total, 28 remote sites with 1 AP each, and 9x networks with multiple APs, ranging from 4 APs, up to the largest I'm in at 24
Spotted something else, if it helps anyone; I'm on my phone so can't get screenshots.
My laptop has dropped 7 times today but, according to the first picture of the client timeline, it has been associated since 2am. However, I spotted that a colleague's machine had "multiple dhcp server" messages and, sure enough, when I checked, the last time that error appeared matches the connected time their ZTNA shows. Looking at my client, I have that client event of multiple DHCP servers detected 7 times today. So as far as I can tell, an excess frame loss error or a multiple DHCP servers detected event in the client event log indicates a client dropping. At least it's another symptom to check against after the rollback to verify the effects
Meraki support downgraded 3x networks for us to 28.7 yesterday at around midday; since then I haven't seen a single excess frame loss, multiple DHCP detection or unspecified reason for 802.11 disassociation in the event logs, but these networks are devoid of human life over the weekend
But I was seeing them sporadically before that, from Friday evening up to the downgrade yesterday at noon, for clients that had been left onsite or parts of the infrastructure on wireless connectivity. We'll see how today goes when it's business as usual; if it's looking good I've got the nod internally to pull the trigger on everything, it just needs to go through officially in our change board late this afternoon
We migrated all clients to 28.7.0 on Monday evening, so far it seems to be a major improvement in stability
I've just checked 20+ IT devices in client view as a test base, and can see in client timeline they are coming online start of day, and not showing any reconnection events
We've had some drop outs for ZTNA service but I believe that is WAN related, a few devices that remain in office have been connected for 72 hours solid now
To confirm:
Monitoring for a week and will report back how we get on