Wi-Fi Clients being Disassociated due to "excess frame loss"

OilG
Conversationalist


 

For several months now, our customer has reported poor Wi-Fi network performance from their Meraki Wi-Fi network.

 

One of our immediate observations is that Wi-Fi clients on the network - using Meraki MR46 APs running firmware MR 29.4.1 - are frequently being disconnected from the Wi-Fi network (several SSIDs) through 802.11 Disassociation. In each case this is caused by one of several different factors, the most significant of which is reported in the Event Log for Access Points with the Details: “excess frame loss”.

 

  1. What percentage or rate of 802.11 frame loss is sufficient to cause the Meraki access point to disassociate a client?

  2. How can we adjust that percentage threshold, or frame loss rate threshold, or the timespan over which that percentage/rate is assessed - so that clients will then not be disassociated as often as they are currently being?

  3. How can I see the actual percentages of frame loss / frame error rates:

    1. For each client?
    2. For each AP?
    3. For the customer's network as a whole?

 

Many thanks,

Ian

141 Replies
PhilipDAth
Kind of a big deal

What sort of signal strength are the clients getting?  You may just need to add more access points.

 

Are you trying to use 5 GHz only (with 2.4 GHz disabled, if possible)?  2.4 GHz experiences so much interference these days.

OilG
Conversationalist

Hi Philip,

 

Thank you for your response.

 

For background, I have 20 years' experience as a leading Wi-Fi network consultant and engineer, so please feel free to ask questions as deeply as you like 🙂

 

i). The Clients were Associated (connected) to the 5GHz AP radios.

 

ii). The clients are getting signal strengths of around -65dBm to -70dBm or better, which is more than adequate for a reliable 802.11 Association.

 

iii). Poor signal strength (dBm) or poor Signal-to-Noise Ratio (SNR)(dB) at the Client would tend to:

 

Increase frame loss at the Client (causing Downlink frame loss: AP-to-client frames) until the Adaptive Rate Shifting (ARS) algorithm at the AP reduced the AP's transmit bit rate (Modulation and Coding Scheme - MCS).

 

So a key question in this respect is:

 

Do the Meraki logged messages "excess frame loss" refer to:

 

a. Frame loss at the Client - recognised by missing Uplink (Client-to-AP) ACK frames at the AP? .. leading to Downlink frame re-transmissions by the AP, and a counter or rate for those re-transmissions that would be recorded by the AP? or

 

b. Frame loss at the AP - recognised by CRC errors in Uplink frames received from the Client at the AP?  

 

 

iv). The SNR reported at the Meraki APs is mostly in the region of 39 to 41 dB - which is again more than adequate for a reliable 802.11 Association.

 

Poor SNR or poor signal strength at the APs (Uplink SNR (dB) or Uplink signal strength (dBm)) would tend to:

 

Increase frame loss at the AP (client-to-AP, that is Uplink frame loss) until the Adaptive Rate Shifting (ARS) algorithm at the Client reduced the Client's transmit bit rate (Modulation and Coding Scheme - MCS).

 

-oOo-

 

Can someone answer my original questions please?

 

  1. What percentage or rate of 802.11 frame loss is sufficient to cause the Meraki access point to disassociate a client?

  2. How can we adjust that percentage threshold, or frame loss rate threshold, or the timespan over which that percentage/rate is assessed - so that clients will then not be disassociated as often as they are currently being?

  3. How can I see the actual percentages of frame loss / frame error rates:

    1. For each client?
    2. For each AP?
    3. For the customer's network as a whole?

 

My fourth (new) question is:

 

4. Do the Meraki logged messages "excess frame loss" refer to:

 

a. Frame loss at the Client - recognised by missing Uplink (Client-to-AP) ACK frames at the AP? .. leading to Downlink frame re-transmissions by the AP, and a counter or rate for those re-transmissions that would be recorded by the AP? or

 

b. Frame loss at the AP - recognised by CRC errors in Uplink frames received from the Client at the AP?  
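For question 4, the two readings of "excess frame loss" correspond to two different ratios. The sketch below is only an illustration under stated assumptions: the counter names are hypothetical and are not fields exposed by Meraki, and nothing here reflects the threshold the AP actually uses. It shows how downlink loss (missing ACKs) and uplink loss (CRC errors) could be expressed as percentages per client, and then summed per AP or for the whole network:

```python
from dataclasses import dataclass


@dataclass
class LinkCounters:
    """Hypothetical per-client counters (assumed names, not a Meraki structure)."""
    tx_frames: int      # downlink frames the AP attempted to send
    tx_unacked: int     # downlink attempts for which no ACK came back
    rx_frames: int      # uplink frames the AP received
    rx_crc_errors: int  # uplink frames that failed the CRC check


def downlink_loss_pct(c: LinkCounters) -> float:
    """Case (a): loss at the client, inferred from missing ACKs at the AP."""
    return 100.0 * c.tx_unacked / c.tx_frames if c.tx_frames else 0.0


def uplink_loss_pct(c: LinkCounters) -> float:
    """Case (b): loss at the AP, inferred from CRC errors on received frames."""
    return 100.0 * c.rx_crc_errors / c.rx_frames if c.rx_frames else 0.0


def aggregate(counters) -> LinkCounters:
    """Sum counters so the same ratios can be taken per AP or network-wide."""
    total = LinkCounters(0, 0, 0, 0)
    for c in counters:
        total.tx_frames += c.tx_frames
        total.tx_unacked += c.tx_unacked
        total.rx_frames += c.rx_frames
        total.rx_crc_errors += c.rx_crc_errors
    return total


if __name__ == "__main__":
    clients = [LinkCounters(1000, 50, 800, 8), LinkCounters(1000, 150, 200, 2)]
    for c in clients:
        print(downlink_loss_pct(c), uplink_loss_pct(c))
    ap_total = aggregate(clients)
    print(downlink_loss_pct(ap_total), uplink_loss_pct(ap_total))
```

Summing the raw counters before dividing weights each client by the traffic it carried, which is usually more representative than averaging the per-client percentages.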

 

Many thanks,

Ian

mgonzalez
Conversationalist

I have the same problem, with the same Meraki MR46 and the APs running the MR 29.4.1 firmware, I hope for some help or guidance. 

dcvr
Conversationalist

Upgraded to 29.4.1 mid-December and experiencing the issue described in a high-density (50 AP) MR56 network. In addition to the forceful client disassociations (excess frame loss), clients seem to be roaming aggressively for no apparent reason. Quite frustrating.

KrisM
Conversationalist

I'm really glad to see somebody else is having the same problem. When we logged the exact same thing with Meraki support we were told "well nobody else is having that problem". Not really much help.

 

We ended up going back to an older version. We still see excess frame loss errors but the aggressive roaming has stopped. Funny thing is that we use 29.4.1 in some other sites and don't see the roaming aggressiveness.

dcvr
Conversationalist

The issue is definitely compounded at locations with high-density coverage.

 

@KrisM What version did you roll back to?  We were previously on 28.7.1 so there were several releases in between.

HSG_Network
Conversationalist

I'm on MR 29.4.1 and having the same issues.  Disassociation with "excess frame loss" and I can see clients roaming when they are stationary desktop PCs.  

HSG_Network
Conversationalist

Updating Intel Wifi drivers made the "excess frame loss" problem disappear for the few clients where we were having problems.   Ours is a small, low density network.  

 

But I still see client roaming issues.  We're not having any negative impact (at least none I've seen or had users report) so will probably just wait and see what happens with next firmware release.

BP1
Conversationalist

Also experiencing forced disassociation due to "excess frame loss".

Using 2 MR30Hs on firmware ver. 29.4.1 as well. Have been running these for about a year and have not had these kinds of problems until recently (and although the total number of clients in my building fluctuates some from week to week, it doesn't generally fluctuate by more than 10% up or down).

It's beginning to look (to me anyway) like this might be an issue with the latest release. I look forward to Cisco's response here.

HansW
Conversationalist

Similar issue, MR44 29.4.1.
Reports are coming in now after the holidays.
Lots of "802.11 disassociation, excessive frame loss"

WTorres
Getting noticed

Experiencing the same MR46 MR56 29.4.1

Henrik_
Here to help

Same issues for us after upgrade to 29.4.1. 5GHz only, eight MR46s, clients disconnect with problems to reconnect. Lots of "802.11 disassociation excess frame loss" in the event log. Both iOS and Windows 10 clients. 

BP1
Conversationalist

As further evidence - the building is empty right now and I have some desktops and thermostats (devices that remain there 100% of the time) that have been dropping and re-associating all night long due to excess frame loss. Total # of clients on WiFi right now is <30.

Rycherd
Comes here often

I am also seeing something similar after upgrading to 29.4.1. Anyone know if an upgrade to 29.5 is worth a shot?

andrew114
New here

fwiw - I tried this last night on a couple of MR42's and it didn't help

Henrik_
Here to help

Thanks for sharing. We have two SSIDs with iOS devices on one and Windows devices on the other. Getting excessive frame loss on both networks. Both networks are configured with 802.11w (enabled not required). Can anyone confirm if the setting of 802.11w to enabled affects the issue of frame loss?

Henrik_
Here to help

The intermittent disconnects affecting clients on our networks were probably caused by a bug in 29.4.1 (link to another thread below) and the workaround was deactivating 802.11w until there is a fix available. We still see great numbers of 'excess frame loss' disassociations, especially on our SSID for phones (lots of roaming going on). According to Meraki support the specific description 'excess frame loss' is pretty new and would be expected in dense/complex radio environments and, in some cases, when clients roam.

 

If possible, try to disable 802.11w and/or the new client balancing.

 

29.4.1 Radius Issues some clients unable to connect to MR-53 AP's - The Meraki Community

HSG_Network
Conversationalist

I've got 802.11w disabled but the disconnects are happening. 

 

I'll take a look at the client balancing thanks for the link.

OilG
Conversationalist

Hi everyone,

 

Thank you for joining in and sharing your experiences too. Hopefully Meraki will sort the software quickly.

 

Maybe Meraki will even answer my questions please - in my Original Post and Post #2  ?!  😉 

 

For further information, ours is a 10x MR46 AP deployment running firmware MR 29.4.1, and is also high-density in nature: some 200x Wi-Fi users and a floor area of around 550 m sq.

 

Ian

DBlum
Getting noticed

We are also seeing issues with this on MR44, MR46 in various deployments and running 29.4.1

TimJ
Comes here often

Same - we have issues, 9 x MR46's all running 29.4.1.  

dhatcher
Here to help

We just started getting reports of devices disconnecting at multiple sites. When I checked the logs I also found the `excess frame loss` error. We updated to MR 29.4.1 back in mid-late Nov (mr53 APs). I called Meraki support this morning and they wanted me to run a monitor mode packet capture and send it to them (which I am in the process of doing at the moment) but this does seem suspiciously like a bug with the new firmware. 

DBlum
Getting noticed

I checked some other sites running 28.6 and am seeing excess frame loss for some clients that are disassociating.  I did a little more digging and it seems to mostly affect 802.11ax and 802.11ac clients; I haven't seen it affecting 802.11n clients.

JWidoWiFi
Comes here often

You may want to check if Client Balancing is enabled as well. It looks like the algorithm for that has changed on FW 29.x from a passive event (only during association time) to an active event (during and post association):

https://documentation.meraki.com/MR/Other_Topics/Client_Balancing 

Henrik_
Here to help

We have disabled client balancing but the issue remains. Only ac/ax clients in the network.

Netwerkz
Conversationalist

We have the same user experience on iPhones since migrating from MR53's to MR56's. Upgraded the firmware per another recommendation from ver. 18 to 19, but the issue persists. I noticed the disconnect happens when roaming around the building. Then it never reconnects. A manual reconnection works with no issues, and auto-join is enabled on the device side. Very strange issue. Meraki support case opened to see what they say. 

AtomVega
Comes here often

I just performed a new deployment so the ap's came up on 29.4.1 initially. I'm seeing the same excess frame loss errors and clients dropping left and right. I submitted a case and this was the response:

 

Hi Team, This looks like it's a known issue that we are working on. I will update you as soon as I have more info. in the meantime are you able to down grade the firmware of the APs? Please let me know if you have any questions or need anything clarified. If you require immediate assistance, you can also call into our 24/7 support line to continue troubleshooting. See Help > Get help for a list of valid numbers to call us.

 

Seems the "enhancements" are doing more harm than good. 

Kolt601
Conversationalist

Did they suggest what firmware to downgrade to?  We seem to be encountering the same issues but have been on MR28.5 for quite a long time. Lots of high-density locations as well.

AtomVega
Comes here often

No, they didn't. They just rolled me back to my previous firmware, which was 28.7.

DBlum
Getting noticed

We are going to try setting problematic clients to 802.11n in the settings of their devices for now.

ForwardObserver
New here

We've gotten one response from Meraki Support regarding the 'excess frame loss' message. They said that when a client that is connected to AP-1(using this as an example) roams over to AP-2, AP-1 still thinks it's connected to client, but it never receives a response from the client. This is when AP-1 disassociates the client due to 'excess frame loss'.

dhatcher
Here to help

They told me the same thing. I have a monitor mode packet cap they asked for and I found one of the devices that was disassociated. It was deauth'd right after a flood of frames hit it. The device's antenna signal maintained a -28dBm to -30dBm signal strength the entire time so I'm not convinced this is entirely about roaming. Either way, devices are being disconnected in our network and some are having a hard time getting back on the network requiring extensive help from our help desk team to the users affected. Whatever Meraki's explanation is, it shouldn't be happening. 

ForwardObserver
New here

To add on to this: We are seeing an issue where clients are being disassociated by the AP but the client itself still thinks it's connected to the AP. Hoping that Meraki Support can give us the specific parameters that trigger the 'excess frame loss' message like OP had asked.

josean24
Conversationalist

Same situation in two different locations, one of them with high density and the other with just one AP (with no possibility of roaming). In both cases, the 'excess frame loss' appears but we didn't get a clear answer from support after a few calls with them. 

 

 

 

 

 

aabraham
Here to help

I am also seeing similar issues however I am running MR 29.5. I upgraded because of issues with layer 3 roaming. Support suggested that we go to 29.4.1 but around the time we opened the ticket they told us to skip that version and go directly to 29.5. This was because the same damn layer 3 roaming issue persisted on 29.4.1. I checked my logs and was seeing these disassociations during the time I was busy trying to fix layer 3 roaming. We got sick of the layer 3 roaming issues and moved to one broadcast domain.

Now that we are layer 2 roaming it has been apparent that there are other issues with roaming and that led me to check the logs going back 2 months when we moved to this version. We upgraded December 1, 2022. So I now see they have 29.5.1 available but no mention of this issue being resolved as well as no mention that the layer 3 roaming anchor "stickiness" being fixed. I am at a loss and getting really pissed.

MiguelMVLA
Here to help

Also experiencing this issue and have been told what other users already reported:

 

When a client that is connected to AP-1(using this as an example) roams over to AP-2, AP-1 still thinks it's connected to client, but it never receives a response from the client. This is when AP-1 disassociates the client due to 'excess frame loss'.

 

One engineer spent HOURS working on this with me, collecting tons of data points but eventually said there wasn't much that he could do and asked me to do further testing on my own...

 

Meraki support -  any help would be highly appreciated as this ongoing issue is destroying my staff's trust in your wireless solution.

Miguel
dhatcher
Here to help

Support told me they are aware of a roaming issue where a device will get hit with a flood of block ack requests and then get deauth’d (confirmed in our pcap). Will be addressed in a firmware update (eta tbd). 

 

On another note, I also noticed a bunch of clients being disconnected due to client load balancing being unnecessarily aggressive. I turned it off on all our RF profiles and that seemed to stop the incoming tickets for wireless disconnect problems for us. Might be worth investigating for some of the others out there having trouble with client disconnects. 

Erik_WLAN
Comes here often

Hi All,

 

Same problem here.

Running MR 27.7.1 and MR33/MR76 APs with Android clients on 5GHz. 

See those messages a lot, not sure when btw.

Client balancing is off.

802.11w is off

802.11r is off

Min bitrate 6

 

Cheers!

Hwarraich
Conversationalist

Hi All - We had exactly the same issues and they have been resolved now. See below:

 

- Issues were for wireless clients having NIC card/adaptor as "Intel(R) Wi-Fi 6 AX201" as those are compatible with 802.11ax.

 

-  I anticipate that there may be issues with the AX411, AX211, AX210, AX203, AX200, AX101, 9560, 9462, 9461 and 9260 as well, but I am not 100% sure. So, you guys can test and let me know how it goes.

 

Resolution: Update the drivers to "22.190.0.4".

 

I would recommend testing with a few clients with different NIC cards/adaptors from the list below to see if that works. If yes, then we know what to do 🙂

 

List:-

 

The 22.190.0 package installs the Windows® 10 and Windows 11* Wi-Fi drivers for the following Intel® Wireless Adapters:

 

Windows® 10 64-bit and Windows 11*

 

22.190.0.4 for AX411, AX211, AX210, AX203, AX201, AX200, AX101, 9560, 9462, 9461, 9260
20.70.32.1 for 8265, 8260
19.51.42.2 for 7265(Rev. D), 3168, 3165

 

 

Erik_WLAN
Comes here often

What about the Android clients?

WTorres
Getting noticed

I'm glad you were able to resolve the issue on your Windows devices however, I see this issue across all my client devices and it seems to be mostly affecting  802.11ax, 802.11ac as reported by @DBlum 

MiguelMVLA
Here to help

Hwarraich, thank you for your suggestion. My organization is exclusively experiencing issues with Apple MacOS/iOS devices although we support Android phones, Windows laptops, and Chromebooks extensively.
 
We hoped the issue was related to a known Apple issue addressed HERE but the problem persists after the updates.
Miguel
HSG_Network
Conversationalist

I'm going to try this Intel driver upgrade to see if it helps.  This is currently only showing on Windows clients for us.

NitinVats
Here to help

@Hwarraich, can you please share the document/link showing that for the AX201 it's fixed in 22.190.0.4, and in 20.70.32.1 for the AC8265?
Because I see Microsoft mentioning it's fixed in 22.220.0.4

AndrewZirkel
Conversationalist

Same problem here.

Running MR 29.4.1 and MR42/MR53 APs with ios clients on 5GHz. 

The "802.11 disassociation excess frame loss" messages in the AP event log correlate with clients getting kicked off.

Client balancing is off.

Just opened a ticket.

aabraham
Here to help

I think I experienced this issue first hand:

Screen Shot 2023-01-31 at 2.08.28 PM.png

 

This happened as I came back to the office today from working from home since 1/25/23. My laptop was connected to an AP and had an IP address but could not ping the default gateway or traceroute out. I tried to turn off my wifi adapter but could not. I was able to put my adapter in sniffer mode as per the monitor mode packet capture instructions. After I ended the capture my laptop seemed to have even more issues. At one point all the adapters listed in System Preferences>Network disappeared and I had to reboot to recover. I was able to recover the .pcap and forwarded it to Meraki.

 

I am running macOS 12.6.2.

DBlum
Getting noticed

Has there been any update?  We are still seeing issues and just put a new building up, and of course half the clients are not working properly.  Another weird issue we are experiencing with layer 3 is that these devices are also getting APIPA addresses, and DHCP (going through the switch) is not assigning proper IPs some of the time.

Gary_Rowe
Here to help

We've talked to two different techs about the issue and while they acknowledge they have gotten an influx in calls about the issue since Christmas, I haven't heard an official position from Meraki, only "it's been raised to engineering to review". We have rolled back both of our sites to 28.6 as having 50% or more of our devices bouncing regularly is not an option.

dcvr
Conversationalist

We've been hesitant to roll back (+150 networks). Did the rollback to 28.6 restore stability for yours?

 

Not ideal, however, some improvement with Windows clients has been seen by disabling Meraki-side client balancing, reducing client-side roaming aggressiveness, and updating the NIC drivers.

Schwa87
New here

Where is the roaming option you speak of?

dcvr
Conversationalist

Client Balancing can be found in the radio settings/profile within the Meraki console;

 

Wireless/Radio Settings/Profile/Client balancing(ON/OFF) 

 

Windows client-side roaming aggressiveness can be found in the advanced settings of the wireless NIC;

 

Adapter properties/Configure/Advanced/Roaming Aggressiveness(Lowest<->Highest)

Schwa87
New here

Thank you.

 

We're going to try rolling back to 28.6 today.

Gary_Rowe
Here to help

Good luck!

Gary_Rowe
Here to help

It did fix all stability issues for us. We were seeing the issue more in our higher concentrated AP location first, but it eventually did show up in our lower concentrated AP location. We were unable to roll back ourselves as it was past the 14-day threshold to do so. But support was easy to work with and scheduled it for the time we chose. Only caveat is they will only schedule on the hour. Rollback of 35 AP's took about 15 minutes.

aabraham
Here to help

Were you running 29.4.1 and rolled back to 28.6? Was this the advice of Meraki techs? I was on 29.5 and seemed to be getting the same negative effects but they suggested I go to 29.5.1. I finally upgraded over the weekend and I am monitoring results. I do not think just seeing the excess frame loss message indicates an issue. There seems to be another factor that I have yet to find. To me the new client load balancing is more suspicious given the fact that it is on by default and there seems to be no way to disable it. Today is usually a crowded day so the firmware should get tested.

Schwa87
New here

We are on 29.5.1 atm. I am trying what I read in this thread to see if it helps by rolling back. Meraki didn't suggest or push back.

Gary_Rowe
Here to help

One tech recommended this, the other didn't. I saw others try to go up in firmware with the same issue still happening. I have not tried 29.5.1, but have read the release notes and they don't mention fixing this issue specifically, just "general stability and performance improvements". Most say that, so I'm not going to roll forward until I see more from Meraki. We have a lot of iPads on the network for maintenance staff and I can't adjust the NIC settings on those, and they are mission critical.

 

We first noticed the issue on IOT devices monitoring temp/humidity throughout the plant as well as iPads/tablets getting disconnected randomly. Devices would show disconnected to load balancing in the error logs and the AP's would show the excess frame loss error. It was happening in our plant with a high number of AP's, also mentioned earlier in this thread. It eventually showed up in our smaller plant so AP concentration may/may not be another factor.

Davek580
Conversationalist

I'm seeing this on an MX-67W as well. Mostly phones on the network and it's only the guest network with issues. Seeing the excess frame loss errors.

 

MX-67W running MX 17.10.2

Netwerkz
Conversationalist

Seeing very similar issues and are currently running 29.4.1. We have one site on 28.7 still with no issues. The tech recommended downgrading to that version since it is known to be working. I will give an update tomorrow if it helped us or not. 

WTorres
Getting noticed

Last night I downgraded a couple of schools to 28.7 + disabled Client balancing. 

High density networks, more than 100 MR56 each.

 

So far no complaints coming from the users and the logs are clean.

Will keep an eye on them and update everyone.

AxL1971
Building a reputation

having same issues on MR56 running 29.4.1

 

From what I can gather, when clients roam they re-authenticate to the RADIUS server, and this takes a long time; if the user is on a Teams call, the call drops.

AdBurl
Getting noticed

We are having the same issue running 29.4.1 on MR56. 

As above mainly noticed by customers whilst on Teams calls that drop and take a long time to reconnect.  This seems to correspond with a roam and re-authentication. 

We have seen improvements in customer experience when changing the SSID from layer 3 to Bridge mode and enabling 802.11r.  The excess frame loss messages are still seen in the logs but the customer experience seems to be improved.
Given the number of people posting here I can't imagine that everyone is in layer 3 mode, so this might be coincidental, but it feels like it may have been beneficial for us so far.  But it is still only a week or so since we started moving sites to this setup.

Schwa87
New here

All,

 

We rolled back on 2/7/23 to 28.6 and seem to have no complaints from the site. I want to reiterate that this option may not be for everyone as you're going back a major revision and sacrificing newer features. Please weigh these options before deciding to rollback.

DevilWAH
Here to help

We have noticed a few things. 

 

1. Upgrading to 29.x introduces a lot of aggressive roaming due to the load balancing (802.11v).

 

2. If you look, a lot of the frame loss is actually against the "old AP": the client roams from AP 1 to AP 2, and then a short while later AP 1 reports frame loss and disconnects the client. So it is where the client has not gracefully disconnected from the original AP.

 

This is not all of the frame loss but if you filter a single client you will see association to new AP often come before frame loss to old. 
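That filter can be automated. The sketch below is a rough illustration, not a Meraki tool: it assumes the event log has been exported into records with `time`, `client`, `ap` and `type` fields (a hypothetical layout), and it assumes a 60-second window for what counts as "a short while later". It labels each "excess frame loss" event as post-roam when the same client associated to a different AP shortly beforehand:

```python
from datetime import datetime, timedelta


def classify_frame_loss(events, window=timedelta(seconds=60)):
    """Label each 'excess frame loss' event as 'post-roam' or 'other'.

    events: list of dicts with keys 'time', 'client', 'ap', 'type'
            (assumed export format; field names are hypothetical).
    window: assumed maximum gap between the new association and the
            frame-loss report from the old AP.
    """
    last_assoc = {}  # client -> (time, ap) of its most recent association
    results = []
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["type"] == "association":
            last_assoc[ev["client"]] = (ev["time"], ev["ap"])
        elif ev["type"] == "excess frame loss":
            prev = last_assoc.get(ev["client"])
            roamed = (
                prev is not None
                and prev[1] != ev["ap"]              # now associated elsewhere
                and ev["time"] - prev[0] <= window   # and only recently
            )
            results.append(
                (ev["client"], ev["ap"], "post-roam" if roamed else "other")
            )
    return results


if __name__ == "__main__":
    t0 = datetime(2023, 2, 10, 9, 0, 0)
    log = [
        {"time": t0, "client": "c1", "ap": "AP1", "type": "association"},
        {"time": t0 + timedelta(minutes=10), "client": "c1", "ap": "AP2",
         "type": "association"},
        {"time": t0 + timedelta(minutes=10, seconds=20), "client": "c1",
         "ap": "AP1", "type": "excess frame loss"},
    ]
    for row in classify_frame_loss(log):
        print(row)
```

Events labelled "other" are the ones worth a closer look, since they cannot be explained by the old AP losing track of a client that has already moved on.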

 

Whatever is causing the increased roaming, it is a nightmare. We see clients with good signal strength and SNR roaming while the user is sitting still, interrupting calls.

 

And visibility into the timeouts / thresholds, and the visibility needed to troubleshoot, is not good.

WTorres
Getting noticed

I completely agree with @DevilWAH 

 

The issue was originally reported from Chromebook devices disconnecting or reporting poor client experience, eventually all devices with no exceptions were reporting the same anomalies.


I also noticed AP performance was poor, ranging from 90% down to 17%.

 

I rolled back to 28.7 with the help of TAC last night and can confirm that all my client devices are back to normal this morning.

 

As stated by @Schwa87 this rollback may not be for everyone.

 

DaryllHorner
Here to help

We have the same, we've started noticing it a lot due to change of our zero trust solution, when users 'lose' connectivity they see the disconnection pop up.  I say lose, I've had it myself around 5 times today but we aren't seeing timeline showing disassociations in client timeline view, and the client itself doesn't show a disconnection (checked using netsh wlan show wlanreport.)

 

It would appear it's a loss of Layer2 and above, the radio is up but no data across it for a brief moment which is tearing things down.  Here's my own laptop for past 24 hours where I've had multiple drops but as far as Meraki eventlog and timeline is concerned all is ok

 

DaryllHorner_0-1676049223575.png

 

 

 



We're around 99% confident rolling back to 28.X fixes it. On two smaller test networks, one with about 20 devices on firmware 29.5 had around 60 excess frame loss messages on Monday, but also a lot of "unspecified reason".  After rolling back to 28.5.1 (for some reason 28.7 isn't available to us) that went to 3 for the entire day, which I guess will happen at times if users move far away from an AP with nothing to roam to

 

We've tried client balancing on and off, with it on we see very erratic roaming, my boss was on 4 different APs in about 30 minutes, bouncing between two of them constantly.  At one point, physically his laptop had 4 APs in a line from him, and it was associating with the furthest away at around 150 foot, but there was one 20 feet from him.  Around 250 clients in building with 24 APs, 16 of them in the office areas in that example

 

We have a mixture of models and site sizes, MR33, MR42, MR52 and MR44 (MR44 being the bulk) - 15 users, 30 users, 250 above, doesn't make a difference

 
We see the issue across all sites and sizes. Personally we (the internal IT team) have the issue on all Windows devices, specifically Dell Latitude 7300/10/20/30 and 5420.  But we also see it on macOS devices (Intel and Apple models), iOS, Android and some static devices such as HDMI sticks running display boards

 

Planning to revert back Monday evening globally.  Will update on what we experience in the few days afterwards, it will be pretty apparent to us if it fixes the loss of connectivity.  Shame, the client balancing change in 29.X sounds nifty

Kolt601
Conversationalist

I'd love to know how your rollback goes. How large a global deployment do you have? Rough number of networks and APs?

Unfortunately, we don't have the option to roll back other than one or maybe two very small test sites. One of those would only have 1 client at best, the other maybe 4-5 clients but only 1 access point.  Any of you that have worked with Meraki TAC, do you feel like they are addressing the issue? Have you escalated to any Meraki SE contact?

DaryllHorner
Here to help

Aye no problem Colt, I'll put in here over the course of next week how it goes.  We have 91 MR devices in total, 28 remote sites with 1 AP each, and 9x networks with multiple APs, ranging from 4 APs, up to the largest I'm in at 24

 

Spotted something else, if it helps anyone; I'm on my phone so can't get screenshots.

 

My laptop has dropped 7 times today, but according to the first picture of the client timeline it has been associated since 2am.  However, I spotted that a colleague's machine had "multiple dhcp server" messages; sure enough, when I checked, the last time that error appeared matches the connected time shown by their ZTNA.  Looking at my client, I have that client event of multiple dhcp servers detected 7 times today.  So as far as I can tell, an excess frame error or a multiple dhcp servers detected event in the client event log marks a client dropping.  At least it's another symptom to check against after the rollback to verify the effects
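To compare before and after the rollback, the two symptoms can be tallied per client. This is a minimal sketch only: it assumes the client event log has been exported as (client, description) pairs, and the exact description strings are assumptions based on the wording in this thread rather than confirmed Dashboard values.

```python
from collections import Counter

# Assumed event descriptions, lower-cased; adjust to match the real export.
DROP_SYMPTOMS = {"excess frame loss", "multiple dhcp servers detected"}


def count_drop_symptoms(events):
    """Count suspected drop events per client.

    events: iterable of (client, description) pairs (hypothetical layout).
    Returns a Counter mapping client -> number of matching events.
    """
    counts = Counter()
    for client, description in events:
        if description.strip().lower() in DROP_SYMPTOMS:
            counts[client] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("laptop-a", "Multiple DHCP servers detected"),
        ("laptop-a", "Excess frame loss"),
        ("laptop-b", "Association"),
    ]
    for client, n in count_drop_symptoms(sample).most_common():
        print(client, n)
```

Running the same tally on a day before and a day after the firmware change gives a like-for-like number per client instead of an impression from scrolling the event log.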

 

 

 

 

 

 

 

 

DaryllHorner
Here to help

Meraki support downgraded 3x networks for us to 28.7 yesterday at around midday. Since then I haven't seen a single excess frame loss, multiple dhcp detection or unspecified reason for 802.11 disassociation in the event logs; but these networks are devoid of human life over the weekend

 

But I was seeing them sporadically before that, from Friday evening up to the downgrade yesterday at noon, for clients that had been left onsite or parts of the infrastructure on wireless connectivity.  We'll see how today goes when it's business as usual. If it's looking good I've got the nod internally to pull the trigger on everything; it just needs to go through officially in our change board late this afternoon

DaryllHorner
Here to help

We migrated all clients to 28.7.0 on Monday evening, so far it seems to be a major improvement in stability

I've just checked 20+ IT devices in client view as a test base, and can see in client timeline they are coming online start of day, and not showing any reconnection events

 

We've had some drop outs for ZTNA service but I believe that is WAN related, a few devices that remain in office have been connected for 72 hours solid now

 

To confirm:

  • We have client balancing OFF
  • We have band steering ON
  • We are using MR44 generally, certainly in the bulk of our office spaces where we see this
  • Someone mentioned it below: none of the clients are on UNII-3 channels, highest I see is 136 on 5 GHz, 40 MHz channel

 

Monitoring for a week and will report back how we get on

AxL1971
Building a reputation

If a rollback of firmware resolves the issue, are others doing the same?

 

I am reluctant to do so and assume Meraki are aware of this issue in the current firmware and plan to release another firmware to resolve this.

MarioFishery
Conversationalist

I agree. We only turned off Client Roaming and it seems to have stabilized the disconnection issue. It still takes around 20 seconds for clients to connect to the Wi-Fi when they move between floors.

aabraham
Here to help

Wait, how do you turn off client roaming? Do you mean client load balancing or client steering? My understanding is that the new version of client balancing released with MR 29.x is active: a new active client balancing technique was introduced in MR 29.X firmware, where an AP tries to steer already-associated 802.11v-capable clients to a better AP (if one is available) using BSS-TM frames.

Screen Shot 2023-02-13 at 1.23.17 PM.png

When I read this I feel like it might be related to the issues we are seeing. So if I understand you correctly, you allow the clients to drop connections between floors and reassociate when they arrive on another floor. This makes me believe that you are doing a layer 2 roam between broadcast domains and might be turning off layer 3 roaming. I did the same because layer 3 roaming in Meraki is a complete waste of time.

MarioFishery
Conversationalist

Yes. Client load balancing is turned off on our RF Profiles. We are monitoring it today and will then wait for Meraki to release a patch (hopefully soon). If that doesn't work, we will wait for feedback on the rollback to 28.7 and then roll back ourselves.

DaryllHorner
Here to help

Mario, in our experience, having client balancing on resulted in very erratic behaviour, with clients roaming unnecessarily and to very poorly chosen APs. But they were not disconnecting as per the first issue, where we see excess frame loss, DHCP detection and deauth generally.

 

We are rolling back to 28.7 now, will update late tomorrow on how it has gone

 

In honesty, on 28.7 networks I've seen some excess frame errors, but users do move around. Compare that to 28.5.1, where I barely saw any, but it's difficult to see the forest for the trees. As far as I can see it's a real event that can genuinely happen if a client moves to a poor signal area, so you have to use some interpretation.

 

The network we (IT) are on is currently processing 28.7, and we should have a yes/no on whether it fixed the issue by late afternoon tomorrow.

Eric3
Comes here often

Hi All,

 

I've read all the discussions and I saw exactly the same disconnection issues with 29.4.1 on bridged and guest SSIDs.
We're using MR44s and MR76s in a large environment and only using 5 GHz. I see laptops in a static position with a very good connection (SNR between 38 and 50 dB) suddenly dropping traffic all the time.

Because of that I updated the Meraki firmware 2 weeks ago to version 29.5.1.
In the beginning it seemed that the update worked, but now after 2 weeks the users are complaining again and I see many excess frame loss events across all different device vendors.
I am curious to see the results from those of you who downgraded the firmware to version 28.7, after 2 weeks of testing.




aabraham
Here to help

A week ago we upgraded to 29.5.1, but last week we had a low number of users come into the office and experienced none of the sudden traffic drops, though we still saw the excess frame loss messages in the event logs. I suspect this week will bring more traffic so I can determine the effect of the upgrade, but I am not feeling hopeful.

aabraham
Here to help

Upgrading to 29.5.1 did not resolve the issue. I feel that whatever this issue is, it started with 29.4, though I never actually upgraded to that version and went from 28.7.1 to 29.5. At the time I was trying to fix issues I was having with layer 3 roaming, so I was not thinking about this issue specifically. When layer 3 roaming still seemed to be a problem I removed it from my environment completely. That's when I started noticing the issue that was probably already there, hiding behind my hatred of layer 3 roaming.

I am thinking of returning to 28.7.1 now. I really feel this has something to do with the new Client Balancing feature that was launched with 29.1. I upgraded off 28.7.1 because I thought that layer 3 roaming might improve, so now that it is removed maybe going back will stabilize the issue. What I do not understand is why Meraki refuses to admit this is a real issue.

Quiro
New here

Check DFS events.

If your channels are not set to auto, check whether the APs are actually operating on the channels they are set to.

MR46 has a "known bug" where APs that change channel due to DFS don't return to the original channel, so they may cause co-channel interference.

We are dealing with the same problem and waiting for Meraki to fix the bug. Rebooting the AP corrects the problem.

aabraham
Here to help

Can you elaborate more on what you are seeing with DFS events? Were you able to capture any of this in monitor mode? I always see some DFS events, being right next to a large river, but I am not certain of the effects on users. Are you using auto channel or static? My understanding is that auto channel rules should apply after the DFS event.

Quiro
New here

I did a packet capture once only when a client reported login problems at a particular spot in the building which sounded unusual to us.

There I noticed that the device was trying to connect to an AP working on a channel we never use (153). We need all the channels there are due to high density, but 153 and 149 are used by Apple TVs, so we avoid them. We have lots of Apple TVs in a school setting.

To check DFS events, go to the Meraki dashboard: Network-wide, Event log, Event type (include). There you filter for the event type you need (DFS). A problem here is that if the event happened too long ago, the history will return nothing. The workaround is to compare the channel the AP is actually working on with the channel you have set, if you have set it manually. You can open two pages, one listing the APs with the Channel column included and the other with the channel settings, and look for mismatches. By the way, I use the old version. Our engineer is preparing a script to do this automatically but has found some limitations with the Meraki API at this point.
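The manual comparison described above could be sketched roughly as follows. This is only an illustration, assuming the configured and operating channels have already been copied out of the two dashboard pages into two dictionaries keyed by AP name. The AP names, channel values and the function name find_channel_mismatches are hypothetical, and no Meraki API call is made.

```python
def find_channel_mismatches(configured, operating):
    """Return {ap_name: (configured_channel, operating_channel)} for every AP
    whose operating channel differs from the manually configured one."""
    mismatches = {}
    for ap_name, wanted in configured.items():
        actual = operating.get(ap_name)
        # Skip APs left on auto (None) and APs with no operating data.
        if wanted is None or actual is None:
            continue
        if wanted != actual:
            mismatches[ap_name] = (wanted, actual)
    return mismatches


# Hypothetical example data: channels set manually vs. channels in use now.
configured_channels = {"AP-Lab-1": 36, "AP-Lab-2": 100, "AP-Hall-1": 52}
operating_channels = {"AP-Lab-1": 36, "AP-Lab-2": 153, "AP-Hall-1": 52}

for ap, (wanted, actual) in find_channel_mismatches(
        configured_channels, operating_channels).items():
    print(f"{ap}: configured {wanted}, operating on {actual}")
```

An AP that shows up here after a DFS event and never returns to its configured channel would be a candidate for the reboot workaround mentioned earlier.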

aabraham
Here to help

We finally pulled the trigger and asked Meraki to roll us back to 28.7.1. Now we sit and wait. Today is one of the busier days of the week, with people moving from meeting room to meeting room, so lots of roaming and chances to end up in the excess frame loss wormhole.

 

MarioFishery
Conversationalist

Let us know how 28.7.1 goes for you. We are still on 29.5.1 and have MR33, MR36, MR46, MR55, MR56 and MR86 on our network. Since turning off client balancing, things have been a bit more stable, BUT a few users still complain of disconnections. Apple TVs are still facing issues in some areas, though not consistently. Looking forward to your feedback on 28.7.1.

Bharadwaj
Comes here often

Did the rollback help, and are clients getting associated faster now? Please let us know.

aabraham
Here to help

I feel like rolling back to MR28.7.1 was successful but need to monitor some more. Today was not as populated as last week but enough to say that we did not have any user go into the excess frame loss sunken place. I also stopped getting some awful association issues with the phones on the network. I still see the excess frame loss messages in event logs but like I stated before this is not an indication that there was an issue. Overall I am seeing way less disassociations due to excess frame loss. If you want to roll back to this version you will need to call in to the TAC and have them schedule it. I would say read the release notes first. I upgraded from MR28.7.1 to avoid layer 3 issues with sticky anchors so I felt confident with the downgrade since I removed layer 3 routing from my environment. Good luck!

 

 

AxL1971
Building a reputation

Are Meraki actually aware of this issue with the current firmware (people who roll back are commenting that the issue seems to be resolved)? Are they releasing a new firmware to resolve this?

 

I have logged a case with Meraki and pointed to this thread

MarioFishery
Conversationalist

Our issue only seems to be with Apple TVs disconnecting. Everything else seems fine for us after turning off client balancing.

MarioFishery
Conversationalist

We enabled all services on Bonjour forwarding, for Bridge mode and layer 3 roaming only. Previously we had restricted it to AirPlay only. Apple TVs seem to be working fine now on the latest firmware, 29.5.1.

 

Client balancing is still Turned off.

smartinez790
New here

Also experiencing this with MR57s. Worked with support and got a ton of packet captures but no help from Meraki.

We are rolling back to 28.7.1 this weekend. I updated our case with a link to this thread with the hope that it gets some attention from the Meraki team.

DaryllHorner
Here to help

What's your case number? I am going to open one on Monday and will reference it.

 

I haven't had time to open one given the disruption, and have found Meraki support to be quite cumbersome at times, so I didn't want to get bogged down.

 

 

 

Coltrain
Comes here often

Hi everyone, after upgrading the MRs to 29.4.1, we had the same issue (disconnects due to excess frame loss). We rolled back to 28.7.1 and everything is fine again. We opened a call with Meraki support, but the engineer wasn't aware of this issue. We mostly use Dell computers, but we also have a couple of HP laptops. Since only the HP users complained, it could be that this issue only happens with certain types of Wi-Fi cards.

ESMichal
Here to help

Hey there. Similar situation here. We reached out to Support but L1 informed us about no known issues with 29.4.1 firmware. I asked for escalation of our case. If I don't hear back from them soon we plan to do the downgrade as well. Bad thing is one can not troubleshoot the issue after a downgrade.

Coltrain
Comes here often

Hi ESMichal,

 

May I ask what brand of laptops you use as we only saw this issue on our HP laptops. Thanks!

ESMichal
Here to help

MacBooks. I have reports from people using Apple Silicon devices with macOS 13 installed.
Apple Silicon Macs support all the fancy Wi-Fi roaming features including the 802.11v so it could be a factor.

Coltrain
Comes here often

Thanks @ESMichal. According to the thread "29.4.1 Radius Issues some clients unable to connect to MR-53 AP's" on The Meraki Community, a workaround is to disable 802.11w or downgrade to an earlier firmware. That thread also mentions that the issue is known to Meraki and they are working on a fix (no ETA yet). For now, we'll just stick with the 28.7 firmware.

WB
Building a reputation

Anyone had any further luck rolling back to v28.x to stabilise this issue? Feels like we're hoping v29.6 comes along soon with a fix..?

MarioFishery
Conversationalist

We are still on 29.5.1 and have allowed some rules on the firewall to allow traffic to the Meraki cloud. So far roaming is not an issue and the wireless is stable.

WB
Building a reputation

Were these any additional rules beyond the required 7351/7734/7752 + 443 to get to Meraki Dashboard?

MarioFishery
Conversationalist

We use Fortinet as our Firewall and we added the Cisco-Meraki Cloud services rule without inspection. It has reduced the timeout for authentication.

Kolt601
Conversationalist

We are on 28.5 and have the issues.  Currently pending firmware upgrade to 28.7 or 29.5.1 depending on downtime windows which are a huge problem in our environment.  

 

It's really bugging me how many people's sites seem to have this problem with no real response from Meraki. Given that this issue seems to have been going on for quite a while, I would have expected it to be at least acknowledged by TAC. And because 28.5 is marked as 'critical', the minute I engage TAC I get the canned "Upgrade the firmware" response, which is pretty annoying, since, as I point out, the release notes do not contain any mention of this issue or a similar one.

GiacomoS
Meraki Employee

Hey awesome people of the Community,

 

Thank you for the healthy discussions in this thread. I'm sorry so many of you have been encountering these issues and I can see that there's quite a bit of impact for your end users as well. 

We seem to have quite a few different things in the pot here:

- Firmware related changes in behaviour.

- The "Excessive frame loss" event, which has replaced the old and not-very-informative "Unknown reasons".

- Inconsistencies with Client Balancing being enabled.

- Maybe some end client driver related events.

 

Wireless issues can take quite a long time to resolve, as they require a lot of live data capturing in order to intercept the problem when it happens. I would recommend taking monitor mode packet captures whenever possible, and at times the results of a site survey can also help in understanding the environmental circumstances better. I know not all of your working arrangements may allow this, but if you can assist Support with the data it can certainly go a long way to help.

 

I would recommend to continue to work with Support on the matter, as it sounds like some problems may be software related and we can keep track of the impact and others may still need to be investigated more thoroughly to get to a more appropriate root cause. 

 

I'll keep an eye out here, please do continue to update about the various instances you are encountering and how you are progressing with the Support team. 

 

Many thanks everyone!

Giac

 

 

Please keep in mind that what I post here is my personal knowledge and opinion. Don't take anything I say for the Holy Grail, but try and see!
Appreciate who helps and be respectful of every opinion and every solution offered.
Share the love, especially the Meraki one!
aabraham
Here to help

I think there have been a lot of cases opened regarding this issue and no one is seeing any real help other than "upgrade", which was actually the wrong thing to do. I have sent several packet captures, in monitor mode and from the APs, but no one from support was able to help. The only real help came from this thread. So suggesting that we open a case and work with support is not really helpful at all. Now I do not know when I could actually upgrade again, because Meraki seems absolutely incapable of clearly identifying the issue, which only started once we upgraded to MR29 code in a high-density environment. And what was introduced with this code? A new form of client balancing? Interesting.

Gary_Rowe
Here to help

We received a reply from support earlier this month,

 

"As this is a known issue our development team continues to look into this issue but there is no current timeline as to whether this issue will be resolved in the upcoming MR firmware stable, however the dev team is aware of this issue and looking into a resolution."

 

So, it does look like they are aware of the issue and doing something about it. We have restored back to version 28.6 and have had zero issues since. There have been some good recommendations within this thread as to adjustments to make to allow the current version to fit your needs, if you need to take advantage of any of the new features in 29.x.

Tamas88
New here

We spent around a hundred hours on this issue so my summary below in case it helps someone

 

symptom: users' Teams calls would randomly disconnect for 30 seconds; it seemed worse at high-density sites/areas
issue/troubleshooting done: radius re-authentications using EAP-TLS are prolonged because the access-accept is significantly (~20-30s) delayed from AP to ISE. The network path was proven clean with packet captures on every single hop, so the issue had to happen from client to AP or on the AP itself. From our troubleshooting, it is not related to a specific family of drivers, AP models or even Meraki software version, although 29.x makes it much worse if you are using client balancing as it is active now (you could see idle clients during the middle of the night roaming every 5 minutes between APs with excellent signal strength). Rolling back to 28 and disabling client balancing helped, because the number of re-authentications dropped, but the original issue was still not completely gone and as the users became hyper sensitive to Teams drops, we could still see long authentications during occasional roaming events.

 

Unfortunately my experience with support was the same as everyone above so I decided to build a home lab with two MRs and actually managed to reproduce the issue consistently, using my home laptop in between as sniffer (I had a hard time explaining why I am running around the house with my corp laptop for hours trying to roam)

After countless packet captures and research on mrncciew, the issue was completely mitigated for us by turning off RADIUS CoA on the SSID. I have even upgraded two regions back to 29.5.1, without issues for weeks now. Unfortunately I still don't have a clear picture of why, and I also gave up on chasing Meraki support to help me understand it. Although Meraki is awesome, these kinds of complex issues really highlight the limitations of the product, and the lack of meaningful communication/proactivity from support makes it much, much worse. We will be moving back to Cisco Aironet in the future.

summary:
- 802.11r disabled
- 802.11w disabled
- rolled back to 28: better, but still not perfect
- client balancing disabled
- RADIUS CoA disabled: issue-free since. I can't even reproduce it anymore; previously I could trigger a Teams drop pretty much anytime I wanted by roaming. Now roaming is only a short (<1s) blip, and even voice/video is uninterrupted
- upgraded again to 29.5.1, still no issues
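For anyone wanting to check for the prolonged re-authentications described in this post, a rough sketch follows. It assumes you have already extracted, per roam, the time of the first RADIUS Access-Request and the time of the matching Access-Accept from a packet capture. The client names, timestamps, the 5-second threshold and the function name slow_authentications are all hypothetical.

```python
from datetime import datetime


def slow_authentications(exchanges, threshold_seconds=5.0):
    """Return (client, delay_seconds) for every exchange at or above the threshold.

    Each exchange is (client, time of first Access-Request, time of Access-Accept).
    """
    slow = []
    for client, first_request, access_accept in exchanges:
        delay = (access_accept - first_request).total_seconds()
        if delay >= threshold_seconds:
            slow.append((client, delay))
    return slow


# Hypothetical timestamps, as they might be read off a packet capture.
sample_exchanges = [
    ("laptop-a", datetime(2023, 3, 1, 9, 0, 0), datetime(2023, 3, 1, 9, 0, 1)),
    ("laptop-b", datetime(2023, 3, 1, 9, 5, 0), datetime(2023, 3, 1, 9, 5, 27)),
]

for client, delay in slow_authentications(sample_exchanges):
    print(f"{client}: re-authentication took {delay:.0f}s")
```

Running the same check before and after a change such as disabling CoA gives a simple way to compare how many roams hit the long delay.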

 

 

AdBurl
Getting noticed

Thank you for your time spent trouble shooting, and such a detailed write up.  
Out of interest do you see less aggressive roaming after your change or just a quicker roam with no impact?

DevilWAH
Here to help

After disabling 802.11r, 802.11w and load balancing we saw a lot less. I believe this is because prior to 29.5 load balancing only happened during association, whereas post 29.5 it is active while clients are connected. So we have left load balancing disabled; we still see clients roam when they need to, but significantly less than when it was enabled. The new software seems really aggressive when it is enabled.

 

One other thing we noticed is that the CoA issues seemed to affect clients authenticating with EAP-TLS more than PEAP. Meraki support told us that when CoA is enabled the client always needs to go through a full auth process when it roams, and every packet must be seen before it can continue.

 

What we took from this is you can't at the moment use CoA on a WLAN using enterprise (802.1x) for authentication as the results are unpredictable. In most cases this will not be an issue. 

 

For us this seemed to be a combination of 3 factors 

 

1. 29.5 introduces active load balancing where when clients are already connected it will attempt to "encourage" them to move to better AP's. This can be remediated to an extent by disabling load balancing on the WLAN to return it to pre 29.5 behaviour. 

2. CoA can cause long authentication delays, so if you don't need it, disable it, particularly on WLANs using 802.1X-based authentication. I think this was always a problem we had in the background, but the aggressive roaming caused by 29.5 highlighted it.

3. Lastly, when the client roamed due to load balancing it was not cleanly disassociating from the old AP, and this is where the "excessive frame loss" error comes from. Looking at the logs, you see the client connect to the new AP and then, a few minutes later, an excessive frame loss error and a disassociation from the original AP. This seems to happen when it roams due to load balancing. I don't know whether this has anything to do with the issue, i.e. whether Meraki sees the client as still associated with the old AP when it attempts authentication to the new one.
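The log pattern in point 3 (association to a new AP, followed a few minutes later by an excess frame loss disassociation from the old AP) can be picked out of an exported event list with something like the sketch below. The event dictionaries, field names, the 300-second window and the function name stale_disassociations are assumptions made for illustration, not the format of any Meraki export.

```python
def stale_disassociations(events, window_seconds=300):
    """Flag 'excess frame loss' disassociations that come from an AP other than
    the one the client most recently associated to, shortly after that roam.

    Returns a list of (client, old_ap, current_ap) tuples.
    """
    latest_assoc = {}  # client -> (ap, time of most recent association)
    flagged = []
    for event in sorted(events, key=lambda e: e["time"]):
        client = event["client"]
        if event["type"] == "association":
            latest_assoc[client] = (event["ap"], event["time"])
        elif (event["type"] == "disassociation"
              and event.get("detail") == "excess frame loss"):
            current = latest_assoc.get(client)
            if current is None:
                continue
            current_ap, assoc_time = current
            # Only flag when the old AP reports the loss soon after the roam.
            if (event["ap"] != current_ap
                    and event["time"] - assoc_time <= window_seconds):
                flagged.append((client, event["ap"], current_ap))
    return flagged


# Hypothetical events; "time" is seconds since the start of the log.
sample_events = [
    {"time": 0, "client": "c1", "ap": "AP-1", "type": "association"},
    {"time": 600, "client": "c1", "ap": "AP-2", "type": "association"},
    {"time": 720, "client": "c1", "ap": "AP-1", "type": "disassociation",
     "detail": "excess frame loss"},
    {"time": 900, "client": "c2", "ap": "AP-3", "type": "association"},
    {"time": 950, "client": "c2", "ap": "AP-3", "type": "disassociation",
     "detail": "excess frame loss"},
]

print(stale_disassociations(sample_events))
```

In this sample only c1 is flagged: its excess frame loss event comes from AP-1 two minutes after it moved to AP-2, whereas c2's event comes from the AP it is actually connected to and would be a genuine drop.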

 

Another issue we found is the lack of ability to troubleshoot this with Meraki hardware. We were asked to capture packets, but this required a Linux-based device or setting up a "spare" Meraki AP to capture packets over the air. In the end we deployed some Cisco Aironet APs in sniffer mode to do it.

 

Thankfully our company is large enough that we have spare APs we can set up a lab with or deploy for testing, but without this the logs and data we could get from Meraki were very limited. And even after uploading over-the-air packet captures to Meraki support, in the end it came down to working it out ourselves. Over the few months we were looking at this I spoke to 4 or 5 Meraki engineers, and each time having to start from scratch explaining it and being told the same basic troubleshooting steps made for a very frustrating experience.

RaphaelL
Kind of a big deal

After upgrading to 29.X ( I tried every single version of 29.X ) we have a lot of these errors : 

 

802.11 disassociation: client not responding
MERAKI REASON (CODE 103) EAPoL invalid MIC
 
SSID is 2.4/5 GHz, 802.11r enabled, CoA enabled, Client Balancing enabled. APs: MR33/MR36
 
One thing to consider is that in theory you can't enable 802.11r AND CoA. As per the documentation (https://documentation.meraki.com/MR/Encryption_and_Authentication/Change_of_Authorization_with_RADIU...): "Fast roaming mechanisms like PMKsa, OKC, and 802.11r will be disabled on the SSID that is configured for CoA."
 
My symptoms are different from yours, but I feel that the underlying issue might be similar.
RaphaelL
Kind of a big deal

802.11r Enabled , CoA disabled 

 

Still seeing : auth_mode='wpa2-802.1x' radius_proto='ipv4' radius_ip='x' reason='bad_password' reassoc='1

 

Rolled back to MR28, and all these "issues" are gone. @DevilWAH are you seeing these errors on your networks upgraded to MR 29.X, on WPA2-802.1X SSIDs?

 

At the moment, it's pretty hard to gather useful info to help support.

 

Thx !

sclinton13
Here to help

What NICs are in use in your networks? We believe we have isolated our issues to the client side, with the Intel NIC not being able to successfully perform functions related to the client side of 802.11v.

 

All of our users have Intel AX201/AX211 (only NIC I have seen mentioned in this thread as well), which were fine prior to 29.x code, but have since had issues sporadically and just a small subset. Being that 29.x was the first time that the client was actively involved in the client load balancing, we believe this to be the root cause, but still investigating.


Our deployment is high density, so client load balancing is preferred. We will have the endpoint team push a change to disable 802.11v on the laptops, which will revert the Meraki AP to passive-only balancing, as it was in pre-29.x code. This seems to me like a better approach than limiting a feature that is necessary for high density.

RaphaelL
Kind of a big deal

We are also using AX201 cards. 

 

I thought you couldn't deactivate 802.11r, 802.11k and 802.11v on these cards.

sclinton13
Here to help

You seem to be correct from what I am finding so far, but we have a meeting with Intel next week and I will ask if there is a way to disable, maybe via powershell?

sclinton13
Here to help

Update for my situation: we updated multiple users' Intel AX2xx cards to the latest firmware, and it has fixed the issue for the users we have rolled it out to. So while MR 29.x code may have exposed the issue with client load balancing / 802.11v, the root cause was the Intel NICs, and we have run both 29.5 and 29.5.1 without issue.

 

So if you have the Intel AX2xx (AX201/AX211/etc), that is your real issue.

RaphaelL
Kind of a big deal

What version were you using ? 22.XX and now the latest 22.200.2 ?

sclinton13
Here to help

Happening on the following versions:

22.170.0.3 2022/11/17
22.150.0.3 2022/07/20
22.120.0.3 2022/03/17

DBlum
Getting noticed

Do you know what driver version you went with so we can do some testing? thank you

sclinton13
Here to help

Yes, we are on 22.220.0

 

LINK: https://www.intel.com/content/www/us/en/download/19351/windows-10-and-windows-11-wi-fi-drivers-for-i...

 

I have client load balancing enabled, 802.11r, and CoA disabled (since I am using 802.11r)
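If you want to sort a driver inventory against the versions mentioned above, a small comparison helper is enough. The sketch below assumes versions are plain dotted numbers and treats 22.220.0 as the reference only because that is the version reported as working in this post; the function names are hypothetical, and other posters in this thread report different results with other driver versions.

```python
def parse_version(text):
    """Turn a dotted version string such as '22.170.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_older_than(installed, reference):
    """True if the installed version sorts strictly before the reference.

    Shorter versions are padded with zeros so '22.220.0' equals '22.220.0.0'.
    """
    a, b = parse_version(installed), parse_version(reference)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


# Versions quoted in this thread, checked against the reported working one.
for version in ["22.170.0.3", "22.150.0.3", "22.120.0.3", "22.220.0"]:
    status = "older" if is_older_than(version, "22.220.0") else "not older"
    print(f"{version}: {status} than 22.220.0")
```

Comparing integer tuples avoids the trap of comparing version strings as text, where "22.90" would sort after "22.220".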

CiscoAl
Comes here often

I would be really interested in the outcome of that meeting. Is there a way to manually disable 11k and 11v on Intel cards?

kapplejacks
Here to help

We saw the same issue with ISE: when the client roamed, ISE would send a new redirect URL, and this caused a duplicate entry in the ISE session DB, resulting in de-authentication from the network while the client was still associated to the SSID. Changing the ISE Posture from every connection to once per day might have helped this.

 

Unfortunately we see the issue on the guest wireless that's just using PSK: clients, while stationary, continue to roam (in some cases to APs with worse SNR). On laptops, changing the roaming aggressiveness to a lower setting (3, 2 or 1) helped, but you can't do that on iPhones or other mobile devices. Unfortunately Meraki support said it's up to the client to connect, and they have no real backend settings or adjustment options like the firewall has, similar to client channel persuasion on the Cisco WLC. This is interesting because we have had 0 problems on our old 2504 WLC. It's a shame, because our sales rep said the Meraki APs would be better than the WLC 9800-L, but after many months with limited help the WLC 9800-L seems to be the best fix, tossing the Meraki in the bin.

 

It did not help that meraki support suggested using channels for testing in known DFS range. Also got very conflicting reports on what bit rate, TX power and channel settings to use. Due to the flat nature of meraki support, these complex issues are near impossible to resolve.

sclinton13
Here to help

I have been supporting Meraki wireless (as well as all other products) for better part of a decade and have not run into issues that are too complex or can't be diagnosed. They give you all the data and tools you need, but most people are unfamiliar with the interface and looking for features or names of features, from another product set. 

As far as support goes, it's a game of chance for anyone. If you break it down to what it is, it is a call center, as with all these vendors, and it's the luck of the draw on their ability to troubleshoot or their level of support knowledge. I have gotten some really great engineers and some that were not, but in the end the info is there for all to see.

 

In this particular case, mine ended up being an issue with bad firmware on the Intel wireless NIC (AX201/AX211). It appeared to be a Meraki upgrade issue, but ended up being the NIC after using all the troubleshooting and data available. There are something like 2.5+ million Meraki devices installed out there, so I wouldn't say there is a limitation there; Meraki is installed at multiple stadiums, theme parks, etc. This is top-tier enterprise gear, but sometimes the troubleshooting will be hard for anyone. In this case, for me, it ended up being a root cause in the client NIC's firmware. Just keep pounding away and working with all vendors in the path (in your case laptop, ISE agent (if applicable), APs, ISE/RADIUS, AD, etc.).

 

If you see roaming for users standing still, this is normal with active load balancing such as 802.11v, which is a client-to-AP communication and initiates the roam function on the client itself. DFS is not recommended, but not prohibited; it depends on your location and the density of your deployment. Plenty of high-density deployments require 20 MHz channel widths and need all available channels across the spectrum. If you are near hospitals or airports, it is best to avoid DFS channels, but your event logs should tell you if you are hitting any DFS events.

WB
Building a reputation

It would be amazing if Cisco could port some of their Intel Connectivity Analytics work over to Meraki, even if only to be able to identify where client drivers are out of date!

AxL1971
Building a reputation

thanks for the detailed analysis

 

any impact if disabling CoA

DevilWAH
Here to help

That depends what you are doing. 

 

For example, if you have a visitor/guest flow with a captive portal then you need CoA to reset the session after the user has signed on, but this is normally only used with MAB, and the long auth issue does not seem to affect this.

 

Also if you are using profiling in ISE and you want a way for ISE to instruct the AP to force clients to reauth you need to use CoA. 

 

But if you are just doing straight 802.1X auth to a RADIUS server, disabling it won't make any difference.

 

 

DarNic
Conversationalist

Out of 114 sites upgraded to 29.4.1, we had two sites with the problem of disconnects due to frame loss, and in the event viewer you can see frequent changes to different APs even though the clients were not roaming. We downgraded the two sites that had these issues, which fixed the problem. 112 sites are running fine on 29.4.1.

 

Does anyone in this community have an active case open with Meraki? We have, but we don't seem to be getting any closer to the root of the problem.

DevilWAH
Here to help

These are typical symptoms: they are roaming, but not due to signal. With 29.4, clients are actively "pushed" by the AP. The event log does not show it correctly, as it shows the client connected to both the new and the old AP for a time because the clients are not roaming cleanly. Sites with high client density and/or overlapping APs are more likely to see it happening than smaller sites with lower density.

 

Check out "Client Balancing Behaviour - MR 29.X and Newer Firmware" from the following meraki doc. which is where 802.11v was enabled. 

 

https://documentation.meraki.com/MR/Other_Topics/Client_Balancing

 

In brief, the AP now tries to tell the client to move to a new AP to better balance clients across APs; before this, that never happened with an active connection. In passive load balancing, the AP delays allowing the client to associate with a highly utilised AP in the hope that it will associate with a different one with fewer clients.

 

Post release 29.x, once the client is associated, the AP sends it a list of potentially better APs in the "hope" that it will move across (it's always the client that makes the decision). If you have a client with a NIC / drivers that always "listen" to the AP, it will try to switch over. This would normally not have a negative impact if:

 

1. The client authenticates with out issue or delay

2. the new AP does not immediately tell the client to move back. 

 

From our experience, in some offices the way 802.11v has been applied, together with AP density, utilisation and client density, means some clients will constantly move between APs every few minutes, even when stationary.
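To get a feel for how many clients are bouncing like this, one option is to count how often a client returns to the AP it had just left. The sketch below works on a hypothetical list of (time in seconds, client, AP) association records; the record format, the 600-second window, the minimum count and the function name ping_pong_clients are all assumptions for illustration.

```python
def ping_pong_clients(associations, window_seconds=600, min_returns=2):
    """Count A -> B -> A returns per client and report the frequent bouncers.

    associations: iterable of (time_seconds, client, ap) tuples.
    Returns {client: number_of_returns} for clients with >= min_returns.
    """
    history = {}  # client -> list of (time, ap), oldest first
    returns = {}
    for time, client, ap in sorted(associations):
        seen = history.setdefault(client, [])
        # A "return" is re-associating to the AP used two associations ago,
        # after a different AP in between, within the time window.
        if (len(seen) >= 2
                and seen[-2][1] == ap
                and seen[-1][1] != ap
                and time - seen[-2][0] <= window_seconds):
            returns[client] = returns.get(client, 0) + 1
        seen.append((time, ap))
    return {c: n for c, n in returns.items() if n >= min_returns}


# Hypothetical records: c1 bounces every 5 minutes, c2 roams once.
sample_associations = [
    (0, "c1", "AP-1"), (300, "c1", "AP-2"), (600, "c1", "AP-1"),
    (900, "c1", "AP-2"), (1200, "c1", "AP-1"),
    (0, "c2", "AP-3"), (5000, "c2", "AP-4"),
]

print(ping_pong_clients(sample_associations))
```

A stationary client that appears in this output with a high count is a likely victim of the back-and-forth steering described above, rather than of poor signal.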

 

Again, this is not an issue in itself, but if you have any compounding issues with authentication or association being delayed, users are going to notice it and it will have a negative effect on performance.

 

Also, yes, we do have a case with Meraki. It has been escalated, and while we have resolved our issues we are still working with them to provide some more detailed captures and logs for them to look into.

DarNic
Conversationalist

Thanks for the detailed response. Would you mind sharing your Meraki case number so we can point our Meraki SE to look at that.

DarNic
Conversationalist

Also, did you fix your environment by upgrading the Wi-Fi client driver to support 802.11v?

DevilWAH
Here to help

No, we have 70,000+ clients, so it is not possible to ensure everything is upgraded or that manufacturers have drivers to support it. So we disabled 802.11v on the networks. If you have a good AP placement design it should take client density into account, and if you then also set the MDR and roaming settings to encourage clients to move to the best AP (i.e. don't leave MDR at 1 or 6), clients should be distributed evenly across them.

 

I think this is one of those areas where a plug-and-play solution like Meraki falls down a bit compared to something like Cisco Aironet, where you can tune how 802.11v operates and how aggressive it is.

 

Our view is that if we have a reasonable distribution of clients to APs without load balancing turned on, then we don't worry.

DarNic
Conversationalist

Thanks for the information. Reading the whole thread on version 29.x, it feels bad that we have to retrofit our setup to make 29.x work with no issues. I hope Meraki finds a solution to this soon. Currently the two sites that had issues have been rolled back to version 28.x and work fine. They are high-density environments with lots of APs and clients.

sclinton13
Here to help

I avoided rolling back at all costs, worked on troubleshooting the root cause, and found it not to be an issue with Meraki at all. The issue for me was Intel AX2xx cards. MR29.x kicks off active load balancing using 802.11v, which requires the client device to do a fair amount of the work, including initiating the actual moves. The issue seems to be malformed packets that randomly affect any type of authentication: both 802.11r PMKSA and full 802.1X auth. The issue starts when the AP tells the client to move to a better AP. The client, which has a list of all neighbouring APs from 802.11v, then goes down that entire list trying to authenticate over and over for 1-2 min; then the bug clears, a full 802.1X auth is done and the user is back on.

 

We had multiple chronic issues with specific users, and all those to whom we have applied the newest Intel firmware have been fixed and are actively load balancing without issue.

 

Hope this is helpful to others.

RaphaelL
Kind of a big deal

I'm pretty sure that this thread is about 2 different issues. Yours is different from the OP's, but is identical to mine.

 

Intel AX cards are 'randomly' suffering packet loss upon roaming / reassoc. Goes to a full 802.1X auth. Disabling 802.11r fixes the issue. Downgrading fixes the issue. Upgrading the AX drivers to the latest 22.200.2 didn't fix anything. 

 

Are we having the same issues?

sclinton13
Here to help

Yes, we are on 22.220.0

 

LINK: https://www.intel.com/content/www/us/en/download/19351/windows-10-and-windows-11-wi-fi-drivers-for-i...

 

I have client load balancing still enabled, 802.11r enabled, and CoA disabled (since I am using 802.11r). Our desktop team is fixing users as they report the issue for now; about 5 users who were top issues a month ago have now been issue-free for a month.

sclinton13
Here to help

Great comments here. We seem to have a similar issue, but we have really been keying in on the PC NICs. We have found data pointing to Intel AX201/AX211 NICs having sporadic issues with 802.1X auth. Enabling 802.11r seems to have quelled some of the issues, but we have only ever seen it affect a subset of users. We have a few hundred users in our HQ, with 27 MR42 APs. We only see a handful of users with the issues, and pretty consistently (5-10 users).


Is this similar for anyone else? 

* Intel NIC - AX201/AX211

* small subset of users

 

I am curious whether you are getting your desired results with downgrades and disabling of advanced features, but covering up a possibly larger issue with the client NICs. Active client load balancing requires your devices to authenticate more often, so disabling it reduces the number of attempts and the chances of hitting the issue. In reality the feature should be viable, but perhaps the NICs can't keep up?

CiscoAl
Comes here often

Reading this entire thread, we have experienced very similar issues, which we have broken down and resolved for nearly all the clients that were affected.

 

*Realtek chipsets on some laptops required driver upgrades; these devices would disconnect instead of roaming correctly. This is apparently a known issue with Realtek NICs.

 

*Intel AX2xx drivers required updating, as they were experiencing "roaming on the spot" issues, with tens of associations over and over to the same AP.

 

*We disabled 802.11v aka "Client Balancing" in Radio settings on the SSIDs.

 

*We had Meraki disable 802.11k BSS Transition; this can only be done by Meraki and a support case will be required.

 

Rolling back firmware was not an option for us. The above changes have given us stability for most clients, now on version 29.6.1, and we continue to work on additional data capture to resolve issues with the last of the affected clients at our customers.
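If it helps anyone looking for the "roaming on the spot" pattern mentioned above, here is a rough sketch that finds clients with a long run of consecutive associations to the same AP. It assumes a chronologically ordered list of `(client_mac, ap_name)` pairs pulled from your own logs; the function name, the data layout and the threshold of 10 are assumptions for illustration only.

```python
from itertools import groupby


def same_ap_reassociation_runs(assoc_log, min_run=10):
    """Return {client: (ap_name, longest_run)} for clients whose longest
    run of consecutive associations to a single AP is at least min_run.

    assoc_log: list of (client_mac, ap_name) in chronological order
               (hypothetical layout, adapt to your own log export).
    """
    per_client = {}
    for client, ap in assoc_log:
        per_client.setdefault(client, []).append(ap)

    flagged = {}
    for client, aps in per_client.items():
        # groupby collapses consecutive identical AP names into runs.
        ap, run = max(
            ((name, len(list(group))) for name, group in groupby(aps)),
            key=lambda item: item[1],
        )
        if run >= min_run:
            flagged[client] = (ap, run)
    return flagged
```

Comparing the flagged clients against their NIC model and driver version is one way to see whether the pattern lines up with a particular chipset.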

RaphaelL
Kind of a big deal

Hi everyone. 

 

MR29.6 is out and seems to fix a lot of issues with Client Balancing and other symptoms shown here:

 

  • General stability and performance improvements
  • Ethernet PHY optimizations (MR30H)
  • AP may stop sending MQTT updates
  • AP may send client balancing updates when disabled
  • AP may stop beaconing 5GHz BSSID after multiple DFS events (Wi-Fi 6 and Wi-Fi 6E APs)
  • AP may continue to send traffic to a client after it has roamed to a new AP
  • AP may reply to every LLDP packet from a PSE
  • AP may not broadcast setup SSIDs on factory firmware (Wi-Fi 6 and Wi-Fi 6E APs)
  • AP may reboot when the “Client Balancing” feature is enabled (Wi-Fi 6 and Wi-Fi 6E APs)
  • Numerous fixes to address MT connectivity at scale (Wi-Fi 6 and Wi-Fi 6E)

Might be worth trying!

Henrik_
Here to help

The release notes for MR 30.5 (beta) mention "Intermittent connectivity loss for Windows client devices". Not sure if it addresses the problems in this thread, but it could possibly solve some of the issues. We will evaluate as soon as version 30 is in a stable state.

 

A new beta wireless firmware is now available on Mon, 11 Sep 2023 - The Meraki Community

cPanel
New here

I don't see this available in the portal for install yet. Do you?

RaphaelL
Kind of a big deal

Indeed. And I don't know why

cruzetond
New here

Meraki support downgraded 3 networks for us to 28.7 yesterday at around midday. Since then I haven't seen a single excess frame loss, multiple DHCP detection or unspecified reason for 802.11 disassociation event in the logs, but these networks are devoid of human life over the weekend.
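To make a before/after comparison like this less anecdotal, a small sketch such as the one below can tally the disassociation reasons on each side of the change. It assumes events exported as `(timestamp, detail_text)` pairs; the detail strings it matches and the function name are assumptions, so adjust them to whatever your event log actually shows.

```python
from collections import Counter
from datetime import datetime


def tally_events(events, cutover,
                 watched=("excess frame loss", "unspecified reason")):
    """Count watched event details before and after a cutover time.

    events:  iterable of (timestamp, detail_text) pairs
             (hypothetical layout, adapt to your own log export).
    cutover: datetime of the firmware change.
    Returns (before, after) as two Counters keyed by watched phrase.
    """
    before, after = Counter(), Counter()
    for ts, detail in events:
        text = detail.lower()
        for reason in watched:
            if reason in text:
                # Events at or after the cutover count as "after".
                (before if ts < cutover else after)[reason] += 1
    return before, after
```

Because the networks were empty over the weekend, the "after" counts are only meaningful once they cover a period with a comparable number of users.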

sclinton13
Here to help

Out of curiosity, what wireless NIC is being used on your laptops? I ask because I found that the 802.11v protocol, which is actively used starting with MR29.x code, ended up exposing an issue with Intel NICs. We saw success with all new code and no requirement to disable any features. I am using 29.5.1 today and have 802.11r, active client load balancing, etc. I have zero issues with devices after upgrading the Intel NIC drivers.

AxL1971
Building a reputation

The latest Intel drivers for the AX201 wireless chipset have the following fix:

 

When connected to certain wireless APs in 802.11ac or 802.11ax mode, network
connectivity loss (Windows System Event ID 5002) might occur after roaming

 

https://www.intel.com/content/www/us/en/download/19351/windows-10-and-windows-11-wi-fi-drivers-for-i...

LakesideLion
Getting noticed

We are a school that just this past summer converted to all Meraki equipment. We have a high-density environment with 196 MR57s scattered across 11 buildings and are seeing all the same things everyone has been seeing (massive unexpected disassociations, constant roaming of devices that are standing still, repeated disconnects and inability to reconnect making it impossible to have Zoom calls, constant EAPOL timeouts, repeated connections to an AP "for a few seconds").

 

We have upgraded to 30.5 and it seems to have made a dent in the connection problems we've been having. It's not perfect by any means. We still get some disconnections and, looking at our connection log, there are constant authentication failures due to EAPOL timeouts. Has anyone else tried 30.5?
