Our environment consists of clinics using multiple MR52 units, and I have to say that the APs were not originally placed according to heatmap surveys. We are in the process of deploying NAC authentication using Portnox's cloud RADIUS service. We frequently see this error during the authentication stage, especially WHEN ROAMING:
Client failed 802.1X authentication to the RADIUS server
Now, we were able to identify an interference issue at a specific clinic that's situated in a residential area, so we enabled DFS channels to alleviate it, since the clinic is not near any radar installations. We later also discovered that Portnox's front-end firewall was doing IPS rejections when many RADIUS packets were sent its way in quick succession across the internet. We worked with Portnox to identify this issue. After alleviating both of those issues, most of the authentication errors went away, but we still occasionally see the same error in the connection log in the Meraki dashboard. The NAC RADIUS server uses an internal credential-caching appliance offered by Portnox, which re-caches every 8 hours. We have a single RADIUS server entry in the NAC setup inside Meraki.
We are at the point of trying to identify where the packet is potentially getting lost. However, this seems to be a monumental effort, because our Portnox cache server is in a hosted data center across the Meraki SD-WAN. We would need to capture at the end client, at the switch ports the APs are connected to, over the air using an AirPcap adapter, at the SD-WAN virtual interfaces on both ends of the SD-WAN leg, and at the far end with Portnox. It seems ridiculous to analyze that many captures to find a handful of lost packets.
With the above said, I'm curious to find out:
- have most of you deployed 802.1X NAC authentication over Wi-Fi, and did you experience issues?
- what solution are you using with little to no NAC-related authentication errors?
I haven't gotten any strange issues like that.
The deployments we have, though, usually have the RADIUS server on the LAN, or at least at a nearby SD-WAN site, so the chance of lost packets is smaller.
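To put rough numbers on why the distance to the RADIUS server matters: a single 802.1X/EAP authentication usually takes several Access-Request/Access-Challenge round trips, so even rare loss on a long path adds up. Below is a minimal Python sketch, assuming packet losses are independent; the loss rate, round-trip count and retry count are hypothetical inputs, not measurements from this thread.

```python
def auth_failure_probability(loss: float, round_trips: int, retries: int = 0) -> float:
    """Chance that at least one RADIUS round trip in an 802.1X authentication fails.

    Assumes each packet is lost independently with probability `loss`
    (hypothetical input), and that the NAS retransmits an unanswered
    Access-Request `retries` times before giving up.
    """
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be between 0 and 1")
    # One attempt fails if either the request or the reply is lost.
    attempt_fail = 1.0 - (1.0 - loss) ** 2
    # A round trip fails only if the original attempt and every retry fail.
    round_trip_fail = attempt_fail ** (retries + 1)
    # The authentication fails if any one of its round trips fails.
    return 1.0 - (1.0 - round_trip_fail) ** round_trips


if __name__ == "__main__":
    # Hypothetical example: 0.1% packet loss, 10 round trips per authentication.
    for retries in (0, 2):
        p = auth_failure_probability(0.001, 10, retries)
        print(f"retries={retries}: {p:.4%} of authentications fail")
```

Bursty loss, such as an upstream IPS dropping a whole run of packets from one source, breaks the independence assumption and makes retransmissions far less effective than this model suggests.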
>Client failed 802.1X authentication to the RADIUS server
You need to concentrate on the authentication area. Forget DFS, AP placement, roaming, etc. This means the AP got an ACCESS-REJECT from the RADIUS server. It was told to refuse to allow the user to connect.
Is the RADIUS server showing that it sent an ACCESS-REJECT for the user? If so, what reason has it given?
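One way to answer that from a capture is to read the Code field of each RADIUS packet. Per RFC 2865, every RADIUS packet starts with a 1-byte Code, a 1-byte Identifier, a 2-byte Length and a 16-byte Authenticator. A minimal Python sketch, assuming you already have the UDP payload bytes (for example, exported from Wireshark):

```python
import struct
from typing import NamedTuple

# RADIUS Code values (RFC 2865; accounting codes 4 and 5 are defined in RFC 2866).
RADIUS_CODES = {
    1: "Access-Request",
    2: "Access-Accept",
    3: "Access-Reject",
    4: "Accounting-Request",
    5: "Accounting-Response",
    11: "Access-Challenge",
}


class RadiusHeader(NamedTuple):
    code: int
    name: str
    identifier: int
    length: int


def parse_radius_header(payload: bytes) -> RadiusHeader:
    """Decode the fixed 20-byte RADIUS header from a UDP payload."""
    if len(payload) < 20:
        raise ValueError("a RADIUS packet is at least 20 bytes long")
    # Network byte order: Code (1 byte), Identifier (1 byte), Length (2 bytes).
    code, identifier, length = struct.unpack("!BBH", payload[:4])
    name = RADIUS_CODES.get(code, f"Unknown({code})")
    return RadiusHeader(code, name, identifier, length)


if __name__ == "__main__":
    # Hypothetical packet: Access-Reject, Identifier 42, no attributes.
    reject = bytes([3, 42, 0, 20]) + bytes(16)
    print(parse_radius_header(reject))
```

An Access-Reject (code 3) is an explicit refusal from the server; an Access-Request that never gets any reply with the same Identifier is a different failure.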
Portnox claims that sometimes they do not see the authentication request packet at all, which also results in the "Client failed 802.1X authentication to the RADIUS server" error. So it's not always an Access-Reject from the RADIUS server, at least according to Portnox, our NAC provider. When you hover over "Client failed 802.1X authentication to the RADIUS server", the explanation lists "upstream latency" as a possible cause, which might also include packet loss. I have personally checked the primary fiber internet circuit and there is no issue at all.
BTW, I forgot to mention that Portnox had us:
- disable 802.11r
- use a 20 MHz channel width
Interference DOES matter: the 802.1X auth failure rate was much higher before we turned on the DFS channels, due to nearby interference, especially on channel 44. The interference was exacerbating the 802.1X auth issues.
>"Client failed 802.1X authentication to the RADIUS server"
When the RADIUS server doesn't respond you see a "RADIUS Server Timeout" message.
I think you are going to need a long-running packet capture of RADIUS traffic. Take a specific case, and then see what you saw on the wire. Did you send a request? If you did, did you see a reply? If you saw a reply, what did it say (accept/reject)?
Then you will be able to advance this a lot more.
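The request/reply check described above can be automated once the RADIUS packets from a long-running capture are exported. A minimal Python sketch, assuming each packet has already been reduced to a timestamp, the NAS (AP) address and UDP port, the RADIUS Code and the Identifier; the `RadiusPacket` record and the 5-second timeout are hypothetical choices, not Meraki or Portnox settings. Replies are paired with requests by (NAS IP, NAS port, Identifier), which is how RADIUS itself identifies a request and its duplicates.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

REPLY_NAMES = {2: "Access-Accept", 3: "Access-Reject", 11: "Access-Challenge"}


@dataclass(frozen=True)
class RadiusPacket:
    time: float      # capture timestamp in seconds
    nas_ip: str      # AP address: source of requests, destination of replies
    nas_port: int    # AP UDP port: source of requests, destination of replies
    code: int        # RADIUS Code (1 = Access-Request)
    identifier: int  # RADIUS Identifier (0-255)


def match_requests(packets: List[RadiusPacket],
                   timeout: float = 5.0) -> List[Tuple[RadiusPacket, str]]:
    """Pair each Access-Request with its reply, or mark it 'NO REPLY'."""
    results: List[Tuple[RadiusPacket, str]] = []
    pending: Dict[Tuple[str, int, int], RadiusPacket] = {}
    for pkt in sorted(packets, key=lambda p: p.time):
        key = (pkt.nas_ip, pkt.nas_port, pkt.identifier)
        if pkt.code == 1:
            old = pending.get(key)
            if old is None:
                pending[key] = pkt
            elif pkt.time - old.time > timeout:
                # Same key long after the first request: treat it as a new
                # request reusing the Identifier; the old one got no reply.
                results.append((old, "NO REPLY"))
                pending[key] = pkt
            # Otherwise it is a retransmission; keep the original request.
        elif pkt.code in REPLY_NAMES:
            req = pending.pop(key, None)
            if req is None:
                continue  # reply whose request was not in the capture
            label = REPLY_NAMES[pkt.code]
            if pkt.time - req.time > timeout:
                label = "LATE " + label
            results.append((req, label))
    results.extend((req, "NO REPLY") for req in pending.values())
    return sorted(results, key=lambda r: r[0].time)


if __name__ == "__main__":
    # Hypothetical capture: one challenge, one reject, one unanswered request.
    capture = [
        RadiusPacket(0.00, "10.0.0.5", 50000, 1, 7),
        RadiusPacket(0.08, "10.0.0.5", 50000, 11, 7),
        RadiusPacket(0.20, "10.0.0.5", 50000, 1, 8),
        RadiusPacket(0.29, "10.0.0.5", 50000, 3, 8),
        RadiusPacket(1.00, "10.0.0.6", 50001, 1, 9),
    ]
    for req, outcome in match_requests(capture):
        print(f"{req.time:6.2f}s {req.nas_ip} id={req.identifier}: {outcome}")
```

Filtering the output down to "NO REPLY" and "Access-Reject" entries and lining their timestamps up with the dashboard's connection log is what separates the lost-packet case from the explicit-reject case.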
I am also using Portnox CLEAR, Portnox's cloud product, and have had no roaming issues. I suspect this is indeed a radio interference issue. The best option would be to get Meraki support involved in debugging this kind of issue; in my experience they are very helpful and can help you pinpoint the problem...
Do you have absolutely zero packet drops while roaming? We already eliminated the interference issue by turning on the DFS channels, as stated above. We still get the occasional "Client failed 802.1X authentication to the RADIUS server" error, just not as often.
Capturing such an event is extremely hard, because a client walking around might be in the middle of authenticating when it drops its association with one AP and moves on to authenticate and associate with the next one. This seems to happen more when walking faster.
Almost no issues at all when sitting stationary.
I had no such issues.
I worked with Meraki Support regarding RADIUS issues. They ran a packet capture on the APs, filtered to RADIUS packets only, and all RADIUS packets were accounted for.
I am curious what exact capture Cisco ran for you. So you also had problems, and that's why you ran packet captures? If you hadn't experienced issues, I imagine you wouldn't have gotten to the packet capture troubleshooting stage.
I ask because we ran various types of packet captures, the results were inconclusive, and the tech basically got tired of my questions. He was asking why the client stopped responding mid-802.1X authentication; I said it's because the client had disconnected, which is why you see no response.
Packet captures we ran:
- packet capture on the client walking the space
- packet capture on a mirrored (SPAN) switch port covering all the switch ports the APs are plugged into
- 802.11 over-the-air packet capture using a MacBook with its Wi-Fi adapter in monitor mode
We went back onsite yesterday to troubleshoot this, with 2 guys walking the space and me doing an air capture using Kali Linux with an Alfa AWUS036CH high-gain Wi-Fi adapter sniffing on channel 44, homed to a specific AP. We did this with both Cisco and Portnox on the phone. We actually could no longer reproduce the issue. Cisco performed the following captures to make sure that the RADIUS packets were making it all the way from the remote site, through the SD-WAN IPsec tunnel over the internet, to our headend in our data center where the Portnox auth caching server sits:
- packet capture on the LAN interface of the remote network Meraki MX84
- packet capture on the SD-WAN virtual interface of the remote network Meraki MX84
- packet capture on the SD-WAN virtual interface of the headend network Meraki MX450 active node
All RADIUS packets were accounted for. This is why I was asking what type of packet captures were performed. We did captures at 4 different points.
In the Kali Linux air capture of 802.11 frames, I saw all four 802.1X packets during authentication.
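Comparing captures from several points along the path can be reduced to a set comparison instead of reading each capture by eye. A minimal Python sketch, assuming each capture has been exported to a set of keys that identify the same packet at every point; the keys are hypothetical, for example (Identifier, Code, Authenticator) tuples, which do not change in transit, rather than timestamps, since clocks on different devices won't line up. The point names in the usage example are borrowed from the captures listed above purely for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple


def first_missing_point(points: List[Tuple[str, Set[Hashable]]]) -> Dict[Hashable, str]:
    """Map each packet seen at the first capture point to the first later
    point where it was not seen; packets seen everywhere are omitted.

    `points` must be ordered along the packet's direction of travel
    (AP toward server for requests; reverse the list for replies).
    """
    if not points:
        return {}
    _, origin = points[0]
    missing: Dict[Hashable, str] = {}
    for key in origin:
        for name, seen in points[1:]:
            if key not in seen:
                missing[key] = name
                break
    return missing


if __name__ == "__main__":
    # Hypothetical keys: (RADIUS Identifier, Code) for Access-Requests.
    path = [
        ("remote MX84 LAN", {(7, 1), (8, 1), (9, 1)}),
        ("remote MX84 SD-WAN", {(7, 1), (8, 1), (9, 1)}),
        ("headend MX450 SD-WAN", {(7, 1), (9, 1)}),
    ]
    print(first_missing_point(path))
```

An empty result for both directions is the "all RADIUS packets accounted for" outcome; a non-empty one names the hop where each packet disappeared.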
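If the four packets in question are the EAPOL-Key frames of the WPA2 4-way handshake that follows a successful 802.1X authentication, they can be told apart by their Key Information field. A minimal Python sketch, assuming RSN (WPA2) pairwise handshakes and EAPOL frames with the 802.11 and LLC/SNAP headers already stripped; the bit masks follow the IEEE 802.11 Key Information layout, and anything that is not a pairwise handshake message is reported as None.

```python
from typing import Optional

# Key Information bit masks (IEEE 802.11 EAPOL-Key frame).
KEY_TYPE_PAIRWISE = 0x0008
INSTALL = 0x0040
KEY_ACK = 0x0080
KEY_MIC = 0x0100
SECURE = 0x0200

EAPOL_KEY = 3  # EAPOL packet type for EAPOL-Key


def key_info_from_eapol(frame: bytes) -> int:
    """Extract Key Information from an EAPOL-Key frame.

    Layout: version (1), packet type (1), body length (2),
    descriptor type (1), Key Information (2, big-endian).
    """
    if len(frame) < 7 or frame[1] != EAPOL_KEY:
        raise ValueError("not an EAPOL-Key frame")
    return int.from_bytes(frame[5:7], "big")


def handshake_message(key_info: int) -> Optional[int]:
    """Return 1-4 for a WPA2 4-way handshake message, or None otherwise."""
    if not key_info & KEY_TYPE_PAIRWISE:
        return None  # group key handshake, not the 4-way handshake
    ack = bool(key_info & KEY_ACK)
    mic = bool(key_info & KEY_MIC)
    if ack and not mic:
        return 1  # AP -> client, carries the ANonce
    if ack and mic and key_info & INSTALL:
        return 3  # AP -> client, tells the client to install keys
    if mic and not ack:
        # Client -> AP; in RSN the Secure bit is clear in message 2, set in message 4.
        return 4 if key_info & SECURE else 2
    return None


if __name__ == "__main__":
    # Common WPA2 Key Information values (descriptor version 2, pairwise).
    for value in (0x008A, 0x010A, 0x13CA, 0x030A):
        print(hex(value), "-> message", handshake_message(value))
```

Seeing all four messages for a given roam means the handshake completed on the air; a missing message 3 or 4 points back at the AP/client side rather than at the RADIUS path.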
In conclusion, the underlying problems at this remote site that caused the 802.1X issues were:
- co-channel sharing from originally using an 80 MHz channel width across 9 APs
- serious Wi-Fi interference because this location is next to a residential area; we had to turn on DFS channels so the APs could self-adjust into those channels
- client balancing was turned on, which caused proactive rejection of client re-association requests
- 802.11r had to be turned off; Portnox's NAC solution is not quite ready to work with 802.11r yet
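The 80 MHz co-channel problem is mostly arithmetic: wider channels leave fewer non-overlapping channels for the same 9 APs. A minimal Python sketch, assuming the US (FCC) 5 GHz channel plan and the worst case where every AP hears every other AP; other regulatory domains, and APs that exclude some DFS channels, will give different counts.

```python
import math

# Assumed US (FCC) 5 GHz center channels. Other regulatory domains differ.
NON_DFS = {
    20: [36, 40, 44, 48, 149, 153, 157, 161, 165],
    40: [38, 46, 151, 159],
    80: [42, 155],
}
DFS_ONLY = {
    20: [52, 56, 60, 64, 100, 104, 108, 112, 116, 120, 124, 128, 132, 136, 140, 144],
    40: [54, 62, 102, 110, 118, 126, 134, 142],
    80: [58, 106, 122, 138],
}


def channel_count(width_mhz: int, use_dfs: bool) -> int:
    """Number of non-overlapping 5 GHz channels at the given width."""
    extra = DFS_ONLY[width_mhz] if use_dfs else []
    return len(NON_DFS[width_mhz]) + len(extra)


def min_aps_per_channel(ap_count: int, width_mhz: int, use_dfs: bool) -> int:
    """Fewest APs that must share one channel if every AP hears every other."""
    return math.ceil(ap_count / channel_count(width_mhz, use_dfs))


if __name__ == "__main__":
    for width in (80, 40, 20):
        for dfs in (False, True):
            print(f"{width} MHz, DFS {'on' if dfs else 'off'}: "
                  f"{channel_count(width, dfs)} channels, "
                  f">= {min_aps_per_channel(9, width, dfs)} APs per channel")
```

Walls attenuate signals, so real contention is usually lower than this worst case, but the ratio shows why 9 APs on 80 MHz without DFS ended up sharing channels, and why 20 MHz plus DFS gave the APs room to spread out.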