Network Access Policy stops working and nobody knows why

Gabriel_Page
Conversationalist


Recently we configured our access ports to apply a network access policy. 

 

All our switches are running the latest stable firmware version, MS17.2.1.

 

We noticed that if we lose connectivity to our RADIUS servers for a few seconds, all our ports stop sending requests to the RADIUS servers, so the users go directly to the Authentication Critical Failure VLAN, skipping the 802.1X process. In the event log we can see the following entry:

 

802.1X critical auth VLAN  port:34, old_state: auth_critical, new_state:auth_closed

 

We opened a case with Cisco Meraki and they recommended enabling a feature called RADIUS Monitoring to recover the service when the RADIUS servers come back up. We tested this feature in a lab with 3 users and it seemed to work fine, but when we applied the same RADIUS Monitoring setting in a full office with more than 300 users, the issue started happening again.

 

What can we do to avoid this issue?

Is there a recommended configuration for these cases?

alemabrahao
Kind of a big deal

As recommended by Meraki Support, ensure that RADIUS Monitoring is enabled.

MX/MS RADIUS Troubleshooting - Cisco Meraki Documentation

 

Increase the timeout settings for RADIUS requests. This can help prevent the switch from prematurely marking the server as unreachable.

 

You can also monitor the RADIUS server logs and switch event logs to identify patterns or specific times when the issue occurs.
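To make patterns in the switch event log easier to spot, a small script can tally the state transitions per port. This is only a minimal sketch: the regular expression is modeled on the single event-log line quoted in the original post, and it assumes the log has been exported as plain text lines. The function name `count_transitions` is made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the one event-log line quoted in this thread.
# Assumption: other entries follow the same "port:..., old_state:..., new_state:..." shape.
LINE_RE = re.compile(
    r"port:\s*(?P<port>\d+),\s*old_state:\s*(?P<old>\w+),\s*new_state:\s*(?P<new>\w+)"
)

def count_transitions(lines):
    """Count (port, old_state, new_state) tuples found in event-log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines that do not look like a state change are ignored
            counts[(int(match["port"]), match["old"], match["new"])] += 1
    return counts

sample = [
    "802.1X critical auth VLAN  port:34, old_state: auth_critical, new_state:auth_closed",
]
for (port, old, new), n in sorted(count_transitions(sample).items()):
    print(f"port {port}: {old} -> {new} (seen {n} time(s))")
```

Comparing these counts with the timestamps in the RADIUS server logs may show whether the transitions line up with the connectivity loss.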

I am not a Cisco Meraki employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

Please, if this post was useful, leave your kudos and mark it as solved.
Gabriel_Page
Conversationalist

We performed packet captures on the core, and the access switches aren't sending packets to the RADIUS servers on port 1812/udp, but they are still sending EAP requests to the connected clients.

 

We noticed that the clients reply to the switch's EAP requests, but the switch does not send packets to the RADIUS server, and then the users get the auth_critical error in the event logs.

The only way to recover from this error is rebooting the switch. 
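To confirm from another host that the RADIUS server itself answers on 1812/udp while the switch stays silent, a probe could look roughly like the sketch below. It assumes the server accepts Status-Server requests (RFC 5997), which not every server has enabled, and that the probing host is configured as a RADIUS client with a shared secret. The function names, the address, and the secret are placeholders, not anything Meraki provides.

```python
import hashlib
import hmac
import os
import socket
import struct

STATUS_SERVER = 12          # RADIUS packet code for Status-Server (RFC 5997)
MESSAGE_AUTHENTICATOR = 80  # attribute type for Message-Authenticator (RFC 3579)

def build_status_server(secret: bytes, identifier: int = 0) -> bytes:
    """Build a Status-Server packet carrying only a Message-Authenticator."""
    authenticator = os.urandom(16)   # random Request Authenticator
    length = 20 + 18                 # 20-byte header + one 18-byte attribute
    header = struct.pack("!BBH", STATUS_SERVER, identifier, length) + authenticator
    attr_head = struct.pack("!BB", MESSAGE_AUTHENTICATOR, 18)
    # HMAC-MD5 over the whole packet with the attribute value set to 16 zero bytes.
    digest = hmac.new(secret, header + attr_head + bytes(16), hashlib.md5).digest()
    return header + attr_head + digest

def probe(host: str, secret: bytes, port: int = 1812, timeout: float = 3.0) -> bool:
    """Return True if any reply arrives; the reply itself is not validated."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_status_server(secret), (host, port))
            sock.recvfrom(4096)
            return True
        except OSError:  # includes timeouts and ICMP "port unreachable"
            return False
```

For example, `probe("192.0.2.10", b"placeholder-secret")` returning True while the capture still shows no traffic from the switch would point at the switch rather than the server.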

PhilipDAth
Kind of a big deal

What appears in the switch event logs when this happens?

What appears in the RADIUS logs when this happens?

Talhaturkdogan
Here to help

I think you are hitting a known issue. We are having the same problem with critical authentication.

Meraki MS firmware has had this problem since version MS16.9, and it still has not been resolved.

 

  • RADIUS communications may not recover after an initial failure when Critical Auth is enabled

 

We do not understand why such an important problem has not been solved yet.
