MS120/802.1X - Weird Critical Auth Behavior

Crocker
A model citizen

I've deployed 802.1X, backed by two Microsoft NPS servers, across our environment over the course of about a month. It mostly works exactly as expected; however, I've run into a couple of odd issues regarding Critical Authentication.

 

In at least two instances, I've had switches go into Critical Auth even when the NPS servers are available. Additionally, while in Critical Auth, the Data VLAN fails open just like I want it to; however, the Voice VLAN doesn't appear to do so. The tell is that a workstation connected through a phone (Mitel 5330e's, generally speaking) works just fine in this state, but the phone itself never pulls a DHCP address like it ought to. Removing the 802.1X access policy from the switchport gets the phone working again.

 

Any obvious noob traps come to mind?

RaphaelL
Kind of a big deal

What MS version are you running? I think there was an issue with 802.1X auth on the voice VLAN prior to MS 14.33.X.

Crocker
A model citizen

14.33 - That's a good call, hadn't even thought to check the 14.33.1 release notes. I (now) see a mention in there about NAC enhancements, is there anywhere I can look to see what that actually means?

RaphaelL
Kind of a big deal

Unfortunately no, the release notes don't go into that level of detail. You might have to contact support to see if your behavior is fixed in a specific release. I haven't used the critical VLAN yet, so I can't share any insight on that feature 😞

KRobert
Head in the Cloud

MS 14.33.1 should be okay regarding 802.1X. Are you using a single VLAN for Data and Voice, or are your Data and Voice VLANs separate?

CMNO, CCNA R+S
Crocker
A model citizen

The switches in question are running 14.33, I'm gonna go ahead and bump them to 14.33.1 tonight and see if the weirdness goes away before taking this much further. Want to see if that has any impact on these things getting 'stuck' in critical auth, or if it at least fixes the issue with the Voice VLAN not failing open when in a crit auth state.

 

I'm using separate Data and Voice VLANs (IDs 10 and 20, respectively), though.

KRobert
Head in the Cloud

I'd open a case with support as well. It may be a good idea to run a packet capture while this is happening so that you can see the play-by-play of how the endpoint, switch, and server are communicating with each other, and possibly pin down where the issue is happening. Also, are you doing Multi-Domain or Multi-Auth?

CMNO, CCNA R+S
GiacomoS
Meraki Employee

Hey team,

 

Thought I'd chime in here as well. 

@Crocker , critical auth is available on different versions based on the platform. If you have an MS390, I believe that you would need to be running 15.X.

Assuming you are not though, I checked our internal info; I can't see any specific fix on 14.33.1 for Critical auth, but we did introduce a couple of fixes on 14.33, which you mentioned you are already on. 

I'd follow KRobert's advice, grab a couple of packet captures, if possible get the out of band logs as well, so we can check if there's anything fishy happening whilst you attempt the authentication, and open a support case via Dashboard. 

 

The pcaps should theoretically indicate what is happening with both the RADIUS authentication and the VLAN in which subsequent messages are placed. It may also be a good idea to grab a pcap on the port the traffic is destined for, so we can confirm if the switch is... switching 🙂
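For anyone reading such a capture by hand, here is a minimal Python sketch of decoding the fixed RADIUS header (code, identifier, length, as laid out in RFC 2865) from a UDP payload and listing Access-Requests that never got an answer. The function names are made up for illustration, and matching on the identifier alone is a simplification: identifiers are only unique per client/server pair and get reused over time.

```python
import struct

# RADIUS packet codes defined in RFC 2865.
RADIUS_CODES = {
    1: "Access-Request",
    2: "Access-Accept",
    3: "Access-Reject",
    11: "Access-Challenge",
}


def parse_radius_header(payload: bytes) -> dict:
    """Decode the fixed 20-byte RADIUS header from a UDP payload."""
    if len(payload) < 20:
        raise ValueError("RADIUS header is 20 bytes; payload too short")
    # Code (1 byte), Identifier (1 byte), Length (2 bytes, network order).
    code, identifier, length = struct.unpack("!BBH", payload[:4])
    return {
        "code": code,
        "name": RADIUS_CODES.get(code, "Other"),
        "identifier": identifier,
        "length": length,
    }


def unanswered_requests(payloads):
    """Return identifiers of Access-Requests with no later reply.

    Simplification: any non-request packet with the same identifier
    counts as the reply.
    """
    pending = set()
    for payload in payloads:
        header = parse_radius_header(payload)
        if header["code"] == 1:
            pending.add(header["identifier"])
        else:
            pending.discard(header["identifier"])
    return sorted(pending)
```

If the switch side of the capture shows requests with no replies while the server side shows replies going out, that points at the path in between rather than at NPS.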

 

Let us know how you get on!

Giac

Please keep in mind that what I post here is my personal knowledge and opinion. Don't take anything I say for the Holy Grail, but try and see!
Appreciate who helps and be respectful of every opinion and every solution offered.
Share the love, especially the Meraki one!
Crocker
A model citizen

Yup, I've got an open case with Support at the moment. They grabbed what they could, but need some real-time examples.

 

These are all MS120 switches

 

After some research/poking about last week, I've got a bit more information. First and foremost, this seems to surface if the remote network has had any sort of WAN/AutoVPN disruption. From what I've gathered, once a port has gone into crit auth mode, it seems to stay 'stuck' in that state until the 802.1X access policy is removed & re-applied. I suspect a switch reboot would also resolve it.

 

Going to do some on-site testing with a test network (MX67 + MS120) on Wednesday. Pretty confident I can reproduce this by simply pulling the WAN connection for the MX and connecting a workstation to a switchport with the 802.1X access policy, or by using an MX FW rule to explicitly block traffic between the MS120 and the NPS/RADIUS server.

 

Will follow-up once this is done.

Crocker
A model citizen

Ran low on time so didn't get to do as much testing as I wanted; however, I did work with support today and was able to reproduce the issue by blocking comms between the switch and NPS via FW rule. With the comms interrupted, the port went into critical auth state as expected.

 

VLAN 10, the workstation VLAN, worked just fine - full open as expected; however, the support rep on the line mentioned that the phone appeared to come up in VLAN 10 instead of 20 (the voice VLAN; the dashboard also showed it on 20). He ran a capture and mentioned he could see the phone attempting DHCP (unsure if it was doing this on VLAN 10 or VLAN 20) and getting no response. The workaround for this is to remove the 802.1X access policy from the switchport...
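One way to settle the VLAN 10 vs VLAN 20 question from a capture is to read the 802.1Q tag on the phone's DHCP frames. Below is a minimal Python sketch, assuming the capture point preserves VLAN tags and that the phone tags its voice traffic (untagged frames would land in the data VLAN on the access port). The function name is hypothetical; the tag layout (TPID 0x8100 followed by a 16-bit TCI whose low 12 bits are the VLAN ID) comes from IEEE 802.1Q.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def vlan_id(frame: bytes):
    """Return the 802.1Q VLAN ID of an Ethernet frame, or None if untagged."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-11 are the destination and source MACs; 12-13 the EtherType/TPID.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != TPID_8021Q:
        return None
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q tag")
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF  # low 12 bits of the TCI are the VLAN ID
```

A DHCP Discover from the phone that returns 20 here is on the voice VLAN; one that returns None is untagged and would be treated as data VLAN traffic.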

 

Additionally, once communications between the switch and NPS server were restored, the port stayed stuck in critical auth mode and did not recover. We verified comms between the switch and NPS via the RADIUS test button in the access policy. This appears to be a known issue that is with development for resolution. The workaround is to remove the 802.1X access policy from the switchport and then re-apply it.

 

If any of the Meraki folks that haunt the forums want to take a glance, this is case # 08412961.

cmonk
Comes here often

I opened a case a few weeks ago regarding critical auth VLAN recovery not working like it should, on several different switch models running 14.33. The port would appear stuck providing canned EAP responses instead of reaching back out to the RADIUS server when a client was connected/re-connected.

 

I'll be awaiting a fix for this as well it would seem.

Heef
Here to help

Any update on this? We're up to 15.14.2 and still seem to be having intermittent Critical Auth issues. The switch doesn't seem to be honoring the responses from the NAC server.

Crocker
A model citizen

No, I ended up band-aiding this via an API script that looks for Critical Auth complaints from the switch event logs, identifies the affected ports, removes the Access Policy from those ports, waits 2-3 minutes, then places the Access Policy back in place.
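A rough sketch of what that kind of band-aid could look like in Python. This is not the actual script: the event shape (keys "serial", "port", "description"), the "critical auth" marker text, and the `set_policy` callback are all assumptions for illustration, and the real Dashboard API calls for fetching events and updating ports are left to the caller.

```python
import time


def affected_ports(events, marker="critical auth"):
    """Collect unique (serial, port) pairs from events mentioning critical auth.

    The event keys and the marker text are hypothetical, not the
    Dashboard API's actual schema.
    """
    seen = []
    for event in events:
        if marker in event.get("description", "").lower():
            key = (event["serial"], event["port"])
            if key not in seen:
                seen.append(key)
    return seen


def bounce_access_policy(ports, set_policy, policy, wait_seconds=150,
                         sleep=time.sleep):
    """Remove the access policy from each port, wait, then re-apply it.

    set_policy(serial, port, policy) is a caller-supplied function that
    performs the real update; passing None means "no access policy".
    """
    for serial, port in ports:
        set_policy(serial, port, None)
    if ports:
        sleep(wait_seconds)  # 150 s sits inside the 2-3 minute wait
    for serial, port in ports:
        set_policy(serial, port, policy)
```

Keeping the API calls behind `set_policy` means the remove/wait/re-apply ordering can be dry-run against a stub before pointing it at production ports.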

cmonk
Comes here often

Sorry for the late response, I don't typically sign back in here unless I have an issue. 🙂

 

Yeah, Meraki support enabled a couple of features within my org that seem to fix Critical Auth recovery across all networks once I enabled them: "Guest port bouncing" and "radius monitoring".

 


Crocker
A model citizen

Interesting. I *think* we tried those steps when I worked with support back in August, but it didn't work out. Don't quote me on that, though 🙂

 

I do see that the recent MS 15.20 RC release notes explicitly call out the exact issue we continue to experience with our MS120's:

 

MS120/125 known issues

  • Ports with an odd-numbered MTU value fail to initialize (predates MS 11)
  • Switches will never move a RADIUS server's connectivity status to available if it was ever lost resulting in all authentications being placed into the critical auth VLAN (present since MS 14.32)
cmonk
Comes here often

Yep, I would hit up support and ask that they "unhide" those two settings above for your org, then enable them per access policy. That's what fixed it for us.
