iPSK with Radius (ISE) and roaming issue ?

thomasthomsen
Head in the Cloud

Hi all

 

I have just observed some strange wireless behaviour, and just wanted to know if anyone else is seeing something like this.

 

I have boiled the network down to this (in order to keep it as simple as possible).

2 APs on the same switch (with the same wireless config of course). Both the APs are of course connected to trunk ports on the switch, so the appropriate VLAN for the client is available.

There is also a firewall that can route between VLANs, because we have a wired client on another VLAN continuously pinging the problematic client (to keep track of when this problem occurs). To keep it simple, "permit any any" is in effect on the firewall 🙂

The wireless client in question is of course always connected to the same VLAN.

 

So here is the problem / scenario:

 

Client authenticates to AP1 using iPSK.

ISE says everything is good.

Dashboard says the client is connected.

Client also says it's connected.

The client can pass traffic to the network and everything is good.

- So far so good.

 

Client now moves to AP2 - and roams.

ISE says that the authentication is still good

Dashboard also says that the client is connected (now to AP2)

Even the client says it's connected.

But no traffic is passed to the network, it just stops 😕

 

Doing a "ping" from the client page in dashboard (which in reality is not ICMP but ARP from the AP, I think) gives a response from the client. But the PC on the other network can no longer ping the client.

 

If I then move back and the client roams to AP1 again, everything works.

 

I have no idea what is happening or what causes this to fail.

Everything says it's OK, so I feel that something inside the AP fails.

When I change the client to a network running iPSK without RADIUS, or just plain PSK, roaming between the two APs works every time.

------------------------------------------------------------------------------------------------------------------------------

Any suggestions? Or does anyone know of a known bug that could cause this kind of problem?

 

NB: Equipment used: 2 x MR46E, 1 x MS120-8LP, 1 x MX68 and one ISE 🙂 In case anyone wants to replicate it 🙂 - The wireless client is set up with a static IP address (just wanted to mention this in case it could be a problem).

KarstenI
Kind of a big deal

First thing to check is if the MAC-address of the client moves correctly from the AP1-port to the AP2-port on the switch where both APs are connected.

I don't think we actually see this (though I didn't check this part), but I'm fairly certain the MAC does not show up on the switchport. It does show up as connected to the AP in dashboard. If we reboot the client, and it does a completely new association to AP2, it works. It only seems to be a problem when roaming... it's very strange.

 

I assume your SSIDs are in bridge mode? Then the client MAC has to show up on the switch port and also "roam" from the original to the new port. This has to be observed directly on the switch.
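
A rough way to make that check concrete: grab the switch's MAC address table after the roam and see which port the client MAC was learned on. The sketch below is Python and only illustrates the comparison. The table is a hand-typed dictionary (hypothetical data, the MAC and port numbers are made up), and how you actually collect it depends on the switch.

```python
from typing import Dict, Optional


def normalize_mac(mac: str) -> str:
    """Reduce any common MAC notation to lowercase aa:bb:cc:dd:ee:ff."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def learned_port(table: Dict[str, int], client_mac: str) -> Optional[int]:
    """Return the port the MAC was learned on, or None if it is absent."""
    wanted = normalize_mac(client_mac)
    for mac, port in table.items():
        if normalize_mac(mac) == wanted:
            return port
    return None


def roam_status(table: Dict[str, int], client_mac: str,
                ap1_port: int, ap2_port: int) -> str:
    """Describe where the client MAC sits after a roam from AP1 to AP2."""
    port = learned_port(table, client_mac)
    if port is None:
        return "not in table"
    if port == ap2_port:
        return "moved to AP2 port"
    if port == ap1_port:
        return "still on AP1 port"
    return f"on unexpected port {port}"


# Hypothetical snapshot taken after the client roamed to AP2.
after_roam = {"AA-BB-CC-11-22-33": 1, "de:ad:be:ef:00:01": 8}
print(roam_status(after_roam, "aabb.cc11.2233", ap1_port=1, ap2_port=2))
```

If the result is "still on AP1 port" while dashboard already shows the client on AP2, return traffic would keep going to the wrong AP, which would match the symptom.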

 

Another question is what happens in the following scenario:

Client on AP1 -> Roam to AP2 -> roam back to AP1

Does the ping come back?

Yep. Bridge mode. And yes, I agree, the MAC should show up on the switchport of AP2, but as far as I remember it does not. (It does if I run static PSK, or iPSK without RADIUS.) And since I can "ping" the client on AP2 from dashboard, but not from the other client, I think the traffic might be "eaten" inside the AP, or the switch should do something. But on every other SSID, and with static PSK, there are no problems.

 

And yes, if I go back to AP1 in this scenario, the ping comes back.

I think "ping" from the dashboard is done via ARP directly from the AP radio, not ICMP as far as I remember.

I will of course double check the MAC address thing, just to be sure. But I'm fairly certain that it did not move. I am currently looking at some wireless packet traces I did, just to see if there is anything strange "in the air", so to speak. But since both the AP and the client seem to think the connection is OK, I don't think I will find anything here.

 

ww
Kind of a big deal

Are you on firmware 28 or 27?

Could you try 27 if you're on 28 now?

thomasthomsen
Head in the Cloud

We are on 28 (and just upgraded the APs yesterday to the newest 28.6, just for fun). It did not resolve anything. And since I'm on 28, I can't downgrade to 27 (without creating a case).

KarstenI
Kind of a big deal

Just tried to reproduce it:

  • ISE shows successful Authentication
  • MR44 connected to MS220-8P
  • MR36 connected to MS120-8P
  • Both DHCP and static IP for the Client tested
  • Test-Ping is routed over Cisco CBS350 and not MX
  • no problems while roaming

I think it's time for a Case with Meraki Support.

Thanks for trying to replicate my problem. 🙂

"I think it's time for a Case with Meraki Support." - That was what I was afraid of.  🙂

 

GIdenJoe
Kind of a big deal

Hey Karsten, I never had to do a config like this with RADIUS. Since it's iPSK, I assume you also can't use 802.11r, so I deduce that you'll need to see an authentication event at every roam? Is this correct?

So basically you have to see a new auth on ISE and another 4-way handshake at every roam.

KarstenI
Kind of a big deal

I only tested it without 802.11r enabled and I am not sure if it will work or have a significant benefit. The RADIUS communication is only one RTT and doesn’t involve any public key cryptography as in .1X. If the endpoint MACs are in a central LDAP database it could add some latency, perhaps this is worth some more investigation.

So, yes, in this setup, every roam is a MAB event on the ISE, and a 4-way handshake follows.
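
For context on why the PSK has to arrive before the 4-way handshake can start: in WPA2-Personal the pairwise master key (PMK) is derived from the passphrase and the SSID with PBKDF2-HMAC-SHA1 (4096 iterations, 256-bit output), and the handshake keys are built on top of that PMK. A minimal Python sketch of that derivation follows; the passphrases and SSID are placeholders, and how the AP handles this internally is not something this thread confirms.

```python
import hashlib


def derive_pmk(passphrase: str, ssid: str) -> bytes:
    """Derive the 256-bit WPA2-PSK pairwise master key from passphrase and SSID."""
    if not 8 <= len(passphrase) <= 63:
        raise ValueError("WPA2 passphrases are 8 to 63 characters long")
    return hashlib.pbkdf2_hmac(
        "sha1",
        passphrase.encode("utf-8"),
        ssid.encode("utf-8"),
        4096,   # iteration count used by WPA2-PSK
        32,     # 32 bytes = 256-bit PMK
    )


# Placeholder values: two devices with different iPSKs on the same SSID
# end up with different PMKs, so the AP must know which PSK applies
# to this client MAC before it can complete the handshake.
pmk_a = derive_pmk("device-a-secret", "example-ssid")
pmk_b = derive_pmk("device-b-secret", "example-ssid")
print(pmk_a.hex())
print(pmk_a != pmk_b)
```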

thomasthomsen
Head in the Cloud

Just a follow up on my iPSK issues.

 

Right now clients fail their connections once in a while (they are roaming a lot), but it is nothing the RADIUS server should not be able to handle.

When I do packet captures I can see that RADIUS packets are being sent and received in what I would call a "timely" manner: [Time from request: 0.020714000 seconds], or thereabouts.

 

But ... for some reason I see these errors in the dashboard (not all the time, but once in a while).

 

Client failed 802.1X authentication to the RADIUS server. auth_mode='wpa2-psk' vlan_id='156' radius_proto='ipv4' radius_ip='10.0.8.101' reason='bad_password' roam_ap='E2:55:A8:0C:F4:C2' radio='0' vap='1' channel='6' rssi='45'
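
When comparing many of these events, it can help to split the key='value' pairs out of the line so failures can be grouped by reason, AP or channel. A small Python sketch, assuming the events have been copied out of the dashboard as plain text in the format above (the function name is made up for illustration):

```python
import re
from typing import Dict

# Matches key='value' pairs such as reason='bad_password'.
FIELD_RE = re.compile(r"(\w+)='([^']*)'")


def parse_event_fields(line: str) -> Dict[str, str]:
    """Extract the key='value' pairs from one event log line."""
    return dict(FIELD_RE.findall(line))


event = (
    "Client failed 802.1X authentication to the RADIUS server. "
    "auth_mode='wpa2-psk' vlan_id='156' radius_proto='ipv4' "
    "radius_ip='10.0.8.101' reason='bad_password' "
    "roam_ap='E2:55:A8:0C:F4:C2' radio='0' vap='1' channel='6' rssi='45'"
)

fields = parse_event_fields(event)
print(fields["reason"], fields["roam_ap"], fields["channel"])
```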

 

Bad_password? What is that? Is it the PSK that is returned that is somehow bad?

Looking at the RADIUS server (ISE) when this happens, there is no difference in the returned answer.

Looking at packet traces, there does not seem to be anything wrong either.

 

This is very strange.

Hmm, I'm starting to think that the above error could be caused by packet loss to the RADIUS server.

In iPSK only two packets are used (as far as I can see from packet captures).

So what happens if the AP does not receive the Access-Accept with the PSK? I would have thought I would see a RADIUS timeout error, but perhaps, because of some mixup, I get the above "bad_password" because the PSK never arrives at the AP?
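
To get a feel for how often a two-packet exchange would fail from loss alone, here is a back-of-the-envelope Python sketch. It assumes each packet is lost independently with the same probability in both directions, and that the AP retries the whole exchange a fixed number of times. Both are simplifying assumptions; the real retransmit behaviour and timers of the AP are not known from this thread.

```python
def exchange_failure_probability(loss_rate: float, attempts: int = 1) -> float:
    """Probability that every attempt of a request/reply exchange fails.

    One attempt succeeds only if both the Access-Request and the
    Access-Accept get through, i.e. with probability (1 - loss_rate) ** 2.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    single_attempt_success = (1.0 - loss_rate) ** 2
    return (1.0 - single_attempt_success) ** attempts


# Example: 1% loss per packet, with and without retries (made-up numbers).
for tries in (1, 2, 3):
    print(tries, exchange_failure_probability(0.01, tries))
```

With a lot of roaming clients, even a small per-exchange failure rate can add up to a visible number of failed authentications per day.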

Isn't packet loss a bit rare on a wired network? Unless you have massive oversubscription on your links. You should check whether your packets have a high enough layer 3 marking, because those are network control traffic.

KarstenI
Kind of a big deal

I also would not expect the problem to be with loss on the wired side. But RADIUS uses DSCP of CS0 and I haven't seen an option to change that.

The RADIUS server is across a WAN (AutoVPN), so we do see some packet loss (on the MX Uplink page).

But the main question is: could the above client failure output, when using iPSK, be the error message when this happens? Because I don't see any other errors in any log.

I just tested it by dropping the RADIUS replies and didn't get this message. I would expect that the client and AP don't go for the 4-way handshake in this failure scenario (why should the AP start without having a PSK?), but I still have to capture that.

In that scenario, do you then see any "error log" for the client, or for the RADIUS request?

The event log is not that clear. I still have to capture both the successful and failed event on the wireless to completely correlate them.

 

This is success:

Bildschirmfoto 2022-04-02 um 13.51.16.png

 

This is the failure:

Bildschirmfoto 2022-04-02 um 13.57.12.png

 

I have been troubleshooting some more.

I switched the solution to "iPSK without RADIUS", and no more authentication errors show up in the log for the clients. This had me wondering if ISE, for god knows what reason, returned something wrong, even though we are hitting the same results and so on. I did some RADIUS packet sniffs, and I can see that the returned Tunnel-Password value (inside the packet) is different each time. Is it supposed to be like that? I would have presumed that the Tunnel-Password value would be the same every time when it's the same PSK that's returned. Of course the value is hashed in the packet, so it might be hashed with the AP name/IP or the MAC of the client or something, and that's why it's different for each packet (I did not look at the client MAC or which AP was doing the request at the time, I just noticed that it was different).

I found out why the PSK looks as it does inside the Radius packet between ISE and the MR. 

thomasthomsen_0-1651046514831.png

 

The PSK is actually encrypted in transit. So ... 🙂
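
For anyone else wondering about this: the Tunnel-Password attribute is defined in RFC 2868, which encrypts it using the RADIUS shared secret, the Request Authenticator of the matching Access-Request, and a per-attribute salt. Since the authenticator and the salt change per exchange, the bytes on the wire differ even when the PSK is identical. Below is a minimal Python sketch of that scheme as I understand the RFC. The function names, secret and PSK are placeholders, and the one-byte tag that precedes the salt on the wire is left out.

```python
import hashlib
import os
from typing import Optional


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_tunnel_password(password: bytes, secret: bytes,
                            request_auth: bytes,
                            salt: Optional[bytes] = None) -> bytes:
    """Return salt + ciphertext for a Tunnel-Password value (tag omitted)."""
    if salt is None:
        # Two random salt bytes; the most significant bit must be set.
        salt = bytes([os.urandom(1)[0] | 0x80]) + os.urandom(1)
    plain = bytes([len(password)]) + password
    plain += b"\x00" * (-len(plain) % 16)      # pad to a multiple of 16
    out = b""
    prev = request_auth + salt                 # first block keys on R + salt
    for i in range(0, len(plain), 16):
        block = _xor(plain[i:i + 16], hashlib.md5(secret + prev).digest())
        out += block
        prev = block                           # later blocks chain on ciphertext
    return salt + out


def decrypt_tunnel_password(value: bytes, secret: bytes,
                            request_auth: bytes) -> bytes:
    """Reverse encrypt_tunnel_password and strip length byte and padding."""
    salt, cipher = value[:2], value[2:]
    plain = b""
    prev = request_auth + salt
    for i in range(0, len(cipher), 16):
        block = cipher[i:i + 16]
        plain += _xor(block, hashlib.md5(secret + prev).digest())
        prev = block
    return plain[1:1 + plain[0]]


secret = b"shared-secret"          # placeholder RADIUS shared secret
auth = os.urandom(16)              # stands in for the Request Authenticator
first = encrypt_tunnel_password(b"MyDevicePSK123", secret, auth)
second = encrypt_tunnel_password(b"MyDevicePSK123", secret, os.urandom(16))
print(first.hex())
print(second.hex())
print(decrypt_tunnel_password(first, secret, auth))
```

Run twice with the same PSK, the two hex strings differ, but each decrypts back to the same value when the right secret and authenticator are used.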

Hi Thomas,

 

I have a similar problem in my network with iPSK validation when a wireless phone roams.

 

Have you finally solved your issue?

 

Thank you in advance.


BR

Well I think the issue disappeared at some point during an AP upgrade.

And / or also when we got more "in control" of the actual clients (aka. getting all MAC addresses into the right groups). 

 

Do you see any log output with errors when you roam?

Or is this, like I discovered recently, roaming problems with "DHCP required" option enabled (I mean, I clearly did NOT read that documentation correctly on that one the first time 🙂 )

PS: My original problem in this thread did not have anything to do with "DHCP required" - it just didn't work. 

 

PPS: Come to think of it, we MIGHT have tweaked the radius timers a little as well.

Hi Thomas, 

 

Thank you so much for your answer. I'm getting the following errors in the Timeline:

 

Screenshot by Lightshot (prnt.sc)

 

The VLAN assigned by policy to the device is the voice VLAN, and it has DHCP enabled.

 

Maybe my problem is related to the RADIUS timers too?

 

I'm seeing different types of validation in the traces on my ISE. The first one is received when the device validates with the network, and the validation is OK:

 

https://prnt.sc/Vm9yEZw2iCs6

 

But when the device roams, the validation with ISE ends with a RADIUS Accounting-Request dropped:

 

https://prnt.sc/prAs1aw-nA7I

 

I'm not sure if my implementation of iPSK with ISE needs to be fixed to handle the roaming validations (I'm afraid that when a device roams it has to be validated again, but with a different process).

 

 
