How can I get Radius server fail-over to work?

Solved
joincidence

I have an SSID set up with RADIUS, using Microsoft NPS servers, with the wireless APs as RADIUS clients. Each of the two RADIUS servers works when tested individually with a phone. I use AD groups to set Meraki group policies. Works great.

 

Now, I was testing fail-over and found that when I stopped the NPS service on the RADIUS server in slot #1, my device could not connect to the SSID (with the RADIUS server in slot #2 being the only working one). If I swap the slots, my device can connect.

 

I read this:

 

https://documentation.meraki.com/MR/Access_Control/MR_Meraki_RADIUS_2.0

 

But it describes settings that are not on my Wireless -> Configure -> Access Control page (server timeout, retry count).

 

Any ideas?

Ryan_Miles
Meraki Employee

You need to expand the section called Advanced RADIUS settings 

I did. There's no documentation that says any options related to failover are found or configured in that section, so I wouldn't know what to change there. Which is why I'm asking here. 

 

 

With both servers added, when the AP loses communication with the first server, it should try to authenticate against the next server.

 

The second server is probably missing some configuration.

 

Have you checked the server logs to validate whether the authentication attempt is occurring?

I am not a Cisco Meraki employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

Please, if this post was useful, leave your kudos and mark it as solved.

The failover behavior depends on the order in which the servers are listed on the dashboard, which dictates the priority of each one. For example:

 

Server 1 = priority 1

Server 2 = priority 2

Server 3 = priority 3

The available server with the highest priority is used (priority 1 is the highest). If Server 1 were to become unreachable, Server 2 would become active, and so on.

 

If the fallback option is enabled, once the server with higher priority recovers, the AP will switch back to using that preferred (higher priority) server.
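
As a rough illustration of the priority and fallback behavior described above, here is a minimal Python sketch. It is a toy model, not Meraki's implementation: the `ServerList` class, the `fallback` flag, and the server names are hypothetical, and the behavior without fallback (stay on the current server for as long as it is reachable) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerList:
    """Toy model of a priority-ordered RADIUS server list (hypothetical)."""
    servers: list            # ordered by priority; index 0 is priority 1
    fallback: bool = False   # assumed flag: return to a recovered higher-priority server
    active: int = 0          # index of the server currently in use

    def pick(self, reachable: set) -> Optional[str]:
        """Return the server to use, given the set of reachable servers."""
        # Assumption: without fallback, keep using the current server
        # for as long as it stays reachable.
        if not self.fallback and self.servers[self.active] in reachable:
            return self.servers[self.active]
        # Otherwise search the list top down (priority 1 first).
        for index, server in enumerate(self.servers):
            if server in reachable:
                self.active = index
                return server
        return None  # no server is reachable


sticky = ServerList(["server1", "server2", "server3"], fallback=False)
preferred = ServerList(["server1", "server2", "server3"], fallback=True)
```

In this model, both lists move to `server2` when `server1` goes down. Once `server1` recovers, only the list with `fallback=True` switches back to it.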

It's pretty straightforward that that's how it should work. I read the documentation. It just doesn't work that way, which is why I'm here.

 

Like I said, I have two working RADIUS servers. If I put either in the #1 slot, I can connect and authenticate just fine. If I stop the service on #1 (to simulate an outage), #2 does NOT take over, regardless of which RADIUS server it is. The connection on the client device fails (can't connect to SSID).

 

"the fallback option"

This option is NOT among my options on the Access Control page. It's in the documentation, but I don't see it anywhere, including the advanced RADIUS options.

Have you checked if any requests are arriving in the server logs? Did you do a packet capture?

 

The problem appears to be on the server.

https://documentation.meraki.com/MR/Access_Control/MR_Meraki_RADIUS_2.0

Ok, so I contacted Meraki about it (a previous attempt with them failed to identify the issue). According to them, the problem is not on our RADIUS servers. The problem is that if you're using RADIUS without the splash page option, the order of the three servers in the options is irrelevant. RADIUS failover simply doesn't work in this configuration. I'm putting in an official request for them to add failover support.

 

The whole reason I'm exploring the RADIUS server route is to escape the splash page, because the splash page won't always trigger for Apple devices, which make up the majority of our clients. That was the subject of a case with Meraki involving many packet captures. Meraki's conclusion there was that Apple doesn't want to share certain information that would help Cisco create a splash page that always triggers on these devices.

The Advanced Settings section requires that all APs support and run MR28+ firmware.

 

https://documentation.meraki.com/MR/Access_Control/MR_Meraki_RADIUS_2.0#Advanced_Settings

 

To view the max runnable firmware for AP models check this page https://documentation.meraki.com/General_Administration/Firmware_Upgrades/Product_Firmware_Version_R...

 

I found your Support case and see that the linked network has older APs, some of which don't support anything newer than MR26 firmware.

Interesting, but the Meraki support person I just talked to on the phone said it was the fact that I wasn't running Radius for splash page--that failover only worked in that radius configuration. Not sure who to believe now. 

RADIUS for splash is something different. It's when you want a splash page and want it to auth users via RADIUS.

 

Standard RADIUS doesn't use splash. The RADIUS servers are tried in order from the top down. The advanced fallback feature allows the APs to roll back to the higher priority servers when they're reachable. Even without that feature the list is tried top down.

 

To validate this, I configured bogus RADIUS IPs for the #1 and #2 servers in my list. My clients still auth successfully: the AP tries the first two, which fail as expected, and then succeeds on server #3. I also tested another way, with valid RADIUS IPs but with the first and second blocked by a firewall rule.

 

If the list of multiple servers didn't function a list would be pointless to have in the first place. My recommendation is to circle back with Support, as it sounds like maybe there's a misunderstanding here.

 

Here's a snippet from the firewall log showing what occurs when my clients auth. They first try the bogus RADIUS server I have listed as #1 (1.1.1.1). That of course fails, and the AP moves on to the #2 RADIUS server IP, which is valid (JumpCloud), and my clients successfully connect. This is not relying on RADIUS fallback. This is simply the default behavior of trying alternate RADIUS servers when the first one fails.
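
The top-down behavior described above can be sketched roughly as follows. This is a simplified model, not the AP's actual logic: the `authenticate` function, the `send` callable, the `retries` value, and the 10.0.0.2 address are assumptions made for illustration. Only the bogus 1.1.1.1 entry comes from the test above.

```python
from typing import Callable, Optional, Sequence


def authenticate(
    servers: Sequence[str],
    send: Callable[[str], Optional[bool]],
    retries: int = 3,
) -> tuple:
    """Try each server in list order; return (server, accepted, log)."""
    log = []
    for server in servers:
        for attempt in range(1, retries + 1):
            reply = send(server)  # None models a request that timed out
            log.append((server, attempt, reply))
            if reply is not None:
                # Any reply (accept or reject) ends the search: a reject
                # is an answer from a live server, not an outage.
                return server, reply, log
    return None, False, log  # every server timed out


def fake_send(server: str) -> Optional[bool]:
    """Stand-in for the network: only the hypothetical 10.0.0.2 answers."""
    return True if server == "10.0.0.2" else None


server, accepted, log = authenticate(["1.1.1.1", "10.0.0.2"], fake_send, retries=2)
```

Here the model spends both attempts on 1.1.1.1, then moves to 10.0.0.2 and is accepted on the first attempt there.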

 

Screenshot 2024-02-09 at 11.18.16 AM.png

I've started a case with three different people in Meraki support, and every time we were unable to make it work. On the most recent call, they said it's because RADIUS for splash supports failover and standard RADIUS does not. He did a packet capture and found that the packets were not even hitting the RADIUS server in slot #2, but were still hitting #1 only, even though #1 was down.

 

"If the list of multiple servers didn't function a list would be pointless to have in the first place"

 

Oh, I wholeheartedly agree. And yet it's there in the non-splash radius setup. 

PCAP from my AP. It shows it first tries the bogus RADIUS server IP I have configured 1.1.1.1. When that fails it rolls to the second server which succeeds.

 

I would reengage with Support to see why it's not working in your environment.

 

Screenshot 2024-02-09 at 11.47.05 AM.png

Support told me why it's not working in my environment--that this will not work with non-splash radius--that in this configuration it doesn't matter what's in slot #2 if slot #1 is down. It just keeps sending packets to the first one and doesn't try the second. I had one of them tell me to submit a feature request. 

You were told wrong.

I'll start one more case, mention that you think it's possible, and I'll report back my findings. 

Alright, so it is indeed the older firmware version that keeps me from seeing the advanced failover option for non-splash RADIUS. The last two support people I spoke with confirmed it, though the first did not. His answer was "Non-splash radius won't do failover, period".

 

Having the option to specify three RADIUS servers and alter their order, despite the 2 extra slots being useless, also contributed to the confusion. Ideally there'd be one slot there, but it is what it is. I'll figure out the failover/load-balancing locally.

The second and third RADIUS IPs do work. My PCAP and other info in this thread validate that fact.

None of the 4 other engineers I contacted in support said it could work unless I either use splash RADIUS or get the failover option by getting new firmware. I'm taking their word over yours, because they have actually done packet captures on my network and tested things with me, and you have not.

RaphaelL
Kind of a big deal

Out of curiosity, what kind of AP are you using, and what firmware?

 

I will try to test that next week

It's actually just an MR32. We have a lot of MR32s with locked firmware on this network (and we don't have the budget to upgrade at the moment), so we're not going to be seeing those newer options for the time being. I think I'm just going to set up a RADIUS proxy server, have Meraki point to that, and handle the failover/load-balancing on our side.
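
For anyone weighing the same workaround, the selection logic such a proxy needs can be sketched in a few lines of Python. This covers only the choice of backend, not a RADIUS proxy itself, and an existing proxy product would normally provide it. The `BackendPool` class, the `cooldown` value, and the `nps1`/`nps2` names are hypothetical.

```python
import time


class BackendPool:
    """Round-robin over healthy backends; skip failed ones for a cooldown."""

    def __init__(self, backends, cooldown=30.0, clock=time.monotonic):
        self.backends = list(backends)
        self.cooldown = cooldown      # seconds a failed backend is skipped
        self.clock = clock            # injectable so the logic can be tested
        self.down_until = {b: 0.0 for b in self.backends}
        self._next = 0                # index where the next search starts

    def mark_down(self, backend):
        """Record a failure (e.g. a timeout) so the backend is skipped."""
        self.down_until[backend] = self.clock() + self.cooldown

    def choose(self):
        """Return the next healthy backend, or None if all are down."""
        now = self.clock()
        count = len(self.backends)
        for offset in range(count):
            candidate = self.backends[(self._next + offset) % count]
            if self.down_until[candidate] <= now:
                self._next = (self._next + offset + 1) % count
                return candidate
        return None


fake_time = [0.0]
pool = BackendPool(["nps1", "nps2"], cooldown=30.0, clock=lambda: fake_time[0])
first_two = [pool.choose(), pool.choose()]      # load is shared while both are up
pool.mark_down("nps1")
during_outage = [pool.choose(), pool.choose()]  # only nps2 is used
fake_time[0] = 31.0
after_cooldown = pool.choose()                  # nps1 is eligible again
```

Load is spread across both servers while they are healthy, and a failed server drops out of rotation until its cooldown expires.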
