Layer 3 Switching and the Client VPN

Twitch
A model citizen

Hey folks - I have a situation that I could use some help and suggestions with.

 

I transitioned our network from inter-VLAN routing on the MX to inter-VLAN routing via layer 3 switching on the switch stack. I started this process last night at 10:00 pm and had to deal with several frustrating issues, mostly related to OSPF. We finally have it running without OSPF for the moment, after many hours with Meraki support trying to figure out what in the heck was going on.

 

We have one lingering issue that we can't seem to beat: Before moving the VLANs to the stack, the client VPN worked great. Now, not so much. Clients were connecting via RADIUS authentication without issue, and network resources such as file shares and databases were available for the using.

 

Unfortunately, after moving the subnets to the stack, that accessibility has all but vanished. We had to switch to Meraki Authentication from RADIUS because clients could not get authenticated. We updated the Network Policy Server to reflect the changes in the network, but still no joy getting authenticated and connected. Also, access to network shares and databases, etc., has all but disappeared.

 

Has anyone else experienced such a loss of connectivity after implementing layer 3 switching???

 

Thanks!

 

Twitch

27 Replies
RomanMD
Building a reputation

Have you checked the RADIUS logs to see which IP the requests are coming from?

If my memory serves me well, when I was testing the Client VPN, the requests to RADIUS were coming with the source IP of the highest VLAN ID.

Now that you have moved the VLANs to the switch, the IP from which the requests go out to RADIUS has probably changed as well.

Can you verify that?

PhilipDAth
Kind of a big deal

As @RomanMD says, it will be because the RADIUS requests are coming from a different IP on the MX (you probably deleted/disabled the interface they were coming from).

 

The L3 switches are most likely to have a default route to the MX.  Just make sure nothing is overriding the subnet being used by Client VPN.

cmr
Kind of a big deal

@Twitch are you saying that once a client is connected via the VPN, they cannot access networks on VLANs now routed via the switches?

 

Did you re-use the IP addresses of the VLAN interfaces used by the MX on the switches, or did you use different IPs? If the same, is it a single switch, a stack, or an active/standby setup? Have you entirely removed the interfaces from those VLANs on the MX, or renumbered them?

If my answer solves your problem please click Accept as Solution so others can benefit from it.
Twitch
A model citizen

Thanks for the replies, Gents.

 

@RomanMD - I believe you are correct. Meraki support recommended using the Transit VLAN interface that resides on the MX, which in our case is 192.168.20.2. The source used to be 10.0.0.2, which is the default gateway for our data network. I added a RADIUS client with 192.168.20.2 to the domain controller via Network Policy Server (based on a doc I received from support), but that didn't change anything - clients are still unable to authenticate via the VPN because a "domain controller cannot be found."

 

We are using Meraki authentication for the VPN as a temporary workaround until we can get RADIUS working again.

 

@PhilipDAth- You are correct - I deleted the VLAN interface that the VPN was using on the MX and moved it to the switch stack (10.0.0.0/22). The only remaining interfaces on the MX are the Management VLAN (192.168.100.0/24) and the Transit VLAN (192.168.20.0/24). All other interfaces now reside on the stack.

 

You are also correct regarding the default route on the stack which points to the MX IP of the transit VLAN, 192.168.20.2.

 

There should not be any subnet conflicts that would be overriding each other. The client VPN is using 10.10.10.0/24, which is unique in our environment. It seems like the traffic is arriving via the client VPN but not being passed correctly to the 10.0.0.0/22 network to reach the DC for authentication. There is a static route on the MX for each subnet that points down to the stack Transit VLAN interface of 192.168.20.3. Inside the network, traffic is flowing like a hot knife through butter. Traffic on the site-to-site VPN is also working fine from all remote locations - everything is accessible.
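The "no subnet conflicts" claim can be double-checked with a few lines of Python. This is only a sketch: the subnets are the ones quoted in this thread, the labels are informal names rather than Dashboard settings, and `find_overlaps` is a hypothetical helper.

```python
from ipaddress import ip_network
from itertools import combinations

# Subnets quoted in this thread; the labels are informal, not Dashboard names.
SUBNETS = {
    "data": ip_network("10.0.0.0/22"),
    "management": ip_network("192.168.100.0/24"),
    "transit": ip_network("192.168.20.0/24"),
    "client VPN": ip_network("10.10.10.0/24"),
}


def find_overlaps(subnets):
    """Return the pairs of labels whose networks overlap (hypothetical helper)."""
    return sorted(
        (label_a, label_b)
        for (label_a, net_a), (label_b, net_b) in combinations(subnets.items(), 2)
        if net_a.overlaps(net_b)
    )


if __name__ == "__main__":
    # An empty list means none of the listed subnets conflict.
    print(find_overlaps(SUBNETS))
```

An empty result only rules out overlapping address space among the subnets listed; it says nothing about routes or firewall rules.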

 

@cmr- You are correct - users connecting via the client VPN are unable to reach the DC for authentication or reach the primary data center network of 10.0.0.0/22, which is also where the DC resides along with the file shares.

 

I re-used every interface - same IP scheme, gateways, and VLAN numbers. The VLANs that were living on the MX now live on the switch stack dot for dot - I figuratively picked them up and stuck them on the stack. In fact, the only new VLANs are the management VLAN and the Transit VLAN, both of which live on the MX and the stack with their respective interfaces on each.

 

Our stack consists of three MS250-48 switches, stacked together. We do not yet have an active/standby setup, so all switches are active in the stack. All production VLANs have been removed from the MX as noted earlier.

 

I truly appreciate your thoughts, guys. This is the only fallout of the change. If I can figure this out and get it going I can move on to getting OSPF implemented.

 

Thanks!!

 

Twitch

 

 

 

 

Twitch
A model citizen

Would this have anything to do with the mode our MX is in? Currently the MX is in Routed Mode since all of the VLANs lived there. Does the mode need to be changed to Passthrough or Concentrator Mode now to accommodate layer 3 switching????

 

 

cmr
Kind of a big deal

@Twitch is there a static route on the L3 switch stack for the 10.10.10.0/24 network, pointing down the transit VLAN to the MX?

Twitch
A model citizen

@cmr- No, there is not a static route currently. Does a layer 3 interface need to be added to the stack for the 10.10.10.0/24 network, or just a static route pointing to the stack Transit VLAN interface?

 

 

Twitch
A model citizen

@cmr - I read your message backwards. I didn't see that you said from the stack to the MX. There is not a route there, either. I'll add it. Hang on.

Twitch
A model citizen

@cmr- No joy. Added the static route pointing to the Transit VLAN interface on the MX and we still have the same problem. We tried with RADIUS and Meraki Authentication, but the problem remains.

 

 

cmr
Kind of a big deal

@Twitch sorry, I missed that you already had the static default route going back to the MX; that will cover it.
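Why the default route "will cover it" can be illustrated with a longest-prefix-match lookup. This is a simplified sketch, not the switch's actual routing table: `STACK_ROUTES` lists only some of the routes described in the thread, `next_hop` is a hypothetical helper, and 10.10.10.25 is a made-up client VPN address.

```python
from ipaddress import ip_address, ip_network

# Simplified view of the stack's routes as described in the thread (assumption).
STACK_ROUTES = [
    (ip_network("10.0.0.0/22"), "directly connected (data VLAN)"),
    (ip_network("192.168.20.0/24"), "directly connected (transit VLAN)"),
    (ip_network("0.0.0.0/0"), "192.168.20.2 (MX transit interface)"),
]


def next_hop(routes, destination):
    """Pick the matching route with the longest prefix, as a router would."""
    dest = ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


if __name__ == "__main__":
    # No specific route exists for 10.10.10.0/24, so the default route wins.
    print(next_hop(STACK_ROUTES, "10.10.10.25"))
    print(next_hop(STACK_ROUTES, "10.0.1.5"))
```

A dedicated static route for 10.10.10.0/24 pointing at 192.168.20.2 would select the same next hop, which is consistent with adding it making no difference.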

 

If you are on a VLAN on the MS stack and run a traceroute to a 10.10.10.0/24 IP address, does it go as below:

 

Hop1 - IP of VLAN interface on MS

Hop2 - IP of transit VLAN on MX

etc.

Twitch
A model citizen

@cmr- Traceroute actually returns no results:

 

Twitch_0-1621279160275.png

Both the 192.168.20.3 and 10.0.0.2 are VLAN interfaces on the switch stack.

 

Could this be a Native VLAN issue? Is Client VPN traffic tagged or untagged?

 

One of the Meraki support reps that I spoke with told me to set my Native VLAN to match the Transit VLAN, which is 20. He had me configure this in the Per-port VLAN Settings on the trunk link between the stack and the MX, as well as at the stack end of the trunk. The rest of the network is using Native VLAN 1. Could it be that the traffic is being dropped as it's coming back up from the stack? Our data center traffic is using VLAN 1.

 

Twitch_1-1621279523266.png

 

Twitch_2-1621279742859.png

 

 

 

cmr
Kind of a big deal

@Twitch sorry, I meant run the traceroute from a device on the LAN in one of the networks that now routes via the switches; the traceroute in the dashboard doesn't work... As for the transit VLAN, we always have that set up as an access port, so there is just the one VLAN on it. Can you try that, or do you need more than one VLAN to go over the link?

Twitch
A model citizen

@cmr- No worries! The trace went just like you thought it would:

 

Twitch_0-1621291218345.png

 

192.168.20.2 is the Transit VLAN IP on the MX.

 

I need two VLANs to run over the link between the MX and stack - the Transit VLAN, and the Management VLAN. For that reason, I don't think I can make it an access link.

 

 

Bruce
Kind of a big deal

Out of interest, what VLAN number is your management VLAN?

 

I note you've used VLAN 20 as your Transit VLAN, subnet 192.168.20.0/24, and then you've used 192.168.100.0/24 for your management VLAN. If your management VLAN is VLAN 100, then your RADIUS requests from the MX will be sourced from the management network Layer 3 interface.

 

Also, I notice that in your traceroute there was an ICMP drop on the second attempt to contact the Layer 3 gateway for the VLAN. If this is all on the same LAN then I'd find this unusual (not unheard of, but just unusual). Might be worth investigating if this is happening regularly - run a ping against the Layer 3 gateway, check event logs - it might be a sign something untoward is happening on the network. Also, make sure you don't have a Layer 3 interface for the Management VLAN configured on the Layer 3 switch stack.

 

Just some thoughts in passing.

Twitch
A model citizen

Good morning, @Bruce. You raise some interesting points.

 

Management is 100. Meraki support told us that we needed to use the MX IP of the Transit VLAN for the RADIUS config on the domain controller. At the same time, they also said we need a Layer 3 interface for the Management VLAN on both the MX and the switch stack.

 

I will check on the Layer 3 issue that you raised. I do know that pings to the Transit VLAN MX interface - 192.168.20.2 - are solid with no drops.

 

Here are two traceroute results, one to the Transit VLAN MX interface which drops the second attempt, and another to the Data VLAN IP gateway, 10.0.0.2 (it's also the gateway for my computer) with no drops at all. It seems that the drop is only happening when sending a trace to the Transit VLAN MX IP.

 

Twitch_0-1621339266454.png

 

Twitch_1-1621339304914.png

 

 

 

This has been such a mess. I really appreciate everyone's help.

 

Thanks.

Twitch
A model citizen

Here is the current MX config:

 

Twitch_0-1621339632963.png

Twitch_1-1621339673294.png

Twitch_2-1621339719930.png

And here is the Layer 3 on the stack:

Twitch_3-1621339782320.png

 

 

 

cmr
Kind of a big deal

@Twitch as the trace route from your PC to the client VPN device worked (don't worry about the packet loss on the core hop, we had similar with some 355s and the support engineer said that you will get some loss as it is lowest priority and if anything else is going on it will drop it...), can you do a trace route from a client VPN device back to your PC?  It should follow the same path, but may show the other interfaces on each device.  

Twitch
A model citizen

Morning @cmr - here is the trace from a laptop connected to the client VPN (via Meraki Authentication since RADIUS is down) back to my computer -

 

Twitch_0-1621342306833.png

 

cmr
Kind of a big deal

Thanks @Twitch that has the same number of hops and gets there, I think your routing is fine.  It must be the RADIUS source IP or similar as mentioned on the thread.  If you enable logging on the firewall of the RADIUS server, do you see any requests coming in (allowed or dropped) from any of the MX IP addresses?

Twitch
A model citizen

Thanks @cmr - I will get the logging enabled and see what I can find. I'm stuck between comments here and what support told me as far as which interface the RADIUS server is going to use to reach the MX - support said it will use the IP of the Transit VLAN, and @Bruce believes it will use the MX IP of the Management VLAN.

 

What's strange is that I can ping the Transit VLAN MX IP (192.168.20.2), but I cannot ping the Management VLAN MX IP (192.168.100.1) from the RADIUS server. Nor can I ping 192.168.100.1 from my desktop. If the RADIUS server does in fact need to use the Management VLAN MX IP, the server can't reach it for some reason. I can, however, ping both ends of the Management VLAN between the MX and the stack.

 

From the RADIUS server to the MX Management IP:

Twitch_0-1621348124359.png

 

From my computer (in 10.0.0.0/22) to the same destination IP:

Twitch_2-1621348239581.png

 

 

 

 

Twitch
A model citizen

@cmr- With 192.168.100.1 configured in the RADIUS NPS I did not see any requests coming in. With 192.168.20.2 configured, I did see a request come in and a log file was created. I don't see anything in the log, however, that indicates success or failure, but the connection did fail.

 

I did see this however. Not sure if it helps:

 

<Proxy-Policy-Name data_type="1">MX Radius Authentication</Proxy-Policy-Name><Packet-Type data_type="0">3</Packet-Type><Reason-Code data_type="0">16</Reason-Code></Event>

 

 

 

 

Twitch
A model citizen

@cmr @Bruce @PhilipDAth @RomanMD -

 

Gents - I was able to get it working. It appears that the shared secrets configured on the MX and on the RADIUS server had somehow gotten out of sync (for lack of a better term). I reset the secret on the MX and the Client VPN connected via RADIUS like a hot knife through butter once again. It connected via 192.168.20.2, which is the MX IP of the Transit VLAN.
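For readers wondering why a mismatched shared secret can surface as a plain authentication failure, here is a rough sketch of the User-Password hiding scheme from RFC 2865 section 5.2. It assumes the requests use PAP-style password hiding, which this thread does not confirm; the function names, secrets, and password below are made up for illustration.

```python
import hashlib


def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Hide a User-Password value as described in RFC 2865 section 5.2."""
    # Pad with NULs to a multiple of 16 octets (at least one full block).
    padded = password + b"\x00" * (-len(password) % 16)
    if not padded:
        padded = b"\x00" * 16
    hidden, previous = b"", authenticator
    for i in range(0, len(padded), 16):
        mask = hashlib.md5(secret + previous).digest()
        block = bytes(p ^ m for p, m in zip(padded[i:i + 16], mask))
        hidden += block
        previous = block
    return hidden


def reveal_password(hidden: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Reverse hide_password using the secret configured on the server side."""
    plain, previous = b"", authenticator
    for i in range(0, len(hidden), 16):
        mask = hashlib.md5(secret + previous).digest()
        block = hidden[i:i + 16]
        plain += bytes(c ^ m for c, m in zip(block, mask))
        previous = block
    return plain.rstrip(b"\x00")


if __name__ == "__main__":
    authenticator = bytes(range(16))  # normally 16 random octets per request
    hidden = hide_password(b"example-password", b"secret-on-mx", authenticator)
    # Same secret on both sides: the server recovers the real password.
    print(reveal_password(hidden, b"secret-on-mx", authenticator))
    # Different secret on the server: it recovers garbage and rejects the login.
    print(reveal_password(hidden, b"secret-on-nps", authenticator))
```

With the wrong secret the server sees a password that does not match the account, which would look like a bad user name or password rather than a connectivity problem.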

 

All shares and network resources are accessible.

 

Thank you so much for all of the replies and assistance. Please excuse my inexperience with RADIUS and NPS and my lack of understanding of how the Client VPN setup works on the MX, especially the role of the shared secret between the MX and RADIUS server.

 

This community is awesome. Probably the best support community on the interwebs.

 

Now I just have to get OSPF running.

 

Have a great day everyone! I know my day just got a WHOLE lot better.

 

Cheers!!

 

Twitch

cmr
Kind of a big deal

Great news @Twitch - was it that <Reason-Code data_type="0">16</Reason-Code> statement that got you there?

 

For information the reason codes are:

 

Message | Reason code | Description
IASP_SUCCESS | 0 | The operation completed successfully.
IASP_INTERNAL_ERROR | 1 | An internal error occurred. Check the system event log for additional information.
IASP_ACCESS_DENIED | 2 | There are no sufficient access rights to process the request.
IASP_MALFORMED_REQUEST | 3 | The Remote Authentication Dial-In User Service (RADIUS) request was not properly formatted.
IASP_GLOBAL_CATALOG_UNAVAILABLE | 4 | The Active Directory global catalog cannot be accessed.
IASP_DOMAIN_UNAVAILABLE | 5 | The user account domain cannot be accessed.
IASP_SERVER_UNAVAILABLE | 6 | The server is unavailable.
IASP_NO_SUCH_DOMAIN | 7 | The specified domain does not exist.
IASP_NO_SUCH_USER | 8 | The specified user account does not exist.
IASP_EXTENSION_DISCARD | 9 | The request was discarded by a third-party extension DLL file.
IASP_AUTH_FAILURE | 16 | Authentication was not successful because an unknown user name or incorrect password was used.
IASP_CHANGE_PASSWORD_FAILURE | 17 | The user could not change his or her password because the new password did not meet the password requirements for this network.
IASP_UNSUPPORTED_AUTH_TYPE | 18 | The specified authentication type is not supported.
IASP_NO_CLEARTEXT_PASSWORD | 19 | The user could not be authenticated using Challenge Handshake Authentication Protocol (CHAP). A reversibly encrypted password does not exist for this user account. To ensure that reversibly encrypted passwords are enabled, check either the domain password policy or the password settings on the user account.
IASP_LM_NOT_ALLOWED | 20 | LAN Manager authentication is not enabled.
IASP_EXTENSION_REJECT | 21 | The request was rejected by a third-party extension DLL file.
IASP_EAP_NEGOTIATION_FAILED | 22 | The client could not be authenticated because the Extensible Authentication Protocol (EAP) Type cannot be processed by the server.
IASP_UNEXPECTED_EAP_ERROR | 23 | Unexpected error. Possible error in server or client configuration.
IASP_LOCAL_USERS_ONLY | 32 | The current server configuration supports only local user accounts.
IASP_PASSWORD_MUST_CHANGE | 33 | The user must change his or her password.
IASP_ACCOUNT_DISABLED | 34 | Authentication failed because the user account is not enabled. Before the account can be authenticated, a person with administrative rights for either the computer or the domain must enable the user account.
IASP_ACCOUNT_EXPIRED | 35 | The user account has expired. Only a person with administrative rights for either the computer or the domain can reset the expiration date on the user account.
IASP_ACCOUNT_LOCKED_OUT | 36 | The user account is currently locked and cannot be authenticated. Only a person with administrative rights for either the computer or the domain can unlock the user account.
IASP_INVALID_LOGON_HOURS | 37 | Authentication failed because of a logon time restriction on the user account. Ensure that the permitted logon hours for the user account are correct.
IASP_ACCOUNT_RESTRICTION | 38 | Authentication failed because of a user account restriction. Check the user account properties for restrictions.
IASP_NO_POLICY_MATCH | 48 | The connection attempt did not match any remote access policy.
IASP_NO_CONNECTION_REQUEST_POLICY_MATCH | 49 | The connection attempt did not match any connection request policy.
IASP_DIALIN_LOCKED_OUT | 64 | The user account exceeded the remote access account lockout count.
IASP_DIALIN_DISABLED | 65 | The connection attempt failed because remote access permission for the user account was denied. To allow remote access, enable remote access permission for the user account, or, if the user account specifies that access is controlled through the matching remote access policy, enable remote access permission for that remote access policy.
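Looking up a reason code from an NPS log line can be scripted. The sketch below is illustrative only: it pulls the Packet-Type and Reason-Code values out of a log fragment like the one posted earlier in the thread with a regular expression, the `summarize` helper is hypothetical, and `REASON_CODES` holds only a small subset of the codes listed above. The packet type names are the standard RADIUS codes from RFC 2865.

```python
import re

# Small subset of the reason codes listed above.
REASON_CODES = {
    0: "IASP_SUCCESS",
    16: "IASP_AUTH_FAILURE",
    48: "IASP_NO_POLICY_MATCH",
    49: "IASP_NO_CONNECTION_REQUEST_POLICY_MATCH",
}

# Standard RADIUS packet type codes (RFC 2865).
PACKET_TYPES = {
    1: "Access-Request",
    2: "Access-Accept",
    3: "Access-Reject",
    11: "Access-Challenge",
}

FIELD = re.compile(r"<(Packet-Type|Reason-Code)\b[^>]*>(\d+)</\1>")


def summarize(fragment):
    """Return (packet type, reason) names found in an NPS XML log fragment."""
    fields = {name: int(value) for name, value in FIELD.findall(fragment)}
    packet = PACKET_TYPES.get(fields.get("Packet-Type"), "unknown")
    reason = REASON_CODES.get(fields.get("Reason-Code"), "unknown")
    return packet, reason


if __name__ == "__main__":
    fragment = (
        '<Packet-Type data_type="0">3</Packet-Type>'
        '<Reason-Code data_type="0">16</Reason-Code></Event>'
    )
    print(summarize(fragment))
```

A regular expression is used instead of an XML parser because the posted log line is only a fragment of an event, not a well-formed document.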

Twitch
A model citizen

Hey @cmr - Your suggestion to turn on logging for RADIUS as well as the 16 code got me there. In addition, some other threads I found on the Community helped a great deal as well.

 

This link as well: https://documentation.meraki.com/MX/Client_VPN/Troubleshooting_Client_VPN#Windows_Error_789

 

What a relief.

 

Ironically, I managed to fix it 20 minutes before my ECMS2 class was set to begin. I'm really looking forward to this class filling in the many gaps in my Meraki knowledge, and if it isn't obvious, those gaps are many.

 

Thanks again everyone!!

 

Twitch

RomanMD
Building a reputation

I don't have the full picture of your network, but assuming you only have the MX, the switches and Client VPN, and everything is behind the switches: what is the Management VLAN all about? Is it the management for the switches only? Then it does not make any sense to have the VLAN 100 interface on the stack. It is better just to have that as the native VLAN on the MX port.

If it is for some other purposes, then again... what's the purpose?

Twitch
A model citizen

@RomanMD Our network here consists of an MX, a three-switch stack, a lot of access switches, and access points. The point of the Management VLAN that I created is to provide a network for devices to use to reach the Dashboard, separate from the Production VLANs that exist on the stack. There is a caveat with layer 3 switching that the management network cannot have an interface on the switch stack (though there is an exception if SNMP is not running, as @PhilipDAth shared in a reply to a different post recently).

 

Using the Management VLAN allows all of my switches to reach the Dashboard on a network that is separate from the production network. I have it on the MX so my devices will remain connected to the Dashboard even if I royally screw something up with the production network config.

 

The only reason I have an interface for the Management VLAN is because a support rep I spoke with said it needed to be there, and I respected his knowledge on the subject. I do not have enough knowledge/experience with Meraki to be able to argue otherwise. The Meraki world is new to me.

 

Hope this isn't clear as mud. 😉

 

Twitch

RomanMD
Building a reputation

The Meraki world is no different from the normal Cisco world. Having VLAN 100 on both the MX and the core switch, acting as the management VLAN for the switches, doesn't sound good to me. That's the only thing I want to point out.

 

We have a similar setup in our location.

I have the management VLAN on the MX, where it is the native VLAN on the MX port and acts as management for the switches, and then another VLAN on the switch core for management of the access points. I need that other VLAN only because of the S2S VPN; in your case, another VLAN for the APs is not needed.

 


