Vcsa issues since switch “upgrade”

Solved
squidgy
Here to help


Hello,

 

We replaced our Catalyst switches and mirrored the configuration onto some new stacked MS210-24 switches. However, we're having problems with the vCSA communicating with the ESXi hosts on the management network.

 

We have checked all the settings and, from the Meraki side, everything looks normal. We can even see the vCSA and one of the ESXi hosts on the ports they're using to communicate. I can ping the IP addresses of both hosts from the vCSA CLI, but not their hostnames.
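
For anyone else comparing notes, here's a rough sketch of the kind of check that separates the two failure modes (reachable by IP vs. resolvable by name). The hostnames, IPs and port below are placeholders, not our real ones:

import socket

# Hypothetical management hostname/IP pairs - substitute your own.
HOSTS = {
    "esx01.example.local": "10.0.9.11",
    "esx02.example.local": "10.0.9.12",
}
MGMT_PORT = 443  # the usual HTTPS management port on ESXi/vCenter

for name, ip in HOSTS.items():
    # 1) Name resolution - this is the part that was failing for us.
    try:
        print(f"{name} resolves to {socket.gethostbyname(name)}")
    except socket.gaierror as err:
        print(f"{name} does NOT resolve: {err}")

    # 2) Raw IP reachability on the management port, independent of DNS.
    try:
        with socket.create_connection((ip, MGMT_PORT), timeout=3):
            print(f"{ip}:{MGMT_PORT} is reachable by IP")
    except OSError as err:
        print(f"{ip}:{MGMT_PORT} is NOT reachable by IP: {err}")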

 

Anyone got ANY ideas? At this point I'm wondering if it's a problem with VMware 6.7.

 

I realise I'm short on info, but we've been at this for about 10 hours, so if you have a specific question please ask.

5 Replies
cmr
Kind of a big deal

@squidgy have you set any firewall rules on the 210s? We run vCSA 6.7 and have hosts all over the country connected via Meraki MXs, and at two sites we have MS stacks. One site has 355s and the other has 210s, just like you.

 

MS firmware on the 210s is the latest, 14.29, but this has worked on multiple 14.x releases and some older 12.x firmware from before 14.x existed.

 

How are your hosts connected to the switches?

Where is the vcsa in relation to the hosts?

Is it a flat network or routed VLANs?

What Cisco setup was replaced?

If my answer solves your problem please click Accept as Solution so others can benefit from it.
squidgy
Here to help

Hi @cmr 

 

I'll try to answer your questions as best I can.

The hosts are connected to the stacked 210s, two uplinks into each switch for redundancy. The iSCSI for the guests is connected the same way.

 

The management network is VLAN 9.

The servers (VMs) are on VLAN 2.

 

iSCSI is VLAN 8, if memory serves.

 

The Cisco setup we replaced was Catalyst 29xx switches; we took a copy of the config and basically mirrored it on the new switches.

When we look at nic0 and the vSwitch, we can see both the vCSA and one of the ESXi hosts, but the vCSA just will not play ball at all. You can only log in to the web GUI using the local admin account, and it's very slow to log in, basically timing out. Nothing is shown in the vCSA now, not even the disconnected hosts.

 

We've been over the network setup with a fine-tooth comb (me and two network guys) and we cannot see where the problem lies. The client and server VMs seem fine; no issues with the server or client networks.

Suspecting it's a bit of weirdness with the vCSA, I started diving into the vpxa logs:

 

2021-09-19T12:25:27Z lwsmd: [lsass] Transitioning domain 'Xxxx.local' to ONLINE state
2021-09-19T12:25:47Z lwsmd: [netlogon] DNS lookup for '_ldap._tcp.dc._msdc xxxx.local' failed with errno 110 (Connection timed out), h_errno = 2 (Host name lookup failure)
2021-09-19T12:25:47Z lwsmd: [lsass] Could not transition domain 'Xxxxx.local' to ONLINE state. Error 9502


and 

2021-09-19T03:49:16.942Z cpu2:2107133)WARNING: E1000: 4325: End of the next rx packet


Lots of entries for both.

Bruce
Kind of a big deal

The messages you are seeing show a timeout on the lookup of the DNS service (SRV) record for LDAP. My guess is that nothing can authenticate properly, hence the login timing out and you only being able to authenticate with local accounts.

 

You need to troubleshoot where your DNS server is and why you can’t reach it.
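
If it helps, here's a rough sketch of that check (my own example, not VMware tooling): point a resolver directly at the DNS server and ask for the same SRV record the log was failing on. It uses the third-party dnspython package (2.x), and the server address and domain are placeholders:

import dns.exception
import dns.resolver  # third-party: pip install dnspython

DNS_SERVER = "10.0.2.10"                         # placeholder internal DNS / DC address
SRV_NAME = "_ldap._tcp.dc._msdcs.example.local"  # the record the lwsmd error was chasing

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = [DNS_SERVER]
resolver.lifetime = 5  # fail fast rather than hanging like the lwsmd lookup did

try:
    for record in resolver.resolve(SRV_NAME, "SRV"):
        print(f"SRV -> {record.target}:{record.port} (priority {record.priority})")
except dns.exception.Timeout:
    print(f"No answer from {DNS_SERVER} - DNS traffic is probably being dropped on the path")
except dns.resolver.NXDOMAIN:
    print("The DNS server answered, but that SRV record does not exist")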

 

As an aside, .local is no longer recommended for internal domains, as it's an IETF-reserved domain generally used for mDNS and link-local networking. It's possible this is having an impact, but I doubt it (unless you've introduced other systems to the network at the same time as changing the switches over).

squidgy
Here to help

Thanks for the reply. Very interesting regarding .local; I'll add it to the list of changes.

 

You were also correct; we had to figure out why the DNS lookup request was timing out...

 

and we did, about an hour ago. 

 

There was a firewall rule blocking TCP and ICMP on those VLANs. It serves no real purpose; it was configured to stop those VLANs from accessing the internet, but it had the unexpected effect of blocking traffic internally as well.
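
For anyone verifying a similar fix: because the rule also dropped ICMP, ping on its own was misleading, so a plain TCP test from the management VLAN to the DC/DNS box tells you more. A rough sketch, with a placeholder address and the usual ports:

import socket

DC_IP = "10.0.2.10"   # placeholder address of the DC / DNS server on the other VLAN
PORTS = {53: "DNS", 389: "LDAP", 443: "HTTPS"}

for port, label in PORTS.items():
    try:
        with socket.create_connection((DC_IP, port), timeout=3):
            print(f"{label} ({port}/tcp) open from this VLAN")
    except OSError as err:
        print(f"{label} ({port}/tcp) blocked or unreachable: {err}")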

 

As cmr asked about the firewall, I marked his reply as the solution, but I appreciate your response, thank you.

squidgy
Here to help

Could you please tell me where I can check the firewall rules on the 210s? (My colleague is the network guy, really.)

 

It's worth noting we have MX64s for the SD-WAN and firewall, which are connected to stacked MS420s, and all the switches are connected to those. We've discounted anything other than the MS210s, as it's only the VMware management network having any issues. As for the Catalysts we replaced, as I said, we copied everything useful (IP addresses, VLAN setups, etc.) to minimise the number of changes we were making, for troubleshooting purposes.
