where is the connection dropped on the network? (how to figure out)

Solved
cabricharme
Here to help

Hi all,

 

New to Cisco and Meraki, started a sysadmin job a bit over a month ago that is heavy on networking - not something I've been exposed to before. Have a peculiar problem that we can't yet figure out - hope you can point me in the right direction.

 

We have a VMware ESXi server that most other computers on the network cannot reach on port 443: the connection is immediately dropped with "The connection was reset" / "ERR_CONNECTION_RESET". Pinging it and SSH-ing into it both work fine; only port 443 is affected. Dozens of other ESXi hosts on the network with identical network configuration have no issues. Just this one.

 

Question: on a network where all switchgear is Meraki (MS225-48FP, MS250-48FP, MX100-HW, MS220-8-HW, MX64, etc.), how do we figure out where that connection is dropped, i.e. which device and which policy?

 

Some context:

  • Only port 443 (i.e. https://<IP address>) appears to be affected. ICMP, SSH (port 22) - work fine from the same machine that cannot access port 443 on the target.
  • Some hosts on the network - can access port 443 on the target, and some (most) - can't.
  • Changing the target IP address (in case there's a policy selectively blocking that IP from various parts of the network) didn't help.
  • Changing the ports on the switch to which the target is wired (in case the ports are misconfigured or have a blocking policy we somehow can't see) - didn't help.
  • Resetting the network configuration on the ESXi (via "reset to factory defaults" and then setting the static IP, gateway, etc. to what they were before) - didn't help, same exact behavior.
  • My usual MO in a case like this would be to put the host in a known good subnet and test there. I haven't tried it yet, as it requires some help from my fellow IT people, but my hunch is that the box itself is fine and the issue is outside of it, likely in the switch or other network configuration.
  • We have dozens of identical ESXi hosts on identical hardware connected to identical switches with identical policies - and they all have no connectivity issues. There is even an identical ESXi host physically right next to the problematic one, on the same subnet, wired to the same switch, with an identical DNS and network configuration (with the exception of the actual IP address, of course) - no issues.
  • The problem is quite critical: our VCSA (vCenter appliance) can't connect to the ESXi host over port 443, and thus it's effectively offline.
  • Per my IT team, the problem has existed for a while. They had other things to work on, and I was basically handed this issue knowing very little about the network configuration, what can cause such an issue, and how to troubleshoot it. I.e. "here is an issue; go fix it".

Tools available to me:

  • Meraki dashboard with read-only access to the entire network configuration including individual ports.
  • I can ask my network admins to make changes (but can't make them myself).
  • Physical access to the rack where the hardware is located.

Questions:

  • Where in Meraki dashboard could I see events, log entries indicating a connection to port 443 on the target was dropped (if one of the Meraki devices or policies is responsible for it)? If this is the right question to ask, would you be willing to help me craft the right search criteria to ID those events?
  • What are the usual best practices in troubleshooting this type of an issue?
  • What are the usual tools used to pinpoint which device or policy is responsible for dropping a network connection to a known good target? E.g. I am trying to access port 443 on a specific IP from my desktop, there are 4 switches in-between, the connection is not working (dropped). If I connect directly to the target (with a crossover cable or something) - it's fine. How do I figure out where on the network the connection is dropped, i.e. what is responsible for dropping it? (trace route, ICMP, SSH indicate no issues in the network connection)
  • If none of the above questions is the right one to ask, what would be the right question(s)? 🙂
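One way to narrow this down before digging into the dashboard is to run the same probe from a machine that works and from one that doesn't, and compare how the connection ends. Below is a minimal sketch in Python (standard library only). The function names `probe_tcp` and `probe_tls` are hypothetical, and skipping certificate verification assumes the target presents a self-signed certificate.

```python
import socket
import ssl


def probe_tcp(host, port, timeout=3.0):
    """Classify how a plain TCP connect to host:port ends."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"   # an RST came back in reply to our SYN
    except ConnectionResetError:
        return "reset"
    except socket.timeout:
        return "timeout"   # no answer at all within the timeout
    except OSError as exc:
        return f"error: {exc}"


def probe_tls(host, port=443, timeout=3.0):
    """Connect, then attempt a TLS handshake without certificate checks."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # assumption: self-signed certificate
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return f"tls ok ({tls.version()})"
    except ConnectionRefusedError:
        return "refused"
    except ConnectionResetError:
        return "reset"
    except ssl.SSLError as exc:
        return f"tls error: {exc}"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Calling `probe_tcp(target, 22)`, `probe_tcp(target, 443)` and `probe_tls(target)` from both machines gives a small comparison table. If the plain TCP connect to 443 succeeds but the TLS step is reset, that suggests (but does not prove) that something in the path is inspecting the traffic rather than simply blocking the port.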

Thank you!

1 Accepted Solution
alemabrahao
Kind of a big deal

On MS you can check here: https://documentation.meraki.com/MS/Layer_3_Switching/Configuring_ACLs

 

On MX you can check here: https://documentation.meraki.com/General_Administration/Cross-Platform_Content/Using_Layer_3_Firewal...

 

And about event logs you can check here: https://documentation.meraki.com/General_Administration/Cross-Platform_Content/Meraki_Event_Log

 

You can use the packet capture tool for troubleshooting: https://documentation.meraki.com/General_Administration/Cross-Platform_Content/Packet_Capture_Overvi...

I am not a Cisco Meraki employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

Please, if this post was useful, leave your kudos and mark it as solved.

Thanks. I could really use some hand-holding here... (Asking specific questions and hoping for specific answers as opposed to RTFM links.)

 

Re: event logs. Does this sound right?

  • no free form text search, e.g. "give me all the events for this source or target IP"? (It needs to be IPs and not MACs because the blocking has been confirmed to be IP-based.)
    • ("download as" only exports the current page, with no way to export all events for further analytics or free form text search?)
  • In the case where a connection is dropped from some IPs but not others involving multiple switches, how would one search for relevant events? (Please be as specific as possible. Please no RTFM links.)
    • (Say, my desktop's IP is A.B.C.1 and the connection to the target IP:port is dropped. If I change it to A.B.C.2, the connection is not dropped. What is the likeliest culprit in the Meraki universe and how would I search for relevant events, specifically?)

Thanks!

We need more information, like a topology, who is the default gateway, etc.

 

 

My suggestion is to open a support case; the Meraki support team will be able to assist you better.

What I was looking for was something like this:

  1. Go to the Meraki dashboard for the site where the target host is located. (Not needed for single site configurations, required for multi-site ones.)
  2. Go to "Network-wide" - "Event Log" and choose "for security appliances" at the top.
  3. In "client" field, put a MAC address belonging to the host at issue.
  4. Hit "search", see what comes up.

In our case, this pointed to the root cause: misconfigured "content filtering" rules (or their misbehavior - I still can't wrap my mind around why they are behaving the way they are).
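For anyone who would rather search by IP or pull more than one page of results, the same event log can, as far as I know, also be queried through the Meraki Dashboard API. The sketch below only builds the request and does not send it. The endpoint path and the `productType`, `clientIp` and `perPage` parameters are my understanding of the v1 "network events" endpoint and should be checked against the current API documentation. The API key, network ID and client IP are placeholders, and `build_events_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request

# Assumption: the public Dashboard API v1 base URL.
API_BASE = "https://api.meraki.com/api/v1"


def build_events_request(api_key, network_id, client_ip, per_page=100):
    """Build (but do not send) a GET request for security appliance events."""
    query = urllib.parse.urlencode({
        "productType": "appliance",  # same as "for security appliances"
        "clientIp": client_ip,       # filter by IP instead of MAC
        "perPage": per_page,
    })
    url = f"{API_BASE}/networks/{network_id}/events?{query}"
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json",
    })
```

Passing the result to `urllib.request.urlopen` and decoding the JSON body should return the matching events. This requires API access to be enabled for the organization and an API key for your dashboard account.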

 

Bottom line, for people who aren't too deep into networking or not too familiar with Cisco and Meraki, a little hand-holding goes a long way while RTFM links rarely work. Hope this helps someone else in a similar situation.

GIdenJoe
Kind of a big deal

If your traffic does not pass an MX you won't be able to see if traffic was dropped.

If it does pass an MX, you will have to point the MX at a syslog server for its flow logs and log all your deny rules to see which deny rule the traffic matched.
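If no syslog server is available yet, a throwaway UDP listener is enough for a quick test. This is a minimal sketch, not a production collector: it makes no assumptions about the MX message format and just yields any received message containing a given substring, for example the target's IP address. The function name `matching_messages` and the port number in the usage note are hypothetical.

```python
import socket


def matching_messages(sock, needle, max_messages=None):
    """Yield (sender_ip, text) for received UDP messages containing needle.

    sock must be an already-bound UDP socket. Stops after max_messages
    datagrams have been read (matching or not), or runs forever if None.
    """
    seen = 0
    while max_messages is None or seen < max_messages:
        data, (sender, _port) = sock.recvfrom(65535)
        seen += 1
        text = data.decode("utf-8", errors="replace").strip()
        if needle in text:
            yield sender, text
```

To use it, create a socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`, bind it to an unprivileged port such as 5514, have a network admin point the MX syslog configuration at that address and port, then loop over `matching_messages(sock, "<target IP>")`.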

cabricharme
Here to help

When searching event logs for "security appliances" and filtering them by the target MAC addresses (thanks @alemabrahao), we see events such as:

Apr 19 11:16:14	<MAC address>	Content filtering blocked URL	url https://localhost.localdomain/..., server <IP address>:54664, category User-defined Blacklist

... which somewhat explains the blocking. (Why "somewhat": other ESXis don't seem to be blocked despite having the same configuration including the "localhost.localdomain" part.)
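Since the dashboard offers no free-form search, lines copied out of the event log can be parsed and grouped offline. The sketch below assumes the layout of the line quoted above: four tab-separated columns (time, client, event type, details), with details written as comma-separated "key value" pairs. The helper names are hypothetical, and a URL that itself contains ", " would confuse this simple split.

```python
from collections import Counter


def parse_event_line(line):
    """Split one copied event log line into a dict, or None if malformed."""
    parts = line.rstrip("\n").split("\t", 3)
    if len(parts) != 4:
        return None
    time, client, event_type, details = parts
    event = {"time": time, "client": client, "type": event_type}
    for item in details.split(", "):
        # "category User-defined Blacklist" -> key "category", value the rest
        key, _, value = item.partition(" ")
        event[key] = value
    return event


def count_by_category(lines):
    """Count content filtering blocks per category across many lines."""
    counts = Counter()
    for line in lines:
        event = parse_event_line(line)
        if event and event["type"] == "Content filtering blocked URL":
            counts[event.get("category", "unknown")] += 1
    return counts
```

Running `count_by_category` over the lines for the problematic host and for a healthy ESXi host side by side is one way to see whether the healthy host ever hits the same blacklist category.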

 

Per our network admin, the group policies were set up a long time ago by a vendor and likely need to be revisited and reconfigured.

 

Adding the needed IPs to the "allow list" cleared the blocking, so it looks like we're good for now.

 

Thanks!
