The problem has been going on for seven months.
We opened a ticket with Meraki support.
Packet captures were taken, and support noted the slowness on the VLANs from the core switches.
The following solution was proposed:
Our analysis of the captured traffic reveals several network-related issues that are likely contributing to the performance problems. Specifically, we observed the following:
TCP Retransmissions: The analysis of the traffic to both destination IPs shows significant TCP retransmissions. Retransmissions occur when a sender doesn't receive an acknowledgment for sent data, indicating packet loss or delays. This is a primary cause of slowdowns, as data needs to be re-sent, consuming more bandwidth and increasing transfer time.
Increased Latency/Time Deltas: We noted noticeable time delays between packets. This increased latency slows down communication, as devices must wait longer for responses, impacting the overall transfer rate.
TCP Window Size Fluctuations: The TCP window sizes vary, which can be a sign of congestion control mechanisms being triggered. While window size changes are normal, erratic or consistently small windows can further contribute to performance bottlenecks.
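For anyone who wants to check the first two observations against their own capture, here is a minimal Python sketch. It assumes the segments for one direction of a single TCP flow have already been exported from the capture as (timestamp, sequence number, payload length) records. The `Segment` type, the `summarize_flow` function, and the 0.2 s gap threshold are hypothetical choices for illustration, and the retransmission check is a simplified heuristic, not the logic Wireshark or Meraki support actually uses.

```python
from typing import Iterable, NamedTuple, Tuple


class Segment(NamedTuple):
    ts: float    # capture timestamp in seconds
    seq: int     # TCP sequence number of the first payload byte
    length: int  # payload length in bytes


def summarize_flow(segments: Iterable[Segment],
                   gap_threshold: float = 0.2) -> Tuple[int, int]:
    """Return (likely_retransmissions, large_gaps) for one flow direction.

    A data segment is counted as a likely retransmission when its whole
    byte range ends at or before the highest byte already seen.
    Sequence-number wraparound is ignored (assumption for brevity).
    """
    highest_end = None   # end of the highest byte range seen so far
    prev_ts = None
    retransmissions = 0
    large_gaps = 0
    for seg in segments:
        # Count inter-packet time deltas above the threshold.
        if prev_ts is not None and seg.ts - prev_ts > gap_threshold:
            large_gaps += 1
        prev_ts = seg.ts
        if seg.length == 0:
            continue  # pure ACKs carry no data to retransmit
        end = seg.seq + seg.length
        if highest_end is not None and end <= highest_end:
            retransmissions += 1
        else:
            highest_end = end
    return retransmissions, large_gaps


if __name__ == "__main__":
    flow = [
        Segment(0.00, 1000, 500),
        Segment(0.01, 1500, 500),
        Segment(0.50, 1000, 500),  # same bytes sent again after a long gap
        Segment(0.51, 2000, 500),
    ]
    print(summarize_flow(flow))
```

Comparing these counts between a capture taken on the affected VLANs and one taken elsewhere can help show whether the loss is specific to the path through the core switches.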
These observations strongly suggest that network-level issues are impeding the Nutanix update process. To resolve this issue, we recommend the following troubleshooting steps:
Investigate Network Congestion:
Uplink Utilization: Monitor the bandwidth usage of the switches' uplink(s) to identify any potential bottlenecks.
Network Path Analysis: Use traceroute or similar tools to map the network path between the Nutanix servers/clients and the update servers to pinpoint potential congestion points.
Check Physical Layer: Ensure all network cables are properly connected and in good working order.
Quality of Service (QoS): Review your QoS configuration on other network devices to ensure that Nutanix update traffic is being prioritized appropriately.
Isolate Network Segments: If possible, try to isolate different network segments to determine if the issue is localized to a specific area.
Switch Restart (Troubleshooting Step): As a troubleshooting step, consider performing a controlled restart of the core switches (MS425) during a maintenance window. This can sometimes resolve transient software or hardware issues that might be contributing to the problem.
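For the uplink utilization step, a rough sketch of the arithmetic in Python follows. It assumes two readings of an interface octet counter taken a known number of seconds apart (for example the 64-bit SNMP `ifHCInOctets` counter, if SNMP polling is enabled). The function name `utilization_percent` and the 10 Gbit/s link speed in the example are assumptions for illustration; substitute the real uplink speed.

```python
def utilization_percent(octets_start: int,
                        octets_end: int,
                        interval_s: float,
                        link_bps: float,
                        counter_bits: int = 64) -> float:
    """Average link utilization (percent) between two octet-counter samples.

    The modulo handles a single counter wrap between the two samples;
    multiple wraps within one interval cannot be detected this way.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    bits_sent = delta_octets * 8
    return bits_sent / (interval_s * link_bps) * 100


if __name__ == "__main__":
    # 37.5 GB transferred in 60 s on an assumed 10 Gbit/s uplink.
    print(utilization_percent(0, 37_500_000_000, 60, 10_000_000_000))
```

Sustained values near 100% during the Nutanix updates would point toward congestion on that uplink. Low utilization alongside heavy retransmissions would instead point toward the physical layer or a faulty port.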