I wonder if anyone else has run into this problem before.
I have a client using a vMX in Azure and they are reporting 4% packet loss to all the VMs in Azure from the spokes. It doesn't matter which VM is being pinged, and it doesn't matter which spoke the ping is being done from. Spokes can ping other spokes with zero packet loss.
The issue happens consistently 24x7. If you do about 100 pings you will lose about 4 of the packets. The packet loss tends to be spread out rather than occurring in bursts.
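For anyone wanting to put numbers on "about 4 in 100, spread out", here is a minimal Python sketch. It assumes you have already turned your ping output into a list of True/False values (reply received or not) in send order. The function name `summarize_ping` and the sample data are made up for illustration and are not real measurements.

```python
def summarize_ping(results):
    """Summarize ping results given as booleans (True = reply received), in send order."""
    total = len(results)
    lost = results.count(False)
    # Track the longest run of consecutive losses.
    # A value of 1 means every loss was isolated (i.e. "spread out").
    longest = run = 0
    for ok in results:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    loss_pct = 100.0 * lost / total if total else 0.0
    return {"sent": total, "lost": lost, "loss_pct": loss_pct, "longest_loss_run": longest}


# Synthetic example: 100 pings with 4 isolated losses (positions are arbitrary).
sample = [True] * 100
for i in (7, 31, 58, 90):
    sample[i] = False

print(summarize_ping(sample))
# → {'sent': 100, 'lost': 4, 'loss_pct': 4.0, 'longest_loss_run': 1}
```

A `longest_loss_run` of 1 alongside a steady loss percentage matches the pattern described here. Long runs would point more towards short outages than a constant low-level drop.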
If you do a ping within the Meraki dashboard there is zero packet loss between the vMX and any spoke that I test. There are no interesting events in the event viewer, and the status bar for the vMX and spokes is solid green.
The VMs in Azure can ping the vMX with zero packet loss. The issue only happens when traffic is routed through the vMX to a remote spoke (or from a remote spoke being routed through the vMX to the Azure VMs).
The vMX and all MXs are running 14.39. I have tried rebooting the vMX and a small selection of MXs. I have also had a small number of the Windows VMs rebooted. The VMs are running different versions of Windows Server as well (so there is nothing in common).
Nothing has made a difference.
Has anyone had issues with small amounts of packet loss to VMs in Azure connected via a vMX?
Not sure if you have resolved this yet, but out of curiosity, do you see the same pattern of packet loss with other types of traffic, such as telnet or other TCP traffic?
Do the spokes have multiple uplinks? If so, create an SD-WAN policy to push traffic to the vMX via a specific uplink. Or test from a spoke that has a single uplink and see if there's any difference.
No, it is not resolved. I just did a packet capture on the vMX and it does not appear to be affecting TCP traffic, only ICMP ping traffic.
All spokes have dual uplinks. The spokes are in more than one country and using more than one ISP.
We do have an SD-WAN policy applied.
To me it appears to be an issue with the vMX running in Azure. If I promote some spokes to hubs, there is no loss between them. The vMX is the only common point.
>If you have a support case opened already for this one,
I did have a support case open. We can ping between the public IP address of the vMX and the public IP addresses of the spokes with no packet loss. We can ping between the vMX and the spokes over AutoVPN with no packet loss.
After that the support person said it is not a Meraki issue and closed the case.
I'll probably let it rest now. I was just wondering if others were having an issue.
I am seeing the same results in our environment. Just to add a bit more: when testing from off our network, if I ping the WAN IP of the vMX in Azure I see the same packet loss. When I run a continuous ping from my computer, the loss usually ranges from 3% to 9%.
I had a case open with Meraki, but I was told it's just a VM, it looks set up correctly, and to open a case with MS. MS initial support wants to get logs from the Meraki as they don't see a problem on their side. Me, I just want it to work so we can start moving production servers into the environment.
I have instances in Azure in both the US and Australia that are experiencing the exact same behavior.
To add a bit more here: when I'm on the vMX and ping my other MX devices there is no packet loss, and when I ping external addresses I have no packet loss. It's only inbound to the vMX where I see the packet loss.
Would love some help if others have seen this and resolved it. I have tried the latest 14.x and 15.x code to see what I can do. I did notice in the Azure marketplace there is a v2 and a v3 of the Meraki device, but I can't really find any documents on the key differences or whether I should choose one over the other.
When I have run into this problem (not often), the only way I have found to fix it is to delete and re-deploy the vMX. When it works it is reliable. When it is broken it stays broken.
Meraki and Azure support are no help in this area.
I have done lots of packet captures (using Meraki's own packet capture tool) for Meraki support showing packets coming in but not leaving the vMX.
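If you want to do the same comparison yourself, one rough approach is to pull the ICMP sequence numbers out of a capture on each side of the vMX and diff them. The Python sketch below assumes you have already extracted those sequence numbers into two lists (how you do that depends on your capture tool). The function name `missing_on_egress` and the sample values are hypothetical.

```python
def missing_on_egress(ingress_seqs, egress_seqs):
    """Return ICMP sequence numbers seen entering the device but never leaving it.

    Order follows the ingress capture; duplicates in the ingress list are kept.
    """
    seen_out = set(egress_seqs)
    return [seq for seq in ingress_seqs if seq not in seen_out]


# Synthetic example: sequence numbers 1..10 arrive, 4 and 9 never leave.
ingress = list(range(1, 11))
egress = [1, 2, 3, 5, 6, 7, 8, 10]

print(missing_on_egress(ingress, egress))
# → [4, 9]
```

Note that this only works if both captures cover the same time window and the same flow, and if sequence numbers do not wrap around during the capture.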
On a separate note, I have also had issues running StrongSwan on Ubuntu in Amazon AWS on t2.* instances when using AES128 encryption. Same issue: it either works or there is a constant low level of packet loss. In Amazon, if I stop/start the instance (so it moves onto different physical hardware), it works again.
I spent quite a bit of time with Amazon support on this issue, and it looks like some of the Intel chips have a microcode issue (let's call it a microcode bug). It might be related to all the issues Intel ran into with the security vulnerabilities, and solving one thing and breaking another.
With StrongSwan, if I run on a different instance size everything works. If I use something other than AES128 it also works.
AutoVPN also uses AES128.
So, when working with MS support and using the PsPing tool, when I run the ping test to the WAN side of the Meraki using TCP instead of a normal ICMP ping, I don't have packet loss. Reading through this board, it would appear you saw the same; is that true? Also, with just the ICMP drops, did you find that your applications behind the Meraki had issues, or should I just not worry about it?
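For anyone without PsPing to hand, a TCP connect test can be roughly approximated in Python. This is only a sketch of the general idea (time a TCP handshake, count failures), not a reimplementation of PsPing. The function names `tcp_probe` and `loss_percent` are made up, and the target host and port are things you would need to supply; the port must actually be listening and reachable for the handshake to complete.

```python
import socket
import time


def tcp_probe(host, port, count=20, timeout=1.0, interval=0.0):
    """Attempt `count` TCP connects; return RTTs in ms, with None for each failure."""
    rtts = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            # create_connection completes the TCP handshake or raises OSError
            # (which also covers timeouts and refused connections).
            with socket.create_connection((host, port), timeout=timeout):
                rtts.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            rtts.append(None)
        if interval:
            time.sleep(interval)
    return rtts


def loss_percent(rtts):
    """Percentage of probes that failed."""
    return 100.0 * sum(r is None for r in rtts) / len(rtts) if rtts else 0.0


# Usage (placeholder address and port; substitute your own):
# results = tcp_probe("192.0.2.10", 443, count=100, interval=0.2)
# print(f"{loss_percent(results):.1f}% of TCP connects failed")
```

Bear in mind that a refused connection also counts as a failure here, so a closed port will show as 100% loss even on a healthy path.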
I will go and try the redeploy again, but I know I have done that once or twice while trying to fix a routing issue.
I noticed when deploying in the MS portal there was an option for v2 and v3. We chose v3, but I'm curious whether you know anything about the two versions and which one is best to choose in Azure. I can't find much detail on the difference between the two.
I experienced loss of all traffic types. The remote sites connecting to this vMX had dual Internet links so you could look at the SD-WAN graphs in the Meraki Dashboard - and those graphs also showed the same packet loss.
I've only used the v3 CPU option. It's cheaper and faster.