I wonder if anyone else has run into this problem before.
I have a client using a vMX in Azure and they are reporting 4% packet loss to all the VMs in Azure from the spokes. It doesn't matter which VM is being pinged, and it doesn't matter which spoke the ping is being done from. Spokes can ping other spokes with zero packet loss.
The issue happens consistently, 24x7. Run about 100 pings and you will lose about 4 of the packets. The loss tends to be spread out rather than bursty.
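For anyone wanting to quantify the loss rate rather than eyeball raw ping output, a minimal sketch like this works. The target IP is a placeholder, and the summary format assumed is Linux `ping`'s:

```python
import re

def loss_percent(sent: int, received: int) -> float:
    """Packet loss as a percentage of probes sent."""
    return 100.0 * (sent - received) / sent

def parse_ping_summary(output: str) -> float:
    """Pull the counts out of Linux `ping` output, whose summary line looks like
    '100 packets transmitted, 96 received, 4% packet loss, time 99123ms'."""
    m = re.search(r"(\d+) packets transmitted, (\d+) received", output)
    if m is None:
        raise ValueError("unrecognized ping output")
    return loss_percent(int(m.group(1)), int(m.group(2)))

# Usage (needs network access; ~100 s for 100 probes; target IP is hypothetical):
#   out = subprocess.run(["ping", "-c", "100", "10.100.0.10"],
#                        capture_output=True, text=True).stdout
#   print(f"{parse_ping_summary(out):.1f}% loss")
```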
If you do a ping within the Meraki dashboard there is zero packet loss between the vMX and any spoke that I test. There are no interesting events in the event viewer, and the status bar for the vMX and spokes is solid green.
The VMs in Azure can ping the vMX with zero packet loss. The issue only happens when traffic is routed through the vMX to a remote spoke (or from a remote spoke being routed through the vMX to the Azure VMs).
The vMX and all MXs are running 14.39. I have tried rebooting the vMX and a small selection of MXs. I have also had a small number of the Windows VMs rebooted. The VMs are running different versions of Windows Server as well (so there is nothing in common there).
Nothing has made a difference.
Has anyone else had issues with small amounts of packet loss to VMs in Azure connected via a vMX?
Not sure if you have resolved this yet, but out of curiosity, do you see the same pattern of packet loss with other types of traffic, such as telnet or TCP?
Do the spokes have multiple uplinks? If so, create an SD-WAN policy to push traffic to the vMX via a specific uplink, or test from a spoke that has a single uplink and see if there's any difference.
No, it is not resolved. I just did a packet capture on the vMX, and the loss does not appear to affect TCP traffic, only ICMP ping traffic.
All spokes have dual uplinks. The spokes are in more than one country and using more than one ISP.
We do have an SD-WAN policy applied.
To me it appears to be an issue with the vMX running in Azure. If I promote some spokes to hubs, there is no loss between them, so the vMX is the only common point.
>If you have a support case opened already for this one,
I did have a support case open. We can ping between the public IP address of the vMX and the public IP addresses on the spokes with no packet loss. We can also ping between the vMX and the spokes over AutoVPN with no packet loss.
After that the support person said it is not a Meraki issue and closed the case.
I'll probably let it rest now. I was just wondering if others were having an issue.
I am seeing the same results in our environment. To add a bit more: testing from off our network, when I ping the WAN IP of the vMX in Azure I see the same packet loss. A continuous ping from my computer usually ranges from 3 - 9% loss.
I had a case open with Meraki, but I was told it's just a VM, it looks set up correctly, and to open a case with MS. MS initial support wants to get logs from the Meraki, as they don't see a problem on their side. Me, I just want it to work so we can start moving production servers into the environment.
I have instances in the US and Australia (Azure) that are experiencing the exact same behavior.
To add a bit more here: when I'm on the vMX and ping my other MX devices there is no packet loss, and when I ping external addresses I have no packet loss. It's only inbound to the vMX where I see the packet loss.
Would love some help if others have seen this and resolved it. I have tried the latest 14.x and 15.x code to see what I can do. I did notice in the Azure marketplace there are v2 and v3 versions of the Meraki device, but I can't really find any documents on the key differences or whether I should choose one over the other.
When I have run into this problem (not often), the only way I have found to fix it is to delete and re-deploy the vMX. When it works, it is reliable. When it is broken, it stays broken.
Meraki and Azure support are no help in this area.
I have done lots of packet captures (using Meraki's own packet capture tool) for Meraki support showing packets coming in but not leaving the vMX.
On a separate note, I have also had issues running StrongSwan on Ubuntu in AWS on t2.* instances when using AES128 encryption. Same issue - it either works or there is a constant low level of packet loss. In AWS, if I stop/start the instance (so it lands on different physical hardware) it works again.
I spent quite a bit of time with Amazon support on this issue, and it looks like some of the Intel chips have a microcode issue (let's call it a microcode bug). It might be related to all the issues Intel ran into with the security vulnerabilities - solving one thing and breaking another.
With StrongSwan if I run on a different instance size everything works. If I use something other than AES128 it also works.
AutoVPN also uses AES128.
While working with MS support and using the PSPing tool: when I do the ping test to the WAN side of the Meraki using TCP instead of normal ICMP ping, I don't have packet loss. Reading through this board, it appears you saw the same - is that true? Also, with just ICMP drops, did you find that your applications behind the Meraki had issues, or should I just not worry about it?
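PSPing's TCP mode can be roughly approximated with a plain socket connect loop, which is handy on hosts without the Sysinternals tools. A sketch, assuming the target host and port below are placeholders (not values from this thread):

```python
import socket
import time

def tcp_probe(host: str, port: int, timeout: float = 1.0):
    """One TCP 'ping': time a full connect() handshake.
    Returns latency in milliseconds, or None if the probe failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None

def tcp_loss(host: str, port: int, count: int = 100) -> float:
    """Percentage of failed probes, roughly what `psping -n 100 host:port` reports."""
    failed = sum(1 for _ in range(count) if tcp_probe(host, port) is None)
    return 100.0 * failed / count

# Usage (hypothetical vMX WAN IP, probing TCP 443):
#   print(f"{tcp_loss('203.0.113.10', 443):.1f}% loss")
```

Note this only counts hard failures (timeouts/refusals); lost SYNs that TCP retransmits within the timeout will show up as latency spikes rather than loss, which is one reason TCP tests can look cleaner than ICMP.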
I will go and try the redeploy again, but I know I have done that once or twice already while trying to fix a routing issue.
I noticed when deploying in the MS portal there was an option for v2 and v3; we chose v3, but I'm curious if you know anything about the two different versions and which one is best to choose in Azure. I can't find much detail on the difference between the two.
I experienced traffic loss of all traffic types. The remote sites connecting to this vMX had dual Internet links, so you could look at the SD-WAN graphs in the Meraki Dashboard - and those graphs also showed the same packet loss.
I've only used the v3 CPU option. It's cheaper and faster.
Did you ever do anything to get this working without packet loss? We are trying to move workloads into Azure, but the packet loss scares us. We have moved a couple of servers out there, and the people who RDP to them experience drops and reconnects.
I have only come across two scenarios. Either it loses packets or it does not.
If you have the case where it has packet loss, I have only found one way to fix it: delete the vMX and redeploy. This usually fixes it permanently.
I know this is an old thread, but looking around, this is exactly my issue, and it's July 2021 and I still don't see any definitive answers. I'd hate to have to tear down this vMX and set it back up, but before I do, I'm just seeing if anyone here in 2021 had and/or is having this issue, and if there was any resolution.
I have two vMXs (Medium) in two different regions, each serving about five to ten VMs in Azure and acting as hubs, since there is a requirement for multi-site connectivity for RDs and such. Anyway, the vMX in the UK works fine with no issues; the other one here in the US on the east coast has the same dropped-packet issue described here.
I know it's the vMX: if I PSPing multiple IPs, both public and private, I lose packets at the vMX IP and at all the servers behind it, which show the same packet loss at the same times as I work my way back out through the vnet subnets, yet there are no drops on the MS public IP.
We are having that same issue in US East but not North Central. @CMTech1, I redeployed but no luck. FYI, don't upgrade the public IP SKU - I broke our environment by doing that. I have packet loss within the same vnet from a prod server to the Meraki internal interface (different subnet), and also over the Meraki via a third-party VPN. I saw a thread on reddit that mentioned removing the route table from the Meraki subnet fixed the packet loss. I'm going to try it off hours and see what happens.
@Agio-Networks: As of today we're still seeing the random drops. We see the issue at all three locations we have - US East 2, US West, and UK South - so if I had to point a finger (though I really don't want to), I'd say it's a Meraki issue. Anyway, we have an open/ongoing case with Meraki, and the most recent change they made (we don't have access to do it ourselves) was changing the default MTU from 1500 to 1420, to allow more room for the AutoVPN header, as they stated. However, after another week of monitoring it seems we are still dropping random packets, so that didn't resolve the issue and I have to reply back with the bad news. If I ever do get a resolution, I'll be sure to post it here. I'm just surprised others haven't seen this in their network monitors.
I had the same issue: consistent packet loss (normally 1% - 2%, but sometimes spiking to 35%+) on both of my Azure vMXs in two different regions. When I placed a high load on the vMXs to/from Azure, the AutoVPN would drop. I tried everything noted above to no avail. I finally called Meraki Support today and spoke to Sanket, who did a great job working this issue with me. He found a recent internal support article related to this issue. The vMX setup guide had failed to mention this configuration item and has since been updated as of October 2021.
To fix the issue, log in to your Azure portal and open your Route Table(s). Select Subnets and disassociate the subnet that the Azure router and the vMX router share. In my case, it was subnet 10.100.0.0/24: the Azure default gateway was 10.100.0.1 and the vMX was 10.100.0.4. I'm assuming the two were fighting over the traffic and causing the packet loss.
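For anyone scripting this instead of clicking through the portal, the same disassociation can be done with the Azure Python SDK (`azure-mgmt-network`). This is only a sketch - the subscription ID, resource group, VNet, and subnet names are placeholders, not values from this thread:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Fetch the subnet that the Azure default gateway and the vMX share
# (e.g. 10.100.0.0/24 in the post above).
subnet = client.subnets.get("my-rg", "my-vnet", "vmx-subnet")

# Drop the route-table association (the portal's "Dissociate" action),
# then push the updated subnet back.
subnet.route_table = None
client.subnets.begin_create_or_update("my-rg", "my-vnet", "vmx-subnet", subnet).result()
```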
After disassociating the subnet, the packet loss stopped, performance noticeably improved, and the AutoVPN drops stopped. I didn't experience any routing or connectivity issues. No downtime was incurred either when I applied the disassociation in Azure.
I hope this helps anyone who's been troubleshooting this for months like me.