We're on the verge of moving some connections into production using the vMX in Azure / AWS, but the lack of an HA configuration, such as what you can build with the physical hardware, makes me a little nervous. I've seen the scripted AWS failover system put together by @PhilipDAth, but I was wondering what your experiences have been like as far as reliability with a single appliance.
Is the underlying infrastructure generally reliable enough to stick with a single vMX, or would you recommend the extra layer of complexity of a second appliance and some sort of user-built routing/failover in AWS and Azure (unsure even how yet)?
Another option for you to consider (I've also used this):
In AWS you have availability zones (AZs). Each AZ is a physically separate data centre (or group of data centres). Each subnet lives in exactly one AZ, and an AZ can contain multiple subnets.
To actually get application redundancy you need to place your redundant servers in different AZs. So for clients to access those redundant services they need a way to change from the primary to the backup IP addresses (or to use both at the same time if active/active).
So considering this, you can also simply place a vMX in each AZ and run them active/active. Configure each vMX to only advertise the routes for the AZ it is in. You also need to configure an AWS route table for each subnet so that it sends traffic to your spokes only via its local vMX.
This preserves the full L3 failover design of the applications and keeps the network routing and deployment simple.
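To make the per-AZ layout above a bit more concrete, here is a rough Python sketch of the planning step only. It doesn't call AWS or the Meraki dashboard; it just works out which subnets each vMX should advertise and which route-table entries point at which vMX. All the IDs, CIDRs and names (Subnet, plan_spoke_routes and so on) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    # Hypothetical description of one VPC subnet and its route table.
    cidr: str
    az: str
    route_table_id: str


def subnets_to_advertise(subnets):
    """Group subnet CIDRs by AZ: each vMX advertises only its own AZ."""
    by_az = {}
    for s in subnets:
        by_az.setdefault(s.az, []).append(s.cidr)
    return by_az


def plan_spoke_routes(subnets, vmx_eni_by_az, spoke_cidrs):
    """Return (route_table_id, spoke_cidr, vmx_eni) tuples, always
    choosing the vMX that sits in the same AZ as the subnet."""
    plan = []
    for s in subnets:
        eni = vmx_eni_by_az.get(s.az)
        if eni is None:
            raise ValueError(f"no vMX deployed in {s.az} for {s.cidr}")
        for cidr in spoke_cidrs:
            plan.append((s.route_table_id, cidr, eni))
    return plan


# Example values only; none of these come from a real deployment.
subnets = [
    Subnet("10.0.1.0/24", "az-a", "rtb-a"),
    Subnet("10.0.2.0/24", "az-b", "rtb-b"),
]
vmx_eni_by_az = {"az-a": "eni-vmx-a", "az-b": "eni-vmx-b"}
spoke_cidrs = ["192.168.10.0/24", "192.168.20.0/24"]

advertise = subnets_to_advertise(subnets)
routes = plan_spoke_routes(subnets, vmx_eni_by_az, spoke_cidrs)
```

Each tuple in routes would then become one route-table entry (destination = spoke CIDR, target = the local vMX's network interface), and each list in advertise would be the local subnets configured on that AZ's vMX.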
I had never really looked at or considered an HA vMX setup. It's a VPN concentrator when you boil it down. Same reason we don't bother backing up the VM in Azure. It is actually easier and quicker to just redeploy it and let the config come back down!
Now the problem with the above is a lack of automation. It can mean downtime.
Something that would be really awesome is a monitoring method that would trigger a redeployment of the vMX through PowerShell scripts.
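As a rough idea of what that monitoring could look like, here is a minimal watchdog sketch in Python (the same logic would port to PowerShell). It assumes you supply two functions yourself: probe, which returns True when the vMX looks healthy, and redeploy, which kicks off whatever redeployment script you use. Both are hypothetical placeholders, as are the threshold and interval defaults.

```python
import time


def watch(probe, redeploy, max_failures=3, interval_s=30.0,
          sleep=time.sleep, max_checks=None):
    """Call probe() every interval_s seconds and call redeploy() after
    max_failures consecutive failed probes. Returns the redeploy count.

    max_checks=None runs forever; a number is handy for dry runs.
    """
    failures = 0
    checks = 0
    redeploys = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0  # any healthy probe resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                redeploy()
                redeploys += 1
                failures = 0  # start counting again after redeploying
        sleep(interval_s)
    return redeploys


# Dry run with canned probe results instead of a real health check.
results = iter([True, False, False, False, True])
events = []
count = watch(
    probe=lambda: next(results),
    redeploy=lambda: events.append("redeploy"),
    max_failures=3,
    sleep=lambda _seconds: None,  # skip real waiting in the dry run
    max_checks=5,
)
```

Requiring several consecutive failures before redeploying avoids tearing down the appliance over a single dropped probe. In practice you would probably also want a cool-down after a redeploy so the new instance has time to come up before it is probed again.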
I have a good few vMXs in the wild, and I have not had a blip on them. I wish they did more, but then, why would you buy an ASAv or CSR1000 for twice the price...