The closest you will be able to get, I think, is to put a vMX in each availability zone and have it serve just that availability zone. Then set up your services so they are spread across those two zones and can fail over between them.
I've found these AWS papers about VPN monitoring, and they seem to describe a viable way to deploy the vMX100 in HA on AWS:
I have a customer that wants to deploy vMX100 appliances in HA on AWS; basically, they want redundancy in case a failure in an AZ causes the termination of the instance.
Based on that, this is what I'm thinking: use the CloudWatch service to monitor the vMX100 EC2 instance, and create a rule so that a shutdown or termination of the primary vMX100 instance triggers a Lambda function that updates the VPC route table to point to a secondary vMX100 in a different AZ.
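To make the idea above concrete, here is a minimal sketch of such a Lambda function in Python. It assumes the rule delivers the standard "EC2 Instance State-change Notification" event and that the VPC route table sends the spoke subnets to the vMX's network interface. The instance ID, ENI ID, route table ID and CIDR are hypothetical placeholders, not values from this thread, and the `ec2` parameter on the handler is only there so the logic can be exercised without AWS access.

```python
from typing import Any, Optional

# Hypothetical identifiers; replace with values from your own VPC.
PRIMARY_INSTANCE_ID = "i-0primary000000000"
SECONDARY_ENI_ID = "eni-0secondary0000000"
ROUTE_TABLE_ID = "rtb-0example000000000"
SPOKE_CIDRS = ["10.10.0.0/16"]  # on-prem subnets reached through the vMX

# EC2 states that are treated here as "the primary is gone".
FAILED_STATES = {"shutting-down", "stopping", "stopped", "terminated"}


def should_fail_over(event: dict) -> bool:
    """Return True when the event reports the primary vMX entering a failed state."""
    if event.get("detail-type") != "EC2 Instance State-change Notification":
        return False
    detail = event.get("detail", {})
    return (
        detail.get("instance-id") == PRIMARY_INSTANCE_ID
        and detail.get("state") in FAILED_STATES
    )


def fail_over(ec2: Any) -> list:
    """Point every spoke route at the secondary vMX's network interface."""
    for cidr in SPOKE_CIDRS:
        ec2.replace_route(
            RouteTableId=ROUTE_TABLE_ID,
            DestinationCidrBlock=cidr,
            NetworkInterfaceId=SECONDARY_ENI_ID,
        )
    return list(SPOKE_CIDRS)


def lambda_handler(event: dict, context: Any, ec2: Optional[Any] = None) -> dict:
    """Lambda entry point; `ec2` can be injected for local testing."""
    if not should_fail_over(event):
        return {"failed_over": False, "routes": []}
    if ec2 is None:
        import boto3  # provided by the Lambda Python runtime
        ec2 = boto3.client("ec2")
    return {"failed_over": True, "routes": fail_over(ec2)}
```

Note that this only reacts to EC2 state changes. A vMX that is still "running" but has lost its tunnels would not trigger it, so treat it as a starting point rather than a complete health check.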
What do you guys think? Maybe too complicated?
I have now managed to come up with a solution for this.
The short answer - I did this using an Amazon AWS Lambda script to check the status of each vMX and dynamically update the Amazon AWS route tables. So if one vMX goes down everything will fail over nicely to the other vMX.
I have had my first crack at documenting the process and requirements here:
Very cool @PhilipDAth
We've done some of this with CSR1000V - and that's pretty much the standardised approach on that platform now.
Curious to see if you've been running your solution in production - and how you have found it?
I feel like it's really not worth putting any production stuff into AWS unless it runs in two AZs.
And I don't want to have to press any buttons manually of course.
We have a customer where we deployed two vMX100s in two different AZs. One of the vMX100s is shut down and the other one is up and running. CloudWatch executes a Lambda script every minute to monitor the state of the vMX100 instances and work out whether it needs to switch over, for example because one of them failed. The customer has been running this for about a year now and it is working OK.
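For anyone wanting to try the same pattern, a rough sketch of what one polling run could look like is below. This is not the script we deployed; it is a simplified illustration that assumes a boto3-style EC2 client, and every ID in `cfg` is a hypothetical placeholder. Each run either does nothing, starts the stopped standby, waits for it to come up, or switches the route, so a full switchover spans more than one run.

```python
from typing import Any, Dict


def instance_states(ec2: Any, instance_ids: list) -> Dict[str, str]:
    """Map each instance ID to its EC2 state name (running, stopped, ...)."""
    resp = ec2.describe_instance_status(
        InstanceIds=instance_ids,
        IncludeAllInstances=True,  # also report instances that are not running
    )
    return {
        s["InstanceId"]: s["InstanceState"]["Name"]
        for s in resp["InstanceStatuses"]
    }


def plan(states: Dict[str, str], active_id: str, standby_id: str) -> str:
    """Decide what a single polling run should do."""
    if states.get(active_id) == "running":
        return "none"           # active vMX is up, nothing to do
    if states.get(standby_id) == "running":
        return "switch_route"   # standby is up, send traffic to it
    if states.get(standby_id) == "stopped":
        return "start_standby"  # cold standby still needs to be powered on
    return "wait"               # standby is pending, stopping, or unknown


def run_once(ec2: Any, cfg: Dict[str, str]) -> str:
    """Perform one polling run and return the action that was taken."""
    states = instance_states(ec2, [cfg["active_id"], cfg["standby_id"]])
    action = plan(states, cfg["active_id"], cfg["standby_id"])
    if action == "start_standby":
        ec2.start_instances(InstanceIds=[cfg["standby_id"]])
    elif action == "switch_route":
        ec2.replace_route(
            RouteTableId=cfg["route_table_id"],
            DestinationCidrBlock=cfg["spoke_cidr"],
            NetworkInterfaceId=cfg["standby_eni_id"],
        )
    return action
```

The sketch only looks at the EC2 instance state. It does not know whether the vMX has registered with the Meraki cloud and rebuilt its VPN tunnels, so the route can be switched slightly before the standby is actually passing traffic.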
The only thing that I would mention is that the switchover time is not that fast, because first, the secondary EC2 instance has to change its state to up, and second, the vMX100 takes some time to register to Meraki Cloud to establish the VPN connections and learn the routes. I think it takes like 1 or 2 minutes, but I don't remember the exact amount of time, so take this into account.
I hope this can help you.
I have a customer AWS VPC hosting resources that need to be accessed by spoke sites, which are going to have Meraki MX devices.
To provide regional HA, suppose I create two VPCs, each in a different region under my AWS account, and connect those VPCs to the customer VPC either via VPC peering or via IPsec to a VGW, so that I learn their side's subnets and advertise them over each vMX's AutoVPN. Do you think that's possible? Instead of VPC peering, I think IPsec with a VGW between the VPCs makes routing preference easier if I can run BGP over it. That way my spokes treat one vMX as the primary hub, and when traffic reaches the customer VPC over the VGW IPsec tunnel, the return traffic follows the same path and does not revert to the secondary vMX.
I also could not find any AWS load balancer that makes regional-level traffic steering between VPCs easy.
Appreciate any advice.
I would not attempt to do that. It would have significant complexity, and that complexity is likely to cause more outages than would be saved by the additional redundancy.
Thanks for the response. So you are suggesting putting both vMXs in one region but in different availability zones, and connecting to the third-party vendor VPC over VPC peering or VGW IPsec.
I am the customer, and all my sites connect to my vMX in my VPC, but there is a third-party vendor VPC hosting a resource that needs to be accessed by all on-prem spokes. That third party won't allow a vMX to be put in their VPC, which is why I am thinking of using VPC peering or VGW IPsec.
What do you think, is it going to work? The only thing different here is that there are two VPCs.
What you need is to create a transit VPC. Put the vMXs into that.
Don't use VPC peering directly out of a transit VPC. It won't work. You'll need a Transit Gateway in your transit VPC. Other VPCs can peer to that if you like.
A TGW for connecting one vendor VPC makes the solution expensive, as TGW charges are high and there won't be any other VPCs.
Would a transit VPC with two vMXs and a VGW connecting the third-party vendor via IPsec not work? Most likely the third-party vendor won't do any VPC peering.
I don't know. I have not tried that combination.
The reason VPC peering does not work is that it won't route traffic for remote subnets - only for subnets within AWS itself.
Does the VGW support adding routes for remote subnets, or is it also limited to only advertising routes for directly connected networks in AWS as well?
We are in the process of deploying a vMX100 to one of our VPCs. This VPC is located in the N. Virginia region and we will be deploying systems across multiple availability zones. The primary concern I have is with regard to the deployment of the vMX100 in only one availability zone. If that availability zone goes offline, then I will lose connectivity to the remaining availability zones.
My question is whether or not there is a way to deploy the vMX100 in an HA configuration across two AZs in order to provide redundancy.
One alternative I have considered is deploying the vMX100 in my primary AZ and creating an AMI of the system. I would set up monitoring of the EC2 instance in the primary AZ. If the instance fails, I would launch the AMI in another AZ, move the elastic IP to that instance and update the route tables. My plan would be to have all this triggered by the monitoring.
A second thought was to use an elastic load balancer as a type of VRRP between the two AZs, both from an internal and an external perspective.
I appreciate any thoughts on this.
Apart from my prior comment, I also think it is unlikely you'll be able to create an AMI. You can create AMIs of your own EC2 instances, but not usually of purchased appliances.
You would need to buy two vMXs and put one in each availability zone, servicing only that zone.
I have thought about that as well. Unfortunately, we are deploying systems across all five AZs in the N. Virginia region. That could get quite costly.
What if I provision two vMX100s, the primary in AZ1 and the secondary in AZ2? Once provisioned, the secondary will be shut down. I will then use Amazon monitoring to watch the vMX100 in AZ1. If that fails, I will update the routing table to point to the interface of the secondary vMX100 in AZ2, power on the vMX100 in AZ2 and then move the elastic IP to that interface.
Yes, you could do that for manual failover.
You won't need to move the elastic IP. Just give both of them a unique elastic IP. The MXs in the field will handle that fine.
In fact, you could just leave both of them running. To fail over you would just update the Amazon routing table.
If I left both active, wouldn't I potentially run into an asymmetric routing issue, since both will be advertising the subnets in the VPC but the return route would only be through the active vMX?
On the spoke side, you have to add the hubs in an order. So it would always use the primary "hub". As long as the Amazon routing points to the same hub it will be symmetric.
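To illustrate the symmetry argument, here is a deliberately simplified Python model. It is not how the Meraki dashboard or AutoVPN is implemented; it only assumes that a spoke uses the first reachable hub in its configured order, and that the VPC route table points at exactly one vMX. The hub names are made up.

```python
from typing import Iterable, List, Optional


def spoke_chosen_hub(hub_priority: List[str], reachable: Iterable[str]) -> Optional[str]:
    """Return the first hub in the spoke's configured order that is reachable."""
    reachable_set = set(reachable)
    for hub in hub_priority:
        if hub in reachable_set:
            return hub
    return None


def is_symmetric(hub_priority: List[str], reachable: Iterable[str],
                 aws_route_target: str) -> bool:
    """Traffic is symmetric when the spoke's hub matches the AWS route target."""
    chosen = spoke_chosen_hub(hub_priority, reachable)
    return chosen is not None and chosen == aws_route_target


# Both hubs up and AWS routing points at the primary: same path both ways.
print(is_symmetric(["vmx-a", "vmx-b"], {"vmx-a", "vmx-b"}, "vmx-a"))

# Primary down but the AWS route table was not updated: return path is wrong.
print(is_symmetric(["vmx-a", "vmx-b"], {"vmx-b"}, "vmx-a"))
```

In this model the second case is the one to watch: after a failure, the paths only line up again once the Amazon route table has been updated to the hub the spokes are now using.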
Unfortunately, I have about 10 locations that will be in a full mesh. Do you know if you can configure the vMX100 in an HA configuration across different availability zones, similar to the Cisco CSR1000v? That might be an option.
No, I don't believe the vMX100 can be put into HA arrangements within AWS/Azure. Why don't you set up an ELB in front of the two vMX100s with a policy to route all traffic to one instance and then fail over to the second if the health check fails?