The closest I think you will be able to do is to put a vMX in each availability zone and have it serve just that availability zone. Then set up your services so they are spread across those two zones and can fail over between them.
We are in the process of deploying a vMX100 to one of our VPCs. This VPC is located in the N. Virginia region and we will be deploying systems across multiple availability zones. The primary concern I have is with the deployment of the vMX100 in only one availability zone. If that availability zone goes offline, I will lose connectivity to the remaining availability zones.
My question is whether there is a way to deploy the vMX100 in an HA configuration across two AZs in order to provide redundancy.
One alternative I have considered is deploying the vMX100 in my primary AZ and creating an AMI of the system. I would set up monitoring of the EC2 instance in the primary AZ. If the instance fails, I would launch the AMI in another AZ, move the elastic IP to that instance, and update the route tables. My plan would be to have all of this triggered by the monitoring.
A second thought was to use an elastic load balancer as a type of VRRP between the two AZs, both from an internal and an external perspective.
I appreciate any thoughts on this.
Apart from my prior comment, I also think it is unlikely you'll be able to create an AMI. You can create AMIs of your own EC2 instances, but not usually of purchased appliances.
You would need to buy two vMXs and put one in each availability zone, servicing only that zone.
I have thought about that as well. Unfortunately, we are deploying systems across all five AZs in the N. Virginia region. That could get quite costly.
What if I provision two vMX100s: the primary in AZ1 and the secondary in AZ2. Once provisioned, the secondary will be shut down. I will then use Amazon to monitor the vMX100 in AZ1. If that fails, I will update the routing table to point at the interface of the secondary vMX100 in AZ2, power on the vMX100 in AZ2, and then move the elastic IP to that interface.
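That failover step could be scripted with boto3. This is only a rough sketch of the idea; the route table ID, CIDR, instance ID, and ENI ID below are hypothetical placeholders, not real resources from this deployment:

```python
def plan_failover(primary_healthy, primary_eni, secondary_eni):
    """Pure decision step: pick which vMX interface the VPC routes should target."""
    return primary_eni if primary_healthy else secondary_eni

def fail_over_to_secondary(route_table_id, dest_cidr,
                           secondary_instance_id, secondary_eni):
    """Start the standby vMX and repoint the VPC route at its interface.
    All IDs passed in here are assumed placeholders."""
    import boto3  # imported lazily so the decision logic above runs offline
    ec2 = boto3.client("ec2")
    # Power on the standby vMX100 in the second AZ
    ec2.start_instances(InstanceIds=[secondary_instance_id])
    # Repoint the route for the remote networks at the secondary vMX's ENI
    ec2.replace_route(RouteTableId=route_table_id,
                      DestinationCidrBlock=dest_cidr,
                      NetworkInterfaceId=secondary_eni)
```

For example, `fail_over_to_secondary("rtb-0123", "10.0.0.0/8", "i-0abc", "eni-0def")` would be the action taken once monitoring reports the primary down.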
Yes you could do that for manual failover.
You won't need to move the elastic IP. Just give each of them a unique elastic IP. The MXs in the field will handle that fine.
In fact, you could just leave both of them running. To failover you would just update the Amazon routing table.
If I left both active, wouldn't I potentially run into an asymmetric routing issue, since both will be advertising the subnets in the VPC but the return route would only be through the active vMX?
On the spoke side, you have to add the hubs in order, so a spoke will always use the primary hub first. As long as the Amazon routing points to that same hub, traffic will be symmetric.
Unfortunately, I have about 10 locations that will be in a full mesh. Do you know if you can configure the vMX100 in an HA configuration across different availability zones, similar to the Cisco CSR1000v? That might be an option.
No, I don't believe the vMX100 can be put into an HA arrangement within AWS/Azure. Why don't you set up an ELB in front of the two vMX100s, with a policy to route all traffic to one instance and fail over to the second if the health check fails?
I've found these AWS papers about VPN monitoring, and it seems like a viable solution for deploying the vMX100 in HA on AWS:
I have a customer that wants to deploy vMX100 appliances in HA on AWS; basically, they want redundancy in case a failure in an AZ causes the termination of the instance.
Based on that, this is what I'm thinking: use the CloudWatch service to monitor the vMX100 EC2 instance and create a rule so that, if the primary vMX100 instance is shut down or terminated, a Lambda function is triggered to update the VPC route table to point at a secondary vMX100 in a different AZ.
What do you guys think? Maybe too complicated?
I have now managed to come up with a solution for this.
The short answer - I did this using an Amazon AWS Lambda script to check the status of each vMX and dynamically update the Amazon AWS route tables. So if one vMX goes down everything will fail over nicely to the other vMX.
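This is not the exact script from the write-up, just a minimal sketch of the pattern, assuming a Python Lambda using boto3; the route table ID, CIDR, instance IDs, and ENI IDs are hypothetical placeholders:

```python
import os

# Hypothetical placeholders -- substitute your real resource IDs
ROUTE_TABLE_ID = os.environ.get("ROUTE_TABLE_ID", "rtb-0123")
DEST_CIDR = os.environ.get("DEST_CIDR", "10.0.0.0/8")
VMX = {
    "vmx-primary":   {"instance_id": "i-primary",   "eni": "eni-primary"},
    "vmx-secondary": {"instance_id": "i-secondary", "eni": "eni-secondary"},
}

def pick_healthy_eni(statuses, preferred=("vmx-primary", "vmx-secondary")):
    """Pure selection: first vMX in preference order whose status check is 'ok'."""
    for name in preferred:
        if statuses.get(name) == "ok":
            return VMX[name]["eni"]
    return None  # nothing healthy; leave the routes alone

def lambda_handler(event, context):
    import boto3  # lazy import keeps the selection logic testable offline
    ec2 = boto3.client("ec2")
    statuses = {}
    for name, cfg in VMX.items():
        resp = ec2.describe_instance_status(InstanceIds=[cfg["instance_id"]],
                                            IncludeAllInstances=True)
        found = resp["InstanceStatuses"]
        statuses[name] = found[0]["InstanceStatus"]["Status"] if found else "unknown"
    eni = pick_healthy_eni(statuses)
    if eni:
        # Repoint the VPC route at whichever vMX is currently healthy
        ec2.replace_route(RouteTableId=ROUTE_TABLE_ID,
                          DestinationCidrBlock=DEST_CIDR,
                          NetworkInterfaceId=eni)
    return statuses
```

Run on a schedule (e.g. a CloudWatch Events rule every minute), this keeps the route pointed at the primary while it is healthy and flips it to the secondary when the primary's status check fails.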
I have had my first crack at documenting the process and requirements here: