
Sharing experience - Tag Based IPsec VPN Failover


Hi all!
 
I'd like to share our experience with the community on a specific case.
We needed a failover solution for a Zscaler VPN tunnel, in case of a connectivity issue on the primary Zscaler node.
 
 
At first, the Meraki documentation on this was a little difficult to understand, but then I realized it contained some errors. I may be wrong, but here is what I think:
 
Line 12: uplink must be equal to wan1, not different from it. As we'll see later, the uplink can also be specified in the requested URL.
Line 26: the parentheses are missing for the print function: print("Need to change VPN, recent loss - "+str(iteration['lossPercent']))
In the site-to-site configuration screenshot, both tags must be Up. The script changes tags on networks (always one Up and one Down), not on VPN tunnels. If one tunnel tag is Up and the other is Down, the network will match both and failover will not work.
[Screenshot: site-to-site VPN configuration (Screen Shot 2019-03-01 at 11.46.49 AM.png)]
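
To illustrate the third point, here is a minimal sketch of how tag matching behaves. The tag names (ZEN_Primary_Up, ZEN_Backup_Up, ZEN_Backup_Down) and the matching_tunnels helper are assumptions made for illustration, not the dashboard's actual logic. The sketch only assumes that a tunnel is built when its tag is present on the network.

```python
def matching_tunnels(network_tags, tunnel_tags):
    """Return the names of tunnels whose tag is present on the network."""
    return sorted(name for name, tag in tunnel_tags.items() if tag in network_tags)

# Tags carried by the network while the primary ZEN is healthy (hypothetical names)
network_tags = {"ZEN_Primary_Up", "ZEN_Backup_Down"}

# Both tunnels tagged "Up": only the primary matches
correct = {"primary": "ZEN_Primary_Up", "backup": "ZEN_Backup_Up"}
# Backup tunnel tagged "Down": the network matches both tunnels at once
wrong = {"primary": "ZEN_Primary_Up", "backup": "ZEN_Backup_Down"}

print(matching_tunnels(network_tags, correct))
print(matching_tunnels(network_tags, wrong))
```

With both tunnels tagged Up, swapping the network tags to ZEN_Primary_Down and ZEN_Backup_Up moves the match from the primary tunnel to the backup one.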
 
The script was very useful and provided a quick solution. However, I ran into several limitations with it, so I enhanced it:
 
The original script expects only the two Zscaler tags on the network, in positions 1 and 2. When a swap occurs, all other tags are lost.
I used an array to keep all tags except the Zscaler-related ones, and add them back afterwards.
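
A minimal sketch of that idea, assuming the network tags come back as one space-separated string (as the script below treats them) and that every Zscaler-related tag contains "ZEN_". The rebuild_tags helper and the sample tag names are hypothetical.

```python
def rebuild_tags(tag_string, new_zen_tags):
    """Keep every non-ZEN tag and append the new ZEN tags after them."""
    # split() with no argument also discards empty strings caused by extra spaces
    others = [tag for tag in tag_string.split() if "ZEN_" not in tag]
    return " ".join(others + new_zen_tags)

before = "branch ZEN_Primary_Up ZEN_Backup_Down retail"
after = rebuild_tags(before, ["ZEN_Primary_Down", "ZEN_Backup_Up", "ZEN_Swapped"])
print(after)
```

The unrelated tags (branch and retail here) survive the swap, whatever their position in the original string.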
 
In HA, both devices are processed, so a slight difference between their metrics can cause issues. I had this case:
There was an issue on the Primary ZEN, so the script swapped to the Backup ZEN. After 5 minutes, all results for member 1 were OK, so it swapped back to Primary. It then went through member 2, which still had a metric above the threshold, and that resulted in another swap to the Backup ZEN.
Example of loss results returned by the API:
Member 1:
Loss = 29 (below threshold, swap back to Primary)
Loss = 0
Loss = 0
Loss = 0
 
Member 2:
Loss = 31 (above threshold, swap again to Backup)
Loss = 0
Loss = 0
Loss = 0
 
I solved this by keeping the last processed network in a variable and skipping the next entry if it belongs to the same network.
-> I didn't find a way to get the "Master/Slave" status through the API, so that I could check only the Master. Is it possible?
 
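A variation on the same idea, as a minimal sketch: remember every network ID already seen in a pass instead of only the last one, so the result does not depend on HA members being adjacent in the response. The first_entry_per_network helper and the sample IDs and serials are hypothetical; only the networkId and serial keys come from the script below. This does not identify the Master, it simply keeps the first entry per network.

```python
def first_entry_per_network(entries):
    """Keep only the first entry seen for each network ID."""
    seen = set()
    unique = []
    for entry in entries:
        if entry["networkId"] in seen:
            continue  # another HA member (or monitored IP) of a network already handled
        seen.add(entry["networkId"])
        unique.append(entry)
    return unique

entries = [
    {"networkId": "N_1", "serial": "AAAA-0001"},  # HA member 1
    {"networkId": "N_1", "serial": "AAAA-0002"},  # HA member 2, skipped
    {"networkId": "N_2", "serial": "BBBB-0001"},
]
print([e["serial"] for e in first_entry_per_network(entries)])
```

Excluded monitored IPs should be filtered out before this step, otherwise an excluded entry could be the one kept for its network.
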
The original script only skips a monitored IP if it is 8.8.8.8. I added an array of IPs to skip. The opposite would also work: listing the monitored IPs to check in an ipToInclude array, for example.
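
A minimal sketch of both variants. The should_check helper is hypothetical, and 203.0.113.10 is a documentation-range placeholder standing in for a Zscaler node IP.

```python
def should_check(ip, include=None, exclude=()):
    """Decide whether a monitored IP is evaluated by the failover logic."""
    if include is not None:
        # Allow-list mode: only the listed IPs are checked
        return ip in include
    # Deny-list mode: everything except the listed IPs is checked
    return ip not in exclude

ipToExclude = ["8.8.8.8", "8.8.4.4", "208.67.220.220", "208.67.222.222"]
ipToInclude = ["203.0.113.10"]  # placeholder, not a real Zscaler address

print(should_check("8.8.8.8", exclude=ipToExclude))
print(should_check("203.0.113.10", include=ipToInclude))
```

The allow-list variant is the safer one when new monitored destinations are added in the dashboard, since anything unlisted is ignored by default.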
 
Added a "Latency" metric check in addition to "Packet Loss".
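
A minimal sketch of the combined check, using the same thresholds as the script below (30 % loss, 100 ms latency). The helper names are hypothetical. The None handling is an assumption on my side: it covers the case where a sample carries no measurement, which would otherwise raise a TypeError on comparison in Python 3.

```python
LOSS_THRESHOLD = 30      # percent, same value as in the script
LATENCY_THRESHOLD = 100  # milliseconds, same value as in the script

def sample_is_unhealthy(sample):
    """True when one sample reaches the loss or the latency threshold."""
    loss = sample.get("lossPercent")
    latency = sample.get("latencyMs")
    high_loss = loss is not None and loss >= LOSS_THRESHOLD
    high_latency = latency is not None and latency >= LATENCY_THRESHOLD
    return high_loss or high_latency

def link_is_unhealthy(time_series):
    """True when at least one sample of the series is unhealthy."""
    return any(sample_is_unhealthy(sample) for sample in time_series)

series = [{"lossPercent": 0, "latencyMs": 20}, {"lossPercent": 0, "latencyMs": 120}]
print(link_is_unhealthy(series))
```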
 
Added a ZEN_Forced tag, which makes it possible to force a ZEN from the dashboard. When this tag is present, the script skips the checks for that network.
 
One of the most problematic parts was that the script needed to run permanently: if it stopped, we would lose the "Network Down" information, and after another run there would be no swap back to Primary.
I added a "ZEN_Swapped" tag to keep that information. The script can then be run on a one-time basis by removing the while loop and the sleep.
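
A minimal sketch of the decision once the state lives in the tag: a single run only needs the current tags and the current health. The decide_action helper and its return values are hypothetical; the three outcomes mirror the branches of the script below.

```python
def decide_action(tags, unhealthy):
    """Pick the action for one network from its tags and current health."""
    swapped = "ZEN_Swapped" in tags
    if unhealthy and not swapped:
        return "swap_to_backup"        # primary degraded, still on primary
    if not unhealthy and swapped:
        return "swap_back_to_primary"  # primary recovered, still on backup
    return "no_change"                 # healthy on primary, or already swapped

print(decide_action(["branch", "ZEN_Primary_Up", "ZEN_Backup_Down"], unhealthy=True))
print(decide_action(["branch", "ZEN_Primary_Down", "ZEN_Backup_Up", "ZEN_Swapped"], unhealthy=False))
```

Because nothing is kept in memory between runs, the same logic works whether the script loops forever or is started periodically by a scheduler.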
 
I'm sure it's not perfect and needs more improvements, but if it can help someone... 🙂
It would also be good to update the Meraki documentation with these corrections, and why not with some of these changes to the code.
 
 
Regards,

 

import requests, json, time

api_key = ''
org_id = ''
#Specify monitored IPs to exclude from the script, typically all non-Zscaler IPs you monitor
ipToExclude  = ['8.8.8.8','8.8.4.4','208.67.220.220','208.67.222.222']

url = 'https://api.meraki.com/api/v0/organizations/'+org_id+'/uplinksLossAndLatency?uplink=wan1'
header = {"X-Cisco-Meraki-API-Key": api_key, "Content-Type": "application/json"}

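while True:
    #Reset on every pass, otherwise a network that ends one pass and starts the next one would be skipped
    previousNetwork = ""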
    response = requests.get(url,headers=header)
    for network in response.json():
        tagsAfter = [] #Array with final tags
        tagsString = "" #String with final tags
        if network['ip'] not in ipToExclude and network['networkId'] != previousNetwork:
            skipNetwork = False
            network_info = requests.get("https://api.meraki.com/api/v0/networks/"+network['networkId'], headers=header)
            print("-------------------------------------")
            print("Network Name : "+network_info.json()['name'])
            print("Network Id : "+network['networkId'])
            print("Device Serial : "+network['serial'])
            print("Monitored IP : "+network['ip'])
            loss=False
            tagsBefore = network_info.json()['tags'].split(' ')
            swapped = False
            #We get all tags of the Network, and specifically the Primary and Backup ZENs. If there is a ZEN_Forced tag, we skip this network
            for tag in tagsBefore:
                if "ZEN_Forced" in tag:
                    skipNetwork = True
                if "ZEN_Primary" in tag:
                    primary = tag
                    print("Primary ZEN : " + primary)
                elif "ZEN_Backup" in tag:
                    backup = tag
                    print("Backup ZEN : " + backup)
                elif tag == "ZEN_Swapped":
                    swapped = True
                else:
                    tagsAfter.append(tag)
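            if skipNetwork:
                print("ZEN Forced, skip network")
                #continue (not break), so that only this network is skipped and the remaining networks are still checked
                continue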
            #We then check connectivity Health, and if conditions are not met, we Swap Backup and Primary, and add a ZEN_Swapped tag
            for iteration in network['timeSeries']:
                if iteration['lossPercent'] >= 30 or iteration['latencyMs'] >= 100:
                    loss=True
                    if swapped == True:
                        print("VPN already swapped")
                        break
                    else:
                        print("Need to change VPN, recent loss - "+str(iteration['lossPercent'])+"% - "+str(iteration['latencyMs'])+"ms")
                        tagsAfter.append(primary.split("_Up")[0]+"_Down")
                        tagsAfter.append(backup.split("_Down")[0]+"_Up")
                        tagsAfter.append("ZEN_Swapped")
                        for tag in tagsAfter:
                            tagsString+= tag + " "
                        print("New List of Tags : "+tagsString)
                        payload = {'tags': tagsString.strip()}
                        new_network_info = requests.put("https://api.meraki.com/api/v0/networks/"+network['networkId'], data=json.dumps(payload), headers=header)
                        break
            #If connectivity Health is back to normal on Primary we swap back
            if loss==False and swapped == True:
                print("Primary VPN healthy again..Swapping back")
                tagsAfter.append(primary.split("_Down")[0]+"_Up")
                tagsAfter.append(backup.split("_Up")[0]+"_Down")
                for tag in tagsAfter:
                    tagsString+= tag + " "
                print("New List of Tags : "+tagsString)
                payload = {'tags': tagsString.strip()}
                new_network_info = requests.put("https://api.meraki.com/api/v0/networks/"+network['networkId'], data=json.dumps(payload), headers=header)
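            #Only remember networks that were actually processed, so an excluded IP does not hide the next entry of the same network
            previousNetwork = network['networkId']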
    print("Sleeping for 30s...")
    print("#####################################")
    print("#####################################")
    time.sleep(30)
    

Re: Sharing experience - Tag Based IPsec VPN Failover

Awesome, Guillaume!

Thank you very much 🙂


Re: Sharing experience - Tag Based IPsec VPN Failover

Hi,

 

I saw that errors 2 and 3 were corrected in the online documentation. Good to see that participating is useful 🙂

 

Line 12, regarding WAN1, still needs to be fixed.

It should be "==", not "!=". Otherwise the script skips all WAN1 checks and only checks WAN2, if it exists.

 

-> if network['ip'] != '8.8.8.8' and network['uplink']=="wan1":

 

It could also be done directly in the API URL, on line 4:

url = 'https://api.meraki.com/api/v0/organizations/<org_ID>/uplinksLossAndLatency?uplink=wan1'

Meraki Employee

Re: Sharing experience - Tag Based IPsec VPN Failover

Nice!!! We published this in our shiny new Developer Hub that went live this week: https://developer.cisco.com/meraki/explore/tag-based-ipsec-vpn-failover/

 

 @Guillaume6hat seems we may want to merge your changes to the original script and get it into GitHub, they make it even better 😄

 
