Backup and restore

Adrian4
Head in the Cloud

Hello,

 

I have written a script that backs up whatever config I like to JSON files and can then restore from those files.

However, I got all the way to finishing it before I realized that it's not quite right.

 

For example, for switch routing interfaces, the restore uses a POST to create a new interface... but what if the reason for restoring isn't a full delete? What if someone just did loads of editing, messed up, and wants to revert? A POST won't work then, because the interface already exists.


I was wondering what people's thoughts are on the best way for the code to decide between POST and PUT?
At the moment my main idea is to have the restore script first delete anything existing and then do a POST, which would work. Only, to do this it would have to do a GET to figure out what's there, and then I guess an action batch to delete everything at once would be the most efficient - but that's a whole lot of extra code considering I'll need to do this for every config type 😞

Can anyone think of a better idea?



PhilipDAth
Kind of a big deal

Potentially use a JSON editor to delete things they don't want restored, or maybe copy the part they do want restored to a new file.

Adrian4
Head in the Cloud

But if the code is just using a POST request (well, a batch create to be precise) and the thing they are restoring already exists, it will fail.

 

The code kinda needs to know whether it needs to do a POST or a PUT. If it had to be tailored for each and every item/interface etc., it would be very messy, both to code and to use, I think.

The idea is to just be able to quickly and easily revert to a previous state. The more I think about it, the more I think doing a wipe first is the only option - although it's quite scary... what if the wipe operation works but the restore doesn't!!! 😥

PhilipDAth
Kind of a big deal

How are you handling things that can't be posted back "as is"? For example, firewall rules (where you have to drop the default rule), SNMP settings (which need different handling depending on the version), etc.?

Adrian4
Head in the Cloud

Firewall rules are easier because the API doesn't treat rules as individual items; it handles the rule table as a whole.

So I just send the backed-up list and it overwrites the existing table with the new one.

To deal with the default rule, the script just removes that rule when it's creating the backup, so it's never written to the JSON.
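
As a rough illustration of that backup-time filtering, here is a minimal sketch. It assumes the built-in rule can be recognised by the comment "Default rule" and that the restore body is a dict with a "rules" list; both are assumptions to check against what the GET actually returns, and the function names are made up for this example.

```python
# Assumption: the built-in catch-all rule carries this comment in the GET response.
DEFAULT_RULE_COMMENT = "Default rule"


def strip_default_rule(rules):
    """Return a new list without the built-in default rule (input is not modified)."""
    return [rule for rule in rules if rule.get("comment") != DEFAULT_RULE_COMMENT]


def build_restore_body(backed_up_rules):
    """Build the body for a whole-table overwrite: one request, no per-rule IDs."""
    return {"rules": strip_default_rule(backed_up_rules)}


if __name__ == "__main__":
    fetched = [
        {"comment": "Block telnet", "policy": "deny", "protocol": "tcp", "destPort": "23"},
        {"comment": "Default rule", "policy": "allow", "protocol": "Any", "destPort": "Any"},
    ]
    print(build_restore_body(fetched))
```

Because the table is replaced as a whole, the same body works whether the current table is empty or has been edited.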

PhilipDAth
Kind of a big deal

It sounds like you need to perform a "diff" of some kind to form the changes.

 

 

Simply performing a delete may not be straightforward. For example, you can't delete a switch layer 3 interface if DHCP is enabled.

Adrian4
Head in the Cloud

Ah, didn't know that - that's going to be a headache 😛

rhbirkelund
Kind of a big deal

I think that in a perfect setting, you'd need some form of automatic decision making that adds or updates configuration based on intent. This might require some more advanced software design.

 

Perhaps you should consider making the decision in your code that if config already exists, you remove all configuration and then push new configuration based on the latest backup you have on file.

 

So as a mockup, that would probably be something like:

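# Mockup only: the helper functions are placeholders, not real API calls.
SwitchIfsBackup = getSwitchRoutingIfs(backupfile)
SwitchIfsCurrent = getSwitchRoutingIfs(organization_id)

if len(SwitchIfsCurrent) == 0:
    # Nothing configured: restore everything from the backup.
    postSwitchIfsFromBackup()
elif len(SwitchIfsCurrent) < len(SwitchIfsBackup):
    # Some interfaces are missing: wipe what is left, then restore.
    deleteSwitchIfs()
    postSwitchIfsFromBackup()
elif len(SwitchIfsCurrent) == len(SwitchIfsBackup):
    # Same count: compare the settings to decide what needs updating.
    investigateSwitchIfs()
else:
    raise SystemError("Unexpected condition")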

 

LinkedIn ::: https://blog.rhbirkelund.dk/

Like what you see? - Give a Kudo ## Did it answer your question? - Mark it as a Solution 🙂

All code examples are provided as is. Responsibility for Code execution lies solely your own.
Adrian4
Head in the Cloud

This hurts my head a bit... but what if the script just tries an update operation first, and then, if it receives a certain error code, tries a create?
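
A minimal sketch of that update-then-create fallback, for illustration only. `NotFoundError`, `upsert`, and the two callables are hypothetical names; in a real script the callables would wrap the actual PUT and POST calls, and the exception would be whatever the API client raises for a missing object.

```python
class NotFoundError(Exception):
    """Hypothetical error raised by the caller's API wrapper when the target is missing."""


def upsert(item, update, create):
    """Try to update `item`; if the target does not exist, create it instead.

    `update` and `create` are caller-supplied callables standing in for the
    real API calls. Returns "updated" or "created" so the caller can log it.
    """
    try:
        update(item)
        return "updated"
    except NotFoundError:
        create(item)
        return "created"


if __name__ == "__main__":
    store = {"1": {"id": "1", "name": "old"}}  # stand-in for the live config

    def fake_update(item):
        if item["id"] not in store:
            raise NotFoundError(item["id"])
        store[item["id"]] = item

    def fake_create(item):
        store[item["id"]] = item

    print(upsert({"id": "1", "name": "new"}, fake_update, fake_create))
    print(upsert({"id": "2", "name": "extra"}, fake_update, fake_create))
```

This only helps when the ID stored in the backup still identifies the same object; it also never removes anything that was added after the backup was taken.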

rhbirkelund
Kind of a big deal

When you create a new interface, Meraki adds an ID to it, and this ID is what you reference if you need to update it.

Assume someone logs on to the dashboard, deletes the interface, and creates it again. Now it has a new ID, different from what is in your backup. What will you do? It's the same interface, but a different ID, thus a different interface, when compared to your backup.

 

Or someone has changed the interface, and something broke. Now you need to recreate it based on your backup. You'll need to get all the interfaces, run through each interface and each setting, and check if it matches your backup. If not, you'll need to update it.
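
To make the "check each setting against the backup" step concrete, here is a small sketch. It assumes interfaces can be matched on a stable field such as "name" rather than on the ID (since the ID changes when an interface is recreated), and that the ID field is called "interfaceId"; both are assumptions, and the function names are invented for this example.

```python
def settings_differ(backup_item, current_item, ignore=("interfaceId",)):
    """True if any setting differs, ignoring fields the dashboard assigns itself."""
    keys = (set(backup_item) | set(current_item)) - set(ignore)
    return any(backup_item.get(key) != current_item.get(key) for key in keys)


def find_drift(backup_items, current_items, match_key="name", ignore=("interfaceId",)):
    """Split the backup into items missing from the live config and items that changed."""
    current_by_key = {item[match_key]: item for item in current_items}
    missing, changed = [], []
    for item in backup_items:
        live = current_by_key.get(item[match_key])
        if live is None:
            missing.append(item)
        elif settings_differ(item, live, ignore):
            changed.append((item, live))  # keep the live copy: it holds the current ID
    return missing, changed


if __name__ == "__main__":
    backup = [{"interfaceId": "1", "name": "Data", "subnet": "10.0.0.0/24"}]
    current = [{"interfaceId": "99", "name": "Data", "subnet": "10.0.9.0/24"}]
    print(find_drift(backup, current))
```

Matching on a name only works if names are unique and nobody renamed the interface, so that choice is a trade-off rather than a fix.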

 

You're going to need to mock up some scenarios in order to design your way out of it. Python does what you tell it to. There's no intelligence in it. So you'll need to prepare for every possible (and realistic) scenario.

 

I'd probably just make the decision that "my backup is the one backup to rule them all", and decide that whatever is configured on the dashboard will be overruled by my backup.

It's a backup of a working system, after all.

Adrian4
Head in the Cloud

"

I'd probably just make the decision that "my backup is the one backup to rule them all", and decide that whatever is configured on the dashboard will be overruled by my backup.

It's a backup of a working system, after all."


Yup - that is exactly what I want. I don't want to get into details about precise changes - I just want to completely replace all of what is there with what is in my backup file. Revert to a previous snapshot in time.

Comparing the current config to the backup file in order to decide what to do is interesting, but I can't quite see how to use it.

I think sticking with the wipe-everything-first idea is the way forward. I just need to identify the cases where it will be a problem, like the routing interfaces where DHCP is enabled.

I'll just leave those out for now and focus on the low-hanging fruit.

I'm also going to have to figure out some process flow to avoid a situation where the wipe happens but then something stops the restore from working and the user is left with no config at all.

*EDIT*
Action batches are all or nothing, aren't they? Perhaps I could put the destroy and create actions in a single batch to avoid that issue?
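
A sketch of what a combined destroy-and-create batch might look like. The payload shape (a list of actions with "resource", "operation", and "body", plus "confirmed" and "synchronous" flags) follows my understanding of the action batch format; the resource path, the "interfaceId" key, and the function name are assumptions for illustration. Whether a failed action rolls back the whole batch should be confirmed against the action batch documentation before relying on it.

```python
import json


def build_replace_batch(resource, current_ids, backup_items, id_key="interfaceId"):
    """Build one batch that destroys every current item, then recreates the backup."""
    actions = [
        {"resource": f"{resource}/{item_id}", "operation": "destroy", "body": {}}
        for item_id in current_ids
    ]
    for item in backup_items:
        # Drop the old ID: a newly created item gets a fresh one from the dashboard.
        body = {key: value for key, value in item.items() if key != id_key}
        actions.append({"resource": resource, "operation": "create", "body": body})
    return {"confirmed": True, "synchronous": False, "actions": actions}


if __name__ == "__main__":
    payload = build_replace_batch(
        "/devices/Q2XX-XXXX-XXXX/switch/routing/interfaces",  # placeholder serial
        current_ids=["10", "11"],
        backup_items=[{"interfaceId": "10", "name": "Data", "vlanId": 10}],
    )
    print(json.dumps(payload, indent=2))
```

Putting the destroys first keeps the order explicit, but any delete the dashboard refuses (such as an interface with DHCP enabled) would still need handling before the batch is submitted.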

alemabrahao
Kind of a big deal

Doing the GET and validating what you have configured before, in my opinion, would be the best option, although I find doing this a lot of work in practice.

 

I personally only use scripts to copy the configuration from one organization to another; I don't find them very useful within the same organization.

I am not a Cisco Meraki employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

Please, if this post was useful, leave your kudos and mark it as solved.
Adrian4
Head in the Cloud

What happens if someone accidentally deletes a switch, or a whole network?

Yeah, thinking about it more, I think a config comparison might be best. I'll compare and create an action depending on the differences.

rhbirkelund
Kind of a big deal

"[...] , or a whole network?"

 

For this to happen, you need to have unassigned all devices from the network, and this can't be done all at once. You can't delete a network, if there are devices assigned to it.

So if this happens, someone has really put in an effort to do so.

alemabrahao
Kind of a big deal

Deleting a device or an entire organization is not that simple; you have to delete several other things first, as explained by @rhbirkelund.

 

I'm not saying that it's not important to have a backup, but in the case of Meraki I honestly don't worry that much, because if you have everything well documented it is very easy to set up a new network.

Adrian4
Head in the Cloud

I can think of at least 1 example where a network that was supposed to be removed was confused with one that was not. Devices were removed, network was deleted.

Luckily it wasn't a very big site and someone had screenshots of the bits of config that got wiped.

Network deletion is an extreme example, but the point is, **** happens and we need a bulletproof response to any possible situation. If something happens to one of our production sites, we hemorrhage serious money every hour we don't get it back up and running. So I'll sleep better at night knowing I can click 3 buttons and fully restore any kind of config in seconds, even if I never need to use it.

Adrian4
Head in the Cloud

So, just as a follow-up in case anyone is following and interested - it turns out the extra logic wasn't as much of a headache as I expected...

Using static routes as an example:

1. Initialize an actions list.
2. Read the backup file and pull in the data.
3. Do a GET call to get the current data.
4. Iterate through the backup data and build a list of the route IDs. On each iteration, take the ID and look through the current route data for a match:
   - If the ID doesn't exist in the current config, append a create action to the actions list.
   - If there is an ID match, compare the two dictionaries. If there are discrepancies, append an update action to the actions list.
5. Then iterate through the current routes data. If a current route's ID is not in the list of IDs built from the backup, append a destroy action.
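
The steps described here can be sketched roughly as follows. This is an illustration under assumptions, not the actual script: the ID field name "staticRouteId", the resource path, and the action dict shape are guesses at what the action batch expects, and `plan_actions` is a made-up name.

```python
def plan_actions(backup_routes, current_routes, resource, id_key="staticRouteId"):
    """Compare backup and current routes and return create/update/destroy actions."""
    actions = []
    current_by_id = {route[id_key]: route for route in current_routes}
    backup_ids = set()

    for route in backup_routes:
        route_id = route[id_key]
        backup_ids.add(route_id)
        # The ID lives in the resource path, so it is left out of the request body.
        body = {key: value for key, value in route.items() if key != id_key}
        live = current_by_id.get(route_id)
        if live is None:
            # In the backup but gone from the live config: recreate it.
            actions.append({"resource": resource, "operation": "create", "body": body})
        elif live != route:
            # Same ID but different settings: push the backed-up values.
            actions.append(
                {"resource": f"{resource}/{route_id}", "operation": "update", "body": body}
            )

    for route_id in current_by_id:
        if route_id not in backup_ids:
            # Live but not in the backup: remove it.
            actions.append(
                {"resource": f"{resource}/{route_id}", "operation": "destroy", "body": {}}
            )
    return actions


if __name__ == "__main__":
    backup = [{"staticRouteId": "a", "subnet": "10.1.0.0/16", "nextHopIp": "10.0.0.1"}]
    current = [{"staticRouteId": "b", "subnet": "10.2.0.0/16", "nextHopIp": "10.0.0.1"}]
    for action in plan_actions(backup, current, "/devices/Q2XX-XXXX-XXXX/switch/routing/staticRoutes"):
        print(action["operation"], action["resource"])
```

One thing to test for: a route that was deleted and recreated by hand will have a new ID, so this plan would create the backed-up copy and destroy the live one rather than update it.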



I think this covers all eventualities. I need to do a lot of testing, but I'm quite hopeful 🙂
