I always have a dilemma when choosing between Python and node.js.
I kinda look at Python as something you use when you just want to "hack something out" and you don't care much about performance. I tend to use it more for batch-style operations, or for things you want to do in a clearly defined set of steps in a specific order (e.g. provisioning a new network). In a simple step-by-step world, Python is easy to work with.
Indeed, node.js often requires additional effort to make things happen in a specific order.
I like using node.js either for server-side processing (such as providing an API), or when you have a lot of IO-bound jobs you can do in parallel and you don't care what order they complete in (such as running a task on every single network).
The async support in Python has thrown a bit of a spanner in the works for my thinking: it gives Python similar capabilities to node.js. And Meraki gives far more attention and support to Python; the Python SDK seems to get a lot more love.
Indeed, I think all POST API requests are still broken in the node.js library (I reported that quite some time ago, along with the fix ...), forcing the use of the mega proxy. The node.js SDK is built on the request library, which is no longer in active development and which people are being encouraged to stop using. So the node.js SDK is a poor cousin with little to no active development.
Out of curiosity, could someone post the download numbers (for say the last month) for each of the SDKs? It would be interesting to see what the percentage split is between them. For example, is the Python SDK making up 90% of the downloads?
I have a need to download the list of networks for around 200 orgs and store them in a database for subsequent processing.
So I wrote just the Meraki side in both Python (using meraki.aio) and node.js.
The Python version was the easiest to do by far. I set the concurrent IO option to 200. It took 120s to run.
The node.js version took me hours to write, getting the promises all chained together nicely. However, it took 6s to run, averaging 30 API calls per second. On a faster machine it would be faster still.
So 120s versus 6s.
So I guess that kinda confirms my thoughts again: Python is best if you just want to hack something out and either don't need to run it again or don't care about performance.
I think your conclusion is wrong here. I haven't used Meraki with node.js yet, but I think there are some differences in how the Meraki modules handle the rate limit in Python and node.js.
How did you configure the rateLimiter in node.js?
Are you using scopes in node.js?
The current async implementation in Python uses a simple concurrent request counter; if there are too many requests open, it will wait 0.3 seconds and check again (maybe we should lower that value).
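For illustration, a minimal sketch of that polling-counter pattern (the idea as described above, not the SDK's actual code):

import asyncio

class ConcurrencyGate:
    """Illustrative only: caps in-flight requests by polling a counter."""
    def __init__(self, limit: int, poll_interval: float = 0.3):
        self.limit = limit
        self.poll_interval = poll_interval
        self.open_requests = 0

    async def acquire(self):
        # If too many requests are open, wait 0.3s and check again.
        while self.open_requests >= self.limit:
            await asyncio.sleep(self.poll_interval)
        self.open_requests += 1

    def release(self):
        self.open_requests -= 1

(An asyncio.Semaphore would wake waiters immediately instead of polling, which is one way to avoid the 0.3-second wait entirely rather than just lowering it.)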
The node.js Meraki module has the option of a scope, so the concurrent counter is counted per scope. I had a similar idea when I implemented the counter. My problem was that I wanted to do it automatically, but you don't have the "orgId" in each request, so I abandoned the idea. I never thought of letting the programmer add it via "scope".
At the moment you could achieve something similar by creating an AsyncDashboardAPI object for each organization.
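A rough sketch of that workaround (one client per organization, so each org effectively gets its own concurrent-request counter; the function names here are mine, not the SDK's):

import asyncio
import meraki.aio

async def process_org(org):
    # A separate AsyncDashboardAPI per org: maximum_concurrent_requests
    # then only counts this org's requests -- a poor man's "scope".
    # The API key is read from the MERAKI_DASHBOARD_API_KEY env variable.
    async with meraki.aio.AsyncDashboardAPI(
        suppress_logging=True,
        maximum_concurrent_requests=10,
    ) as api:
        networks = await api.organizations.getOrganizationNetworks(org["id"])
        for net in networks:
            print(net["name"])

async def main():
    # One shared client is still fine for the initial org listing.
    async with meraki.aio.AsyncDashboardAPI(suppress_logging=True) as api:
        orgs = await api.organizations.getOrganizations()
    await asyncio.gather(*(process_org(org) for org in orgs))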
You could also speed up your dependencies: just run
The node.js SDK has no rate limiter. node.js is a greedy async engine: it starts executing stuff as soon as you request it. Python is a lazy async engine: it doesn't start processing async requests until you call the event loop.
I'm not using scopes in node.js - I don't even know what they are.
It's hard to argue with it taking 120s in Python and 6s in node.js ...
I put this down to node.js using a greedy async engine.
Python is a lazy async engine, it doesn't start processing async requests until you call the event loop.
That purely depends on how you are using it. When you use a simple "await" then yes, it is like that, but you could also use asyncio.ensure_future instead.
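For example (a self-contained sketch): a bare await runs the coroutines one after another, while asyncio.ensure_future schedules them all on the event loop immediately:

import asyncio

async def fetch(i):
    await asyncio.sleep(1)  # stand-in for an IO-bound API call
    return i

async def sequential():
    # Each await blocks until the previous call finishes: ~3 seconds total.
    return [await fetch(i) for i in range(3)]

async def concurrent():
    # ensure_future schedules every coroutine right away: ~1 second total.
    tasks = [asyncio.ensure_future(fetch(i)) for i in range(3)]
    return await asyncio.gather(*tasks)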
ps. I've repeatedly had this experience before as well, with node.js being either as fast as or substantially faster than Python when doing IO (pretty much any kind: http, database, etc.).
This is the test Python code I used. Do you see a way I could speed it up by more than a factor of 10?
import json
import asyncio
import meraki.aio

async def processNetworks(aiomeraki: meraki.aio.AsyncDashboardAPI, org):
    try:
        networks = await aiomeraki.organizations.getOrganizationNetworks(org['id'])
        for net in networks:
            print(net["name"])
    except meraki.AsyncAPIError as e:
        print(f"processNetworks: Meraki API error: {e}")
    except Exception as e:
        print(f"processNetworks: error: {e}")
    return

async def main():
    async with meraki.aio.AsyncDashboardAPI(
        base_url="https://api-mp.meraki.com/api/v1",
        suppress_logging=True,
        maximum_concurrent_requests=200
    ) as aiomeraki:
        # Get list of organizations to which API key has access
        orgs = await aiomeraki.organizations.getOrganizations()
        try:
            for org in orgs:
                print(org['id'] + " " + org['name'])
        except Exception as error:
            print("main: error listing organizations", error)
        finally:
            # Fetch the networks of every org concurrently.
            orgTasks = [processNetworks(aiomeraki, org) for org in orgs]
            for task in asyncio.as_completed(orgTasks):
                await task
    return

def lambda_handler(event, context):
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    return {
        'statusCode': 200,
        'body': json.dumps('Finished.')
    }
Believe it or not, it's an nginx configuration issue.
You are not hitting the API rate limit here. Instead you are hitting the rate limiter of the nginx proxy.
With the default configuration the Python API will wait 60 seconds before the next retry.
I've created 200 test orgs with 10 networks each.
When I set nginx_429_retry_wait_time & retry_4xx_error_wait_time to 1, the whole run took 22 seconds with maximum_concurrent_requests=3 and ~6 seconds with maximum_concurrent_requests=200, using your test methods.
@chengineer I know this parameter is set to 60 seconds as a protection for the nginx, but do you think it would be possible to decrease it by default?
From a script performance perspective it is a lot faster to hit the nginx & shards with as many concurrent requests as possible and just handle the rate limits (nginx: 1 second; shard: whatever is returned in the header), than to limit the concurrent requests to 3.
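For illustration, handling the rate limits yourself (outside the SDK) boils down to something like this minimal aiohttp sketch; the function and URL parameter are hypothetical, and the SDK already does the equivalent internally:

import asyncio
import aiohttp

async def get_with_retry(session: aiohttp.ClientSession, url: str):
    # Retry on 429, honouring the Retry-After header returned by the
    # shard, and falling back to 1 second for the nginx rate limiter.
    while True:
        async with session.get(url) as resp:
            if resp.status == 429:
                await asyncio.sleep(int(resp.headers.get("Retry-After", 1)))
                continue
            resp.raise_for_status()
            return await resp.json()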
This is the sample code I used for my tests:
import asyncio
import meraki.aio
import meraki as meraki_v1  # alias used in the snippet below

WAIT = 1  # retry wait time in seconds (see the timings above)

async def processNetworks(aiomeraki: meraki_v1.aio.AsyncDashboardAPI, org):
    try:
        networks = await aiomeraki.organizations.getOrganizationNetworks(org["id"])
        # Additional per-network requests (200 orgs x 10 networks).
        netTasks = [aiomeraki.networks.getNetwork(net["id"]) for net in networks]
        for task in asyncio.as_completed(netTasks):
            n = await task
            print(n["name"])
    except meraki_v1.AsyncAPIError as e:
        print(f"processNetworks: Meraki API error: {e}")
    except Exception as e:
        print(f"processNetworks: error: {e}")
    return

async def max10():
    async with meraki_v1.aio.AsyncDashboardAPI(
        api_key=api_key,  # defined elsewhere
        base_url="https://api.meraki.com/api/v1",
        suppress_logging=True,
        maximum_concurrent_requests=10,
        nginx_429_retry_wait_time=WAIT,
        retry_4xx_error_wait_time=WAIT,
        maximum_retries=200
    ) as aiomeraki:
        # Get list of organizations to which API key has access
        orgs = await aiomeraki.organizations.getOrganizations()
        for org in orgs:
            print(org["id"] + " " + org["name"])
        orgTasks = [processNetworks(aiomeraki, org) for org in orgs]
        for task in asyncio.as_completed(orgTasks):
            await task
Also thanks for the tip on the bottleneck module. I have started using it. I use one per org I am accessing (so if I am accessing 200 orgs I use 200 bottlenecks).
So now Python is only 5 times slower than node.js ... still a pretty big performance penalty.
Actually it is the same speed as node.js: your Python code also takes about 6 seconds. I added additional code to get each network, and that is what created the additional seconds (200x10 requests).
Regarding decreasing the default wait: e.g. 1 second up to random(concurrent_requests_counter / 2).
The problem with a fixed value is that you are just postponing the next rate limit.
E.g. 100 requests get sent at once.
10 go through -> 90 requests would currently be resent 60s later -> 10 go through ...
So you hit the rate limit again and again and again.
With a range we would spread the requests over the time period and not hit the rate limit all the time (even with an increased concurrent_request limit).
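A minimal sketch of that randomised wait (the function name and parameter are mine, not the SDK's):

import asyncio
import random

async def spread_retry_wait(concurrent_requests_counter: int):
    # Instead of every rejected request retrying after the same fixed
    # delay (and colliding again), wait somewhere between 1 second and
    # half the concurrent request count, spreading retries over time.
    wait = random.uniform(1, max(1.0, concurrent_requests_counter / 2))
    await asyncio.sleep(wait)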