MS130's w/17.1.4 dropping DNS packets

Solved
ggatten
Here to help

We recently refreshed the switches at one of our larger customers. They had MS125s before and now have MS130s.

Since the refresh they've been experiencing "slow" internet. After packet captures, we've determined that DNS responses (and maybe other traffic) are getting dropped. I.e., we see the query from the client and we see the response come back (timely) to the core of the network, but somewhere along the way (the MS130s) it gets dropped and never reaches the client.
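
For anyone wanting to repeat this kind of comparison, here is a minimal Python sketch of the idea: take the DNS packets seen at two capture points (core side and client side) and list the responses that show up at the core but never at the client. The `DnsPacket` layout and the `dropped_responses` name are assumptions made for illustration; the records would come from whatever tool you use to export your captures, and none of this is Meraki tooling.

```python
from typing import NamedTuple


class DnsPacket(NamedTuple):
    """One DNS packet exported from a capture (hypothetical layout)."""
    timestamp: float   # seconds since the start of the capture
    client_ip: str     # IP of the client that sent the query
    client_port: int   # client's UDP source port
    txid: int          # DNS transaction ID
    is_response: bool  # False = query, True = response


def dropped_responses(core_capture, client_capture):
    """Return responses seen at the core that never appear at the client.

    A response is identified by (client_ip, client_port, txid), the same
    fields a client uses to pair a reply with its query.
    """
    seen_at_client = {
        (p.client_ip, p.client_port, p.txid)
        for p in client_capture
        if p.is_response
    }
    return [
        p for p in core_capture
        if p.is_response
        and (p.client_ip, p.client_port, p.txid) not in seen_at_client
    ]


if __name__ == "__main__":
    # Made-up sample data: the second response never reaches the client.
    core = [
        DnsPacket(0.000, "10.0.0.5", 40001, 0x1A2B, False),
        DnsPacket(0.012, "10.0.0.5", 40001, 0x1A2B, True),
        DnsPacket(0.100, "10.0.0.5", 40002, 0x3C4D, False),
        DnsPacket(0.115, "10.0.0.5", 40002, 0x3C4D, True),
    ]
    client = [
        DnsPacket(0.000, "10.0.0.5", 40001, 0x1A2B, False),
        DnsPacket(0.013, "10.0.0.5", 40001, 0x1A2B, True),
        DnsPacket(0.100, "10.0.0.5", 40002, 0x3C4D, False),
    ]
    for p in dropped_responses(core, client):
        print(f"dropped: txid=0x{p.txid:04X} to {p.client_ip}:{p.client_port}")
```

Matching on the transaction ID alone would be too loose on a busy network, since IDs are only 16 bits and get reused, which is why the sketch also keys on the client address and port.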

Both wired and wireless clients experience the issue. Meraki switches and APs alert with "misconfigured dns". It's not misconfigured, and the alert self-resolves after some time.


The issue seems to exist at all times, but it is MUCH more impactful as traffic volumes ramp up. We have MS130s at several other customers, but they are not reporting symptoms. I'm guessing the problem exists there too, but their traffic loads are much lighter.


We have a ticket open with Meraki, but I'm wondering if the community has any thoughts?


Thanks!

1 Accepted Solution
ggatten
Here to help

Meraki may have resolved it with their double top-secret access. The fix is to disable "DNS analytics"; we had asked earlier for anything related to DNS inspection to be disabled. It's apparently a known bug.  😞

Any Meraki support person should be able to do this. In testing, my colleague found the issue manifests at around 300-400 DNS queries/sec. That sounds like a lot, but I think ChromeOS is duplicating requests AND not caching results. I'll do more testing on this.
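
To check whether a site gets anywhere near that range, a rough Python sketch like the following can bucket DNS query timestamps from a capture into one-second bins and list the seconds that reach a chosen threshold. The function names and the default of 300 are assumptions for illustration (300 is just the low end of the range quoted above), and the timestamps are assumed to be non-negative seconds from the start of the capture.

```python
from collections import Counter


def queries_per_second(timestamps):
    """Count DNS queries in each whole-second bucket.

    Assumes non-negative timestamps in seconds (e.g. relative to the
    start of the capture).
    """
    return Counter(int(ts) for ts in timestamps)


def seconds_over_threshold(timestamps, threshold=300):
    """Return sorted (second, count) pairs where count >= threshold."""
    counts = queries_per_second(timestamps)
    return sorted((sec, n) for sec, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Made-up sample: 50 queries in second 0, then 350 queries in second 1.
    quiet = [i / 50 for i in range(50)]
    busy = [1 + i / 350 for i in range(350)]
    for sec, n in seconds_over_threshold(quiet + busy):
        print(f"second {sec}: {n} queries")
```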

14 Replies
Mloraditch
Head in the Cloud

I can't say it's exactly the same, as we never looked at DNS and just focused on speed via iperf tests, but we did find a known bug involving MS130Xs causing slow uplink speeds in certain scenarios. Ours was an MS130X uplinked to an MX75; the interim workaround was putting an MS130 (not X) in between.

If you found this post helpful, please give it Kudos. If my answer solves your problem please click Accept as Solution so others can benefit from it.
ggatten
Here to help

I think I saw that bug, but I don't think it's related. The throughput seems fine, assuming clients can actually resolve the names to connect.

Once the client doesn't get a response, it retries, and things get amplified. A colleague just noticed ~7000 DNS pps.
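
As a back-of-the-envelope illustration of that amplification, here is a small Python sketch. It assumes each attempt's response is dropped independently with the same probability and that the client gives up after a fixed number of attempts; both are simplifications made for illustration, not measurements from this network. In practice the drop rate seems to rise with load, so the real effect is likely worse than this model suggests.

```python
def expected_queries_per_lookup(drop_probability, max_attempts):
    """Expected number of queries a client sends for one name lookup.

    The client retries only when a response is dropped, so attempt k+1
    happens with probability drop_probability ** k. Summing over all
    attempts gives 1 + p + p**2 + ... + p**(max_attempts - 1).
    """
    if not 0.0 <= drop_probability <= 1.0:
        raise ValueError("drop_probability must be between 0 and 1")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return sum(drop_probability ** k for k in range(max_attempts))


if __name__ == "__main__":
    # Hypothetical drop rates, with up to 3 attempts per lookup.
    for p in (0.0, 0.25, 0.5, 0.9):
        factor = expected_queries_per_lookup(p, max_attempts=3)
        print(f"drop rate {p:.0%}: {factor:.2f} queries per lookup")
```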

ggatten
Here to help

PS: This location uses ChromeOS, and I don't think its local DNS caching is actually working, but I haven't done enough testing to be certain. Google apps resolve a lot of names, so add all this together and it's making for a bad week.
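
One rough way to test the caching suspicion from a capture is to measure how often the same client asks for the same name again within a short window. The Python sketch below does that; the record layout, the function name, and the 5-second window are all assumptions for illustration. Note that a high ratio does not prove caching is broken, because retries after dropped responses look identical in this count.

```python
def repeat_lookup_ratio(queries, window_seconds=5.0):
    """Fraction of queries that repeat an earlier query from the same
    client for the same name within window_seconds.

    queries: iterable of (timestamp, client_ip, name) tuples.
    Names are compared case-insensitively.
    """
    last_seen = {}  # (client_ip, lowercased name) -> time of last query
    total = 0
    repeats = 0
    for ts, client_ip, name in sorted(queries):
        key = (client_ip, name.lower())
        previous = last_seen.get(key)
        if previous is not None and ts - previous <= window_seconds:
            repeats += 1
        last_seen[key] = ts
        total += 1
    return repeats / total if total else 0.0


if __name__ == "__main__":
    # Made-up sample: one client asks for the same name three times quickly.
    sample = [
        (0.0, "10.0.0.5", "docs.google.com"),
        (1.0, "10.0.0.5", "docs.google.com"),
        (2.0, "10.0.0.5", "docs.google.com"),
        (0.5, "10.0.0.6", "docs.google.com"),
    ]
    print(f"repeat ratio: {repeat_lookup_ratio(sample):.2f}")
```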

jbright
A model citizen

I had a similar experience. I tried to load MS 17.1.3 on a number of older MS120, MS220 and MS320 switches, and yes, I know the MS220 and MS320 switches aren't technically supported. After the upgrade, Cisco phones would not register to the Webex cloud and the overall network performance was very poor. I reloaded MS 16.9 on all of the switches and they all started working normally again. I am going to wait for a while and let the programmers iron out some more bugs before I try the MS 17.x code train again.

ggatten
Here to help

How were you able to go back to 16.x?  Not an option in my dashboard.

Mloraditch
Head in the Cloud

You have to contact support at this point, but they can do it.

Inderdeep
Kind of a big deal
ggatten
Here to help

17.1.4 is stable.

It would be nice if I could disable any sort of DPI-type stuff. Just give me a firmware that will move packets and has basic Meraki mgmt functionality. Or just fix it; that would also be a good option.

jbright
A model citizen

There is an option on the firmware upgrade screen to revert to the previous version of software, but I think that is only available for a few days after the upgrade, and then you have to engage support. And I found out the hard way that if you do a staged switch upgrade, that option is not available if you have to revert to the previous version. Your only option is to revert all of the switches at the same time. That may cause more problems, depending on whether some switches need to be reverted first. There is definitely room for improvement in the firmware revert process.

PhilipDAth
Kind of a big deal

Random question: does changing the Traffic Analytics settings under Network-Wide/General alter the problem in any way?

ggatten
Here to help

Interesting... Maybe, but Meraki may have resolved it with their double top-secret access. The fix is to disable "DNS analytics"; we had asked earlier for anything related to DNS inspection to be disabled. It's apparently a known bug.  😞

RaphaelL
Kind of a big deal

DNS analytics? Never heard of that! 😮

rhbirkelund
Kind of a big deal

I'm rather intrigued about this as well!

LinkedIn ::: https://blog.rhbirkelund.dk/

Like what you see? - Give a Kudo ## Did it answer your question? - Mark it as a Solution 🙂

All code examples are provided as is. Responsibility for Code execution lies solely your own.