Is anyone else seeing this issue?
A few times a day (sometimes just a handful of times), at least one of my WiFi clients starts emitting a surge of DHCP Requests; I have a pcap showing 50 of them in a single second. It looks like a bug in the WiFi client to me. I manage a single building containing ~70 Meraki MR32/33 APs, supporting ~750 WiFi clients per day. When a burst hits, the access switch logs:
2020-01-07T07:20:24.528207-08:00 xxxxxx 8596: 008564: 000196: Jan 7 07:20:23.492 pst: %PM-4-ERR_DISABLE: dhcp-rate-limit error detected on Gi1/0/21, putting Gi1/0/21 in err-disable state (xxxxxx-1)
2020-01-07T07:22:25.078370-08:00 xxxxxx 8599: 008567: 000197: Jan 7 07:22:23.509 pst: %PM-4-ERR_RECOVER: Attempting to recover from dhcp-rate-limit err-disable state on Gi1/0/21 (xxxxxx-1)
The access-layer switches (Cisco Catalysts) have what I believe are fairly vanilla protective mechanisms configured on their access ports:
interface GigabitEthernet5/0/1
switchport mode access
switchport access vlan 123
storm-control broadcast level 1.00
storm-control multicast level 1.00
storm-control action shutdown
storm-control action trap
spanning-tree portfast edge
spanning-tree guard root
ip dhcp snooping limit rate 25
That is, if broadcast or multicast traffic exceeds 1% of the negotiated link speed (typically 1000 Mb/s, so a 10 Mb/s threshold) within a one-second interval, the switch puts the port into err-disable state. Likewise, if the Catalyst sees more than 25 DHCP packets per second arriving on a single port, it err-disables that port.
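To make the two thresholds concrete, here is a minimal Python model. It is a sketch under stated assumptions, not Cisco's implementation: it treats `ip dhcp snooping limit rate 25` as a counter that resets on fixed one-second boundaries and err-disables the port on the first packet over the limit, and it computes the storm-control trip point as a plain percentage of link speed. The names `storm_threshold_bps` and `DhcpRateLimitedPort` are made up for illustration.

```python
# Simplified model of the two per-port protections configured above.
# Assumption: counters reset on fixed 1-second boundaries and the port is
# err-disabled on the first packet over the limit; real Catalyst accounting may differ.
from dataclasses import dataclass


def storm_threshold_bps(level_percent: float, link_bps: int) -> float:
    """Broadcast/multicast rate at which 'storm-control ... level <percent>' trips."""
    return link_bps * level_percent / 100.0


@dataclass
class DhcpRateLimitedPort:
    """Hypothetical model of 'ip dhcp snooping limit rate N' on one switch port."""
    limit_pps: int
    err_disabled: bool = False
    forwarded: int = 0
    window_second: int = 0  # whole second in which the current counting window started
    count: int = 0          # DHCP packets seen in the current window

    def receive_dhcp(self, t: float) -> bool:
        """Handle one DHCP packet arriving at time t (seconds); True if forwarded."""
        if self.err_disabled:
            return False  # port is down: nothing gets through
        second = int(t)
        if second != self.window_second:
            self.window_second = second  # new 1-second window: reset the counter
            self.count = 0
        self.count += 1
        if self.count > self.limit_pps:
            self.err_disabled = True  # over the limit: err-disable the port
            return False
        self.forwarded += 1
        return True


# A 1000 Mb/s link with 'level 1.00' trips at 10 Mb/s of broadcast or multicast.
print(storm_threshold_bps(1.00, 1_000_000_000))  # 10000000.0

# The pcap case: 50 DHCP Requests spread across a single second.
port = DhcpRateLimitedPort(limit_pps=25)
for i in range(50):
    port.receive_dhcp(i / 50)
print(port.forwarded, port.err_disabled)  # 25 True
```

Under this model a client that stays at or below 25 packets per second never trips the limit; it is the 50-in-one-second burst from the pcap that does.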
The trunk ports serving the Meraki WAPs have the same mechanisms configured:
interface GigabitEthernet1/0/1
description Meraki AP
switchport trunk native vlan 100
switchport mode trunk
storm-control broadcast level 1.00
storm-control multicast level 1.00
storm-control action shutdown
storm-control action trap
spanning-tree portfast edge
spanning-tree guard root
ip dhcp snooping limit rate 25
Well, when one of these WiFi clients emits more than 25 DHCP Requests in a single second, the Meraki AP forwards them to the upstream Catalyst switch; the Catalyst passes the first 25 toward the DHCP servers and then err-disables the port. PoE shuts off and the Meraki AP goes dark. Because the limit is enforced per switch port, and that port carries every client on the AP, one misbehaving client takes all of that AP's clients offline.
There is an automated recovery mechanism that re-enables the port after two minutes ... PoE powers the AP up again ... it boots ... so within ten minutes the event is over and the Meraki AP is back online.
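The two-minute figure can be checked against the two syslog lines above. This short sketch subtracts the ISO-8601 timestamps that the syslog receiver prepended to the ERR_DISABLE and ERR_RECOVER messages; it measures only the gap between those two lines, using the receiver's clock rather than the switch's. (The switch-side stamps, 07:20:23.492 and 07:22:23.509, give about 120 s as well.)

```python
from datetime import datetime

# ISO-8601 receiver timestamps copied from the ERR_DISABLE and ERR_RECOVER log lines.
ERR_DISABLE_AT = "2020-01-07T07:20:24.528207-08:00"
ERR_RECOVER_AT = "2020-01-07T07:22:25.078370-08:00"


def seconds_between(start_iso: str, end_iso: str) -> float:
    """Elapsed seconds between two ISO-8601 timestamps that carry UTC offsets."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds()


gap = seconds_between(ERR_DISABLE_AT, ERR_RECOVER_AT)
print(f"{gap:.1f} s")  # 120.6 s, i.e. about two minutes
```

The rest of the ten-minute outage is the AP powering up and booting after the port comes back, which these two log lines do not capture.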
So this isn't a tragedy, but it happens often enough that I would like to find another approach.
Ideally, Meraki would implement a per-client equivalent of 'dhcp-rate-limit' in their OS, so the AP would automatically disassociate just the offending client ... allowing it to re-associate perhaps a couple of minutes later. I have submitted this as a Wish.
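The wished-for behavior differs from the switch-side limit mainly in where the counter lives: per client MAC on the AP rather than per switch port. A hypothetical sketch follows; `PerClientDhcpLimiter` and its parameters are invented for illustration and do not correspond to any Meraki feature or API. It assumes the same 25-per-second limit and models "re-associate a couple of minutes later" as a 120-second hold-off.

```python
class PerClientDhcpLimiter:
    """Hypothetical AP-side limiter: police DHCP per client MAC, not per switch port.

    A client that exceeds limit_pps within one 1-second window is treated as
    disassociated for `holdoff` seconds; other clients on the same AP are unaffected.
    """

    def __init__(self, limit_pps: int, holdoff: float):
        self.limit_pps = limit_pps
        self.holdoff = holdoff
        self._window = {}         # mac -> (window second, packets counted in it)
        self._blocked_until = {}  # mac -> time at which the client may re-associate

    def receive_dhcp(self, mac: str, t: float) -> bool:
        """Handle one DHCP packet from `mac` at time t (seconds); True if forwarded."""
        until = self._blocked_until.get(mac)
        if until is not None:
            if t < until:
                return False  # still held off: drop silently
            del self._blocked_until[mac]  # hold-off expired: start fresh
            self._window.pop(mac, None)
        second = int(t)
        window, count = self._window.get(mac, (second, 0))
        if window != second:
            count = 0  # new 1-second window for this client
        count += 1
        self._window[mac] = (second, count)
        if count > self.limit_pps:
            self._blocked_until[mac] = t + self.holdoff  # "disassociate" this client only
            return False
        return True


ap = PerClientDhcpLimiter(limit_pps=25, holdoff=120)
buggy, normal = "aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02"
for i in range(50):
    ap.receive_dhcp(buggy, i / 50)       # the misbehaving client's burst
print(ap.receive_dhcp(normal, 0.99))     # True: other clients keep working
print(ap.receive_dhcp(buggy, 60.0))      # False: still held off
print(ap.receive_dhcp(buggy, 121.0))     # True: allowed back after the hold-off
```

The design point is that the penalty lands only on the MAC that exceeded the limit, whereas a per-port limit on the AP's trunk cuts off every client behind that AP.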
Alternatively, I could remove the 'ip dhcp snooping limit rate 25' protection and simply let these buggy clients pound my DHCP servers.
Anyway, I figured I'd ask: is anyone else seeing this?
--sk