This may be a bit of a needle in a needlestack, at the moment, but here goes.
I've been testing a Meraki VPN in parallel with my production network (SonicWALL). I have an MX64 currently serving as a hub (yes, I'd go with a bigger model at a later point) and it is connected directly to our core switch stack. I then have a static IP route assigned on the switch pointing to the Meraki network. I then set up a remote network (MX64) and set up a VPN. A couple of learning moments later, the VPN was up and running and functionality testing commenced. Aside from some minor disappointments (noted in other messages), the VPN and network seem to work fine.

Fast forward a week. I'm troubleshooting performance issues with my primary domain controller (Windows Server 2012 R2). For no apparent reason, the CPU is spiking (99%) and memory is being inhaled. The processes suggest WMI or DNS, but I can't find anything wrong. Nothing has changed with the environment, and restarting services (and finally the server) made no difference.
So, why the heck is this dude posting here? Fair question. Getting to that.
After not finding anything leading to answers, I did find a posting about a potential DHCP server conflict. My Meraki MX64 units are serving as DHCP servers. So, I disable the DHCP server on the core unit, as I don't need DHCP there. No real change. So, I decide to disconnect the core Meraki completely from the core switch and the domain controller immediately goes back to normal. Great! Well, except I need to make a Meraki network. Not great. So, I reconnect the core MX64 and sever the VPN to the endpoint MX64. Again, the domain controller immediately starts acting normally.

So, now I come to you, the Meraki community, hoping that someone else has run into this before and can point me in the right direction. What could be wrong (or at least need to be accounted for) with my remote MX64 (or network)? I need DHCP enabled. I have a super basic config. It's a home network, so there isn't significant traffic flowing. Whatever it is, it seems to be impacting my domain controller in 5 minute cycles. I'm currently working to strip down my home network, to help isolate points of interest, but I wanted to get this out there, in case someone knew where to start.
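For anyone wanting to confirm a repeating cycle like the 5 minute one described here, below is a minimal Python sketch that measures the gap between the starts of consecutive CPU spikes. The sample data, the 90% threshold, and the function name `spike_intervals` are all made up for illustration; real samples would have to come from Performance Monitor or a similar tool on the DC.

```python
from statistics import median


def spike_intervals(samples, threshold=90.0):
    """Return the seconds between the starts of consecutive CPU spikes.

    samples: list of (timestamp_seconds, cpu_percent) tuples, sorted by time.
    threshold: CPU percentage treated as a spike (an arbitrary assumption).
    """
    starts = []
    in_spike = False
    for ts, cpu in samples:
        # Record only the first sample of each spike, not every high sample.
        if cpu >= threshold and not in_spike:
            starts.append(ts)
        in_spike = cpu >= threshold
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Hypothetical data: one sample every 30 s for 20 minutes,
# with a 60 s spike starting every 300 s.
samples = [(t, 99.0 if t % 300 < 60 else 12.0) for t in range(0, 1200, 30)]

intervals = spike_intervals(samples)
print("intervals between spikes (s):", intervals)
print("median interval (s):", median(intervals))
```

If the intervals cluster tightly around one value (about 300 seconds here), that points at something polling on a timer rather than at normal user traffic.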
I appreciate any assistance.
It sounds to me like some remote device connecting over the VPN is pounding your DC. Perhaps try disconnecting the remote devices on the other end of the VPN one at a time till you find which is causing it.
Interestingly enough, I had a somewhat similar issue with an older DC (my backup DC). I believe it came down to the Active Directory integration on the MX100 we have. Meraki uses WMI to read the security log and polls VERY frequently. I'm not 100% certain, but I think it tries to read the ENTIRE log each time. I ended up reducing the size of the log kept on the domain controller for those security events so that it wasn't trying to read as much data at each interval and that seemed to clear up our issue. It's been a while since I had to look into that, but it might be a good place to start?
Wade, if only you could have given me this info before I stayed up until 2am figuring it out. 😉 Yeah, it looks like AD Authentication is the issue. I disabled it last night and was able to see improvement in pretty short order. I didn't have any luck making adjustments on the domain controller to avoid the CPU issues. As such, I'm leaving it disabled, for now, but will want to figure out what needs to be done to make it work nicely with my DCs.
Reducing the log file size on the DC may help with this; it did in my case.
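As a rough sketch of what that change could look like, the Python snippet below only builds (it does not run) a `wevtutil sl` command that sets the maximum size of the Security log. The 64 MB figure and the function name are assumptions for illustration; nobody in this thread stated what size they used, and a Group Policy setting for the log size, if one exists, may override a local change.

```python
def set_security_log_size_cmd(max_mb):
    """Build a wevtutil command that caps the Security log at max_mb megabytes.

    The command is returned as an argument list and is NOT executed here.
    """
    if max_mb < 1:
        raise ValueError("max_mb must be at least 1")
    max_bytes = max_mb * 1024 * 1024  # wevtutil's /ms: option takes bytes
    return ["wevtutil", "sl", "Security", f"/ms:{max_bytes}"]


# 64 MB is an arbitrary example value, not a recommendation from the thread.
cmd = set_security_log_size_cmd(64)
print(" ".join(cmd))
```

The printed command would be run from an elevated prompt on the DC (or passed to `subprocess.run(cmd, check=True)` there). Running `wevtutil gl Security` first shows the current settings, which is worth noting down before changing anything.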
You could possibly spin up another DC (make it an RODC if you prefer) and point the Meraki MX to it for authentication, thereby taking the workload off of your prod DCs. I believe you could reduce the logging on the new DC to help with the overhead, like others have said, while keeping the logging as is on the prod DCs. It's kind of overkill to spin up another DC just for this, but it may help.
Yeah, we're considering a modification to our DC/GC server design. Up to this point, we've just had a primary (physical) and a secondary (virtual) and they have been able to basically sleepwalk through the day (light workload). However, as in this case, if there is a problem, it can impact several services/functions of the particular server. I have an open ticket with Meraki support, but no progress yet. Due to current syslog server limitations, I really don't want to reduce the size of the security log any more than necessary.
I have been told that if the account your MX product uses to authenticate to your DC has ANY special characters in it at all, that can cause WMI to hit 100% CPU usage on your DC.
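A quick way to check an account for this is sketched below in Python. It assumes "special characters" means anything other than ASCII letters and digits in the account name, which is my reading and not something confirmed here; the account names are made-up examples.

```python
import string

# Assumption: only ASCII letters and digits count as "not special".
ALLOWED = set(string.ascii_letters + string.digits)


def special_characters(account_name):
    """Return the distinct characters in account_name outside ALLOWED, in order."""
    found = []
    for ch in account_name:
        if ch not in ALLOWED and ch not in found:
            found.append(ch)
    return found


# Hypothetical service account names, for illustration only.
for name in ["merakisvc", "svc_meraki-ad", "meraki$vc"]:
    flagged = special_characters(name)
    print(name, "->", flagged if flagged else "no special characters")
```

If the account is flagged, testing with a plain alphanumeric account would be a cheap way to see whether this claim holds in your environment.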