I was just informed by a couple of our SD-WAN users that things were not working. I remoted in to one of their machines and confirmed that DNS is not working correctly. I had them reboot before I connected, and they still had problems.
In the remote session, I can ping assets on the LAN by IP address and reach a web server on the LAN by IP address, but every nslookup fails with "Server: UnKnown" and a non-existent domain (NXDOMAIN) error. I tried nslookup with both the DNS server name and its hard-coded IP address. We have 2 Umbrella DNS appliances; neither is working (nor are the DNS servers they forward to, which are Windows Server 2022 DNS servers).
I can ping their laptops from the appliances (an MX68 at their location running through an MS120; an MX95 at the hub). Nothing in the event logs indicates an issue.
I had one of the affected users disconnect the network cable, join Wi-Fi, and connect via VPN. They said that doing so allows everything to work as expected.
Any ideas what I might try to diagnose the issue?
Can you ping the DNS servers or run nslookups against them? Also start firewall logging and see whether anything is being blocked or allowed.
https://documentation.meraki.com/MX/Firewall_and_Traffic_Shaping/Firewall_Logging
Thanks for the reply. I am able to ping the DNS servers by IP address but not by DNS name. However, nslookup does not work at all, with either the server name or the IP address. Specifically, I keep getting "UnKnown can't find <name>: Non-existent domain" errors when trying nslookup.
Looks like there was an update over the weekend (I have the MXs set to auto-update (stable) on weekends). They are now running 18.211.3. It doesn't sound like the update/reboot fixed the issue; everyone in the office is reporting the same problem. They can connect to Wi-Fi and VPN in, but the SD-WAN is only allowing them to access the internet.
I'm watching the firewall log and there isn't anything being blocked. Is there something specific I should be looking for? I can also see traffic to the DNS server being allowed.
On the Tools page, using DNS lookup, I can look up the local domain with the DNS server specified by IP address, but not by domain name.
Any other tips would be appreciated.
A few questions to better understand:
- You mention you have Umbrella appliances. Are these the appliances that the clients are trying to use for DNS? Are you able to point an nslookup directly to your Win2022 DNS servers and validate whether that works?
- Are you trying fully qualified domain names or just friendly names? Does that make a difference?
Thanks for the reply.
Yes, these clients are set to use the Umbrella appliances for DNS via DHCP. Setting DNS manually to any of the other DNS servers (or using them for nslookup) yields the same result, though. All of the DNS servers, including the appliances, return answers to non-local queries (e.g. google.com); they only fail to return internal domain names. I can still reach the internal resources by IP address (e.g. the internal file share and HTTPS server; I can even SSH to the DNS appliances from the clients using their IP addresses).
Tried both FQDN and short names. nslookup for the domain itself fails as well (these are all domain-joined Windows 10 machines).
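For anyone who wants to reproduce the comparison described above (external names resolve, internal names come back as non-existent, regardless of which DNS server is targeted), here is a minimal sketch in Python using only the standard library. It sends a plain A-record query straight to a chosen server over UDP and reports the response code. The server IPs and the internal name are placeholders I made up, not values from this thread, and the script is only a rough stand-in for nslookup.

```python
import random
import socket
import struct
from typing import Optional

# Common DNS response codes (RFC 1035 and successors).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, txid: Optional[int] = None) -> bytes:
    """Build a minimal DNS query packet (A record by default, recursion desired)."""
    if txid is None:
        txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_rcode(response: bytes) -> str:
    """Return the response code name from the low 4 bits of the flags field."""
    if len(response) < 4:
        return "MALFORMED"
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def query_rcode(server_ip: str, name: str, timeout: float = 3.0) -> str:
    """Send one query to server_ip on UDP port 53 and return the response code."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        try:
            response, _ = sock.recvfrom(512)
        except socket.timeout:
            return "TIMEOUT"
    return parse_rcode(response)


if __name__ == "__main__":
    # Placeholder values: substitute your own DNS servers and names.
    servers = ["192.0.2.10", "192.0.2.11"]
    names = ["google.com", "fileserver.corp.example"]
    for server in servers:
        for name in names:
            print(f"{server:15} {name:30} {query_rcode(server, name)}")
```

If every server, including ones entered by IP address, returns NOERROR for public names and NXDOMAIN for internal ones, that suggests the queries may be getting answered somewhere other than the server you addressed, rather than the internal DNS servers themselves being broken.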
Trying to figure out what is going on, I started disabling WAN interfaces (I have/had 2 connections at each location: fiber and cable at the LAN location, and fiber and cable at the SD-WAN location). After disabling one of the WAN interfaces at the SD-WAN location, I was unable to VPN back in there to continue testing. Looking further, I saw that the SD-WAN VPN tunnel is for some reason trying to connect to the backup/failover/WAN 2 IP address at the LAN location. I also noticed that the IPv6 status is "failed" for WAN2 at the LAN location. I'm going to try unplugging the WAN2 interface at the LAN location and see what happens.
After 4 days of downtime and ZERO help from AT&T (getting this through them was a HUGE mistake; it has been an absolute nightmare, after 8 months it STILL does not work correctly, and they keep going in and changing things without talking to me, as in this case), I finally saw that they had re-enabled Umbrella at the SD-WAN location and had not added any domains (including the local one) to the bypass input in the Umbrella section of the Threat Protection page, where it says: "Specify one or more domain names below (one per row) to be excluded from being routed to Cisco Umbrella." Once I added the local domain there, things started working again.
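To illustrate why the missing bypass entry would produce exactly these symptoms, here is a small Python sketch of how an exclusion list like that one might be evaluated. The suffix-matching rule, the function name `is_bypassed`, and the domain `corp.example` are my own assumptions for illustration; I have not confirmed how the dashboard actually matches entries.

```python
from typing import Iterable


def is_bypassed(query_name: str, bypass_domains: Iterable[str]) -> bool:
    """Return True if query_name equals, or is a subdomain of, a bypass entry.

    Assumption: entries are matched as domain suffixes on label boundaries.
    """
    name = query_name.lower().rstrip(".")
    for entry in bypass_domains:
        domain = entry.lower().strip().strip(".")
        if domain and (name == domain or name.endswith("." + domain)):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical bypass list, one domain per row as in the dashboard field.
    for bypass in ([], ["corp.example"]):
        for name in ("dc01.corp.example", "google.com"):
            route = "local DNS" if is_bypassed(name, bypass) else "Umbrella"
            print(f"bypass={bypass!r:18} {name:20} -> {route}")
```

With an empty list, the internal name is sent to Umbrella along with everything else, and a public resolver has no record for it, so it comes back as a non-existent domain while google.com still resolves. With the local domain in the list, the internal name stays with the local DNS servers.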