Cisco+ Secure Connect connected MX - Some app traffic not getting through cloud firewall

JamesHammy

Hi all,

We have recently implemented the Meraki/Cisco Secure Connect SD-WAN solution and have all of our MX devices now connected to each other via the Secure Connect VPN feature along with Umbrella and the cloud-based internet break-out. This gives us a nice centralised internet solution whether in or out of the office.

Whilst it works relatively well, there seem to be no useful tools at all for analysing outbound internet traffic or for telling whether something is blocking certain apps/processes.

We can see from the Umbrella DNS/firewall/web logs that calls to a certain URL/domain are being 'Allowed' and that nothing is being logged as blocked.

We have a third-party application installed that communicates with a cloud-based PostgreSQL instance, and we simply cannot get it to work via Secure Connect. Nothing shows as blocked; there are only 'Allowed' entries in the log, all of which relate to the app.

If I only have a URL/domain, how on earth can I get the Secure Connect firewall to just allow the traffic, or even see why it's clearly being blocked but not logged? I cannot use the MX's 'Local internet breakout' feature to resolve this particular situation because the cloud service the app connects to has some kind of load balancing in front of it and the IPs change almost every minute. They're also on massive subnets, so excluding them all would be madness. Plus, we've added IP after IP and never managed to get it to work even once. The MX does have a feature to exclude a URL/DNS entry, but it does not work, and I suspect that may be because we use the Umbrella agent for DNS protection on our endpoints. Would that mean the DNS query never hits the MX?

We also provide remote connectivity for our users using Secure Client (the AnyConnect component) to the same Secure Connect instance, and had the same problem while out and about. We resolved it for remote working by excluding the URL/DNS domain from the client VPN tunnel via the basic config settings, and this works perfectly.

Any ideas why we can't do the same for traffic originating from our office-based MX devices?
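The suspicion above, that a DNS-based exclusion can only take effect if the gateway actually sees the DNS answer, can be illustrated with a toy model. This is a hypothetical Python sketch, not a description of how the MX is documented to work internally: the class `DnsLearnedExclusions`, its methods, and the example domain and IP are all made up for illustration. The assumption being modelled is that the gateway learns which destination IPs to break out locally only from DNS responses it proxies, so a query that an endpoint agent answers directly never populates the list and the traffic stays in the tunnel.

```python
from dataclasses import dataclass, field


@dataclass
class DnsLearnedExclusions:
    """Hypothetical model: a gateway that learns breakout IPs from DNS answers it proxies."""

    excluded_domains: set[str]
    learned_ips: set[str] = field(default_factory=set)

    def matches(self, hostname: str) -> bool:
        # Exact match or subdomain match against the configured exclusion domains.
        name = hostname.rstrip(".").lower()
        return any(name == d or name.endswith("." + d) for d in self.excluded_domains)

    def observe_dns_answer(self, hostname: str, ips: list[str]) -> None:
        # Only ever called for queries that pass *through* the gateway.
        if self.matches(hostname):
            self.learned_ips.update(ips)

    def route_for(self, dst_ip: str) -> str:
        # Destinations learned from DNS go out locally; everything else uses the tunnel.
        return "local-breakout" if dst_ip in self.learned_ips else "vpn-tunnel"


if __name__ == "__main__":
    gw = DnsLearnedExclusions({"app-db.example.com"})

    # Endpoint agent resolves the name itself: the gateway never sees the answer.
    print(gw.route_for("203.0.113.10"))  # vpn-tunnel

    # Gateway proxies the query, so it can learn the answer and break the flow out.
    gw.observe_dns_answer("eu1.app-db.example.com.", ["203.0.113.10"])
    print(gw.route_for("203.0.113.10"))  # local-breakout
```

Under this model, the fix is either to make the gateway see the DNS queries or to fall back to static IP exclusions.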

1 Accepted Solution
JamesHammy

Final update: we have resolved the situation.

This has been a right pain to fix but we got there in the end.

Several things we went through:

  1. We tested that the traffic did indeed move between the site-to-site tunnel and the WAN connections when we added exclusions. Packet capture verified this.
  2. We have a static IP provision in our Secure Connect environment, but it only applies to traffic destined for services on ports 80/443. Outbound traffic to services running on any other port will originate from Cisco's massive pool of dynamic IPs. Given what the static IP provision costs, this is a ridiculous oversight/limitation. Meraki/Cisco should take note!
  3. This troublesome app first contacts a site for authentication and license checking over HTTPS, but then contacts the database instance on a port around 40,000. This meant the two connections would originate from different source IPs, as per point 2. At this point, we considered that the app's cloud server might be using session/IP pinning of some kind and that the disparity was causing the issue.
  4. The cloud service has some kind of load balancer in front of the auth site, which changes the destination IP every few minutes. At this point we almost threw in the towel: the IPs differed at the third octet, so excluding them would have been all but impossible without inadvertently excluding massive chunks of the internet.
  5. We couldn't use DNS-based local internet breakout exclusions because we do not use the MX to proxy DNS requests; our endpoints use the Umbrella agent for DNS. DNS-based exclusion is how we fixed the issue on the client VPN, but it was impossible here. I managed to get DNS-based exclusions working for a single laptop by changing a test VLAN's DHCP settings to hand out 'upstream proxied DNS'. This worked immediately but is impractical, because the MX resolves using the primary WAN port's DNS server, which in our case is Google's DNS. That meant we lost local DNS resolution for our domain, which is a show-stopper. I don't know whether the MX WAN interfaces would allow an internal DNS server (our domain controllers) to be configured, so we didn't attempt it.
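The problem in point 4 can be quantified. The only single prefix that covers a set of addresses is the longest prefix they all share, and when the addresses differ in the third octet that prefix becomes very wide. A minimal sketch using Python's standard `ipaddress` module; the two addresses are made up for illustration and are not the app's real IPs.

```python
import ipaddress


def smallest_covering_network(addresses: list[str]) -> ipaddress.IPv4Network:
    """Return the smallest single IPv4 network that contains every given address."""
    ips = [ipaddress.IPv4Address(a) for a in addresses]
    lowest, highest = min(ips), max(ips)
    # Bits that differ between the lowest and highest address cannot be part of the
    # shared prefix; every address in between shares whatever prefix those two share.
    differing_bits = (int(lowest) ^ int(highest)).bit_length()
    prefix_len = 32 - differing_bits
    # strict=False zeroes the host bits so we get the enclosing network.
    return ipaddress.IPv4Network(f"{lowest}/{prefix_len}", strict=False)


if __name__ == "__main__":
    net = smallest_covering_network(["198.51.20.5", "198.51.200.9"])
    print(net, net.num_addresses)  # 198.51.0.0/16 65536
```

Two addresses that differ only from the third octet onwards already force a /16 (65,536 addresses), which is why a single covering subnet exclusion was not an option here.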

 

At this point we figured we'd take a deeper look at the Umbrella admin interface, in conjunction with the fantastic DebugView (dbgview.exe) from Sysinternals.

Once we had established all the DNS names the app was contacting, we looked at the historic IPs associated with those DNS/URL entries within Umbrella. Although the IPs rotated quickly, there were only about 5 of them per service (about 17 in total), and they had been consistent for the last two years.

We added all of the IPs to the tunnel exclusion and it finally worked. It's not ideal adding so many IPs to the list, but given the massive change involved in reconfiguring our internal DHCP scopes to use the MX for proxying resolution to our domain controllers, we'll be leaving it as-is.
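Turning the historic resolutions found in Umbrella into an exclusion list is mostly bookkeeping: deduplicate the addresses per hostname, then merge any that happen to be contiguous. A minimal sketch using Python's standard `ipaddress` module, assuming the history has already been exported as (hostname, IP) pairs; that export format and the hostnames and addresses are hypothetical, and only IPv4 is handled.

```python
import ipaddress
from collections import defaultdict


def build_exclusion_list(observations):
    """Group historic (hostname, IPv4) pairs and collapse them into CIDR entries.

    Returns (per_host, cidrs): the unique addresses seen for each hostname, and the
    combined list of networks to add to the tunnel exclusion.
    """
    per_host = defaultdict(set)
    for hostname, ip in observations:
        per_host[hostname.lower()].add(ipaddress.IPv4Address(ip))

    all_ips = set().union(*per_host.values())
    # collapse_addresses merges contiguous addresses (e.g. .10 and .11 -> /31)
    # and otherwise leaves each one as a /32.
    cidrs = [str(net) for net in ipaddress.collapse_addresses(all_ips)]
    return dict(per_host), cidrs


if __name__ == "__main__":
    history = [
        ("auth.example.com", "203.0.113.10"),
        ("auth.example.com", "203.0.113.11"),
        ("auth.example.com", "203.0.113.10"),  # repeat sighting, deduplicated
        ("db.example.com", "198.51.100.7"),
    ]
    per_host, cidrs = build_exclusion_list(history)
    for host, ips in sorted(per_host.items()):
        print(host, sorted(str(ip) for ip in ips))
    print(cidrs)
```

Keeping the per-hostname view alongside the merged list makes it easier to spot when a service starts resolving to an address that is not yet excluded.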

 

What a faff. Honestly, I'm not sure the Secure Connect solution is really worth it, given the limitation around static IP addressing and the extra complexity of internet breakout. Couple that with the complication of using the Umbrella agent on the local laptops, and DNS becomes a much bigger pain to administer.

Anyhow, all sorted.

PhilipDAth

What I would do is a packet capture on the MX on the AutoVPN tunnel for traffic going to the PostgreSQL server. Verify that there is two-way traffic, and then check that it is completing authentication OK.
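Once a capture has been taken, the two-way check suggested above boils down to counting packets in each direction for the conversation with the server. A minimal sketch, assuming the capture has already been exported (for example from Wireshark) into simple (source IP, source port, destination IP, destination port) tuples; that tuple format, the addresses, and the use of PostgreSQL's default port 5432 are assumptions for illustration.

```python
from typing import Iterable, Tuple

Packet = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def check_two_way(packets: Iterable[Packet], server_ip: str, server_port: int) -> dict:
    """Count packets to and from server_ip:server_port and flag one-way conversations."""
    to_server = from_server = 0
    for src_ip, src_port, dst_ip, dst_port in packets:
        if dst_ip == server_ip and dst_port == server_port:
            to_server += 1
        elif src_ip == server_ip and src_port == server_port:
            from_server += 1
    return {
        "to_server": to_server,
        "from_server": from_server,
        # Packets in only one direction suggest replies are not making it back.
        "two_way": to_server > 0 and from_server > 0,
    }


if __name__ == "__main__":
    capture = [
        ("10.0.0.5", 50000, "203.0.113.10", 5432),  # client -> server
        ("203.0.113.10", 5432, "10.0.0.5", 50000),  # server -> client
        ("10.0.0.5", 50000, "203.0.113.10", 5432),
    ]
    print(check_two_way(capture, "203.0.113.10", 5432))
```

Seeing two-way traffic only proves the path works; the authentication step still has to be checked separately in the capture itself.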

JamesHammy

Funnily enough, Wireshark from the workstation was going to be the first call of the day this morning. 

Is there a packet capture function on the MX itself? The team haven’t come across that one before. 

Thanks for the reply!

PhilipDAth

JamesHammy

Legend, thanks @PhilipDAth

I’ll update with results after investigation. 

jimmyt234

I often find these dedicated apps do some form of certificate pinning, so any kind of HTTPS inspection causes issues. (This presumes you have HTTPS inspection turned on for your Roaming Computers Web Policy.)

Try adding the FQDN/URL for the application to the Selective Decryption Lists section.

JamesHammy

Oh, we've had right issues with HTTPS inspection in the past and have had to avoid it, so that at least rules that one out. Thanks though!

JamesHammy

Ok, quick update.

After configuring an IP-based local internet breakout, we can see from the packet capture that the traffic does indeed move from the Secure Connect VPN tunnel to the local WAN1 breakout, which is good.

The connection still fails though. Argh!
