New HA setup: Additional IP's not receiving traffic

Miyo360
Getting noticed


Hello,

 

Yesterday I added a 2nd MX to our network, which until this point was a single MX75. We have a block of 5 usable IP addresses from our ISP.

x.x.x.202
x.x.x.203
x.x.x.204
x.x.x.206
x.x.x.207

Our single MX was set to use .202 as its WAN IP. I set up the spare MX on .203, added it to the dashboard, and everything looked good. Traffic was flowing, and the HA status was good.

We have some 1:many NAT rules set up on the remaining 3 IP addresses, but these were not passing traffic. I called our ISP and asked them to clear their ARP cache, which they did. Not fixed. I called again and they cleared it a 2nd time (on their 'core router'), but this still didn't fix the problem. They sent me their ARP table, which was...

x.x.x.202 08:f1:b3:XX:XX:Xf (confirmed as the MAC address of the Primary MX)
x.x.x.203 08:f1:b3:XX:XX:Xe (confirmed as the MAC address of the Spare MX)
x.x.x.204 cc:03:d9:XX:XX:X1 (confirmed by Meraki engineer as the virtual MAC of the HA pair)
x.x.x.206 cc:03:d9:XX:XX:X1 (confirmed by Meraki engineer as the virtual MAC of the HA pair)
x.x.x.207 cc:03:d9:XX:XX:X1 (confirmed by Meraki engineer as the virtual MAC of the HA pair)

This all makes sense to me, as the last 3 addresses must forward to the virtual MAC for when failover occurs. However, when running a packet capture for ANY traffic to .204, .205 or .206 nothing is received by the primary MX.
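
As a quick sanity check on that ARP table, the entries can be grouped by MAC to see which IPs share one. This is only a rough Python sketch: the table text is pasted from above with the masked octets left as-is, and `group_by_mac` is just a name I made up for illustration.

```python
from collections import defaultdict

# ARP table as sent by the ISP (octets masked exactly as in the post).
arp_table = """\
x.x.x.202 08:f1:b3:XX:XX:Xf
x.x.x.203 08:f1:b3:XX:XX:Xe
x.x.x.204 cc:03:d9:XX:XX:X1
x.x.x.206 cc:03:d9:XX:XX:X1
x.x.x.207 cc:03:d9:XX:XX:X1
"""

def group_by_mac(table):
    """Return {mac: [ip, ...]} for every 'ip mac' line in the table."""
    groups = defaultdict(list)
    for line in table.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            ip, mac = parts[0], parts[1].lower()
            groups[mac].append(ip)
    return dict(groups)

for mac, ips in group_by_mac(arp_table).items():
    print(mac, ips)
```

The three NAT addresses land on the same cc:03:d9 MAC, while .202 and .203 each have their own physical MAC.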

Meraki support point the finger at the ISP and say they need a packet capture from an upstream device to confirm traffic is being sent to the MX. The ISP say they cannot run a packet capture on their switch (which is their device in our comms room), and cannot run a packet capture on their core router unless there is a fault, which, since .202 and .203 respond to pings, they say there is not.

The topology is

ISP Switch > My Switch > Primary MX
                       > Spare MX

I can run a packet capture on my switch tomorrow (when at site) to see if indeed packets for .204-.206 are being sent, but in the meantime, does anyone have any ideas how this would be resolved?

11 Replies 11
D_Tak
Meraki Employee

Greetings,

Prior to adding the spare MX appliance, was this the same configuration: ISP switch > your switch > MX appliance? Were the 1:Many NAT rules in place and working prior to the addition of the secondary MX? If you power off the secondary MX, do the rules start working as configured? What sort of switch are you using between the ISP and the Meraki MX pair? What are the port configurations for this switch?


If the behavior is the same with the spare powered off, and the MX is still not receiving the traffic in its WAN captures, it would be great to have captures from your switch on both the port connected to the ISP switch and the port connected to the MX, to confirm that the switch is receiving and sending the traffic on the required ports. Is it possible to plug the MX directly into the ISP switch (either just the primary or the HA pair) to bypass the additional switch in between? This may help if the capture from your switch shows the same behavior, with no inbound traffic for the 204, 205, and 206 IP addresses. It would then be easier to ask the ISP where that traffic is going, since it is not making it to the MX appliance.

 

 

Miyo360
Getting noticed

Hi. 

 

Thanks for your reply. Great questions! I'll do my best to answer...

"Prior to adding the Spare MX appliance was this the same configuration from ISP switch > your switch > MX appliance?"

No. The ISP switch was plugged directly into the WAN1 port on the single MX.

"Were the 1:Many NAT rules in place and working prior to the addition of the secondary MX?"

Yes, they have been in place and working on the single MX for many weeks prior to this change.

"If you power off the secondary MX do the rules start working as configured?"

I can test tomorrow. I'm not on site today. 

"What sort of switch are you utilizing between the ISP and the Meraki MX pairing?"

A small ubiquiti/unifi switch. 

"What are the port configurations for this switch?"

Port 1 - Vlan99 (access) - connected to ISP's switch
Port 2 - Vlan99 (access) - connected to Primary MX, WAN1
Port 3 - Vlan99 (access) - connected to Spare MX, WAN1
Port 4 - [empty]
Port 5 - vlan80 (access) - connected to Primary MX, Port 12 (PoE)

vlan99 is not routed on any device, including on the MX. It is intended just to isolate the traffic from the ISP to the two MX's.

vlan80 is our management vlan. Port 12 on the MX is PoE (so powers the small switch) and allows management access to the switch.

"Is it possible to plug the MX directly into the ISP switch (either just the primary or the HA Pair) to bypass the additional switch in between?"

Yes... but I only have a single "hand off" (is that the correct term?) from the ISP switch. Only 1 port is live, so I can only plug into either the Primary MX or Secondary MX. I can try this tomorrow also.

My troubleshooting tomorrow will be in the following order...

1. Run packet capture on my switch, mirroring port 1. Check for traffic to .204-.206
2. Run packet capture on my switch, mirroring port 2. Check for traffic to .204-.206
(this should give me enough evidence to know if my switch is the issue, but I will also try #3 below).
3. Plug Primary MX directly into ISP switch, see if IPs .204-.206 are receiving traffic.

Am I right in saying the following...

If test #1 above shows packets arriving on port 1, then the ISP side is working fine and I should run the next test. Otherwise, contact the ISP.
If test #2 above shows packets are NOT being sent to port 2, then my switch setup is wrong/not working: replace the switch (I have an old, dumb hub I could try instead) and try again. Otherwise, if packets are being sent to port 2, then the MX must be receiving the packets and the MX (or HA) is broken somehow. Contact Meraki for troubleshooting.
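
To make sure I have the logic straight, here it is as a tiny Python sketch. The function name and the returned strings are made up for illustration; the two inputs are simply whether traffic for the NAT IPs shows up in the port 1 and port 2 mirror captures.

```python
def next_step(seen_on_port1, seen_on_port2):
    """Map the two mirror-capture results to the next troubleshooting step.

    seen_on_port1: traffic for the NAT IPs arrives from the ISP (port 1).
    seen_on_port2: the same traffic is forwarded to the primary MX (port 2).
    """
    if not seen_on_port1:
        # Nothing arrives from the ISP, so the problem is upstream.
        return "contact ISP"
    if not seen_on_port2:
        # Traffic arrives but my switch does not pass it on.
        return "check or replace my switch"
    # Traffic reaches the MX WAN port, so the MX/HA pair is suspect.
    return "contact Meraki"

print(next_step(True, False))
```

The order matters: the port 2 result only means something once port 1 has shown the traffic arriving.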

D_Tak
Meraki Employee

I would not suspect the secondary MX unless the HA pair was showing as Master/Master in the dashboard as that may cause some confusion for the upstream device.

I would also question the switch port configuration on the ISP equipment to see if a VLAN mismatch is causing any problems in the forwarding of traffic, since this worked as expected with just a single MX in place. In this instance, not using the shared IP of the MX may be causing problems with the 2 appliances having different IPs if they are both master at the present time.

If things start working as expected when you power off the spare MX appliance, I would ask you to call in and work with support, reopening the original ticket if it is no longer open, and outline the troubleshooting done. You can then clearly narrow it down to the HA pairing, especially if the primary MX is working through the Ubiquiti switch.

Are or have either of the MX appliances been alerting in the dashboard or have you seen any failover from primary to spare? I ask this as if the spare is in secondary it will only do the management connectivity items used to ensure its able to be online in the Meraki Dashboard and that it has internet connectivity. If the MX appliances both show Master there may be a routing issue connected as they could potentially both be receiving the inbound traffic. 

Miyo360
Getting noticed

Hey, thanks again for your reply.

"I would not suspect the secondary MX unless the HA pair was showing as Master/Master in the dashboard" 

The devices show correctly in the dashboard, primary is "active", spare is "passive; ready".

"Are or have either of the MX appliances been alerting in the dashboard or have you seen any failover from primary to spare?"

No, the status of each MX appears stable.

I think the packet capture on my switch will be very interesting. I will report back tomorrow.

PhilipDAth
Kind of a big deal

>However, when running a packet capture for ANY traffic to .204, .205 or .206 nothing is received by the primary MX.

I have a hunch the secondary MX is causing a problem and advertising its MAC address.  I will be interested to see whether everything starts working when you power off the secondary MX.

Are you running current stable or better firmware on the MX?

Miyo360
Getting noticed

Thanks for the reply.

"I have a hunch the secondary MX is causing a problem and advertising it's MAC address."

Do you mean for the virtual MAC addresses? The ones starting cc:03:d9?

What is slightly confusing is that the documentation says the virtual MAC will be based on cc:09:d9, then use the last 3 octets from the primary MX. The last 3 octets of my primary MX end in xx:xx:xf, but the list of MACs that the ISP sent me shows IPs 204-206 having a MAC of cc:03:d9:xx:xx:e, which clearly isn't what I would expect based on the description in the docs.  The spare MX physical MAC ends x1, so it doesn't match that either.
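
To show what I was expecting, here is a small Python sketch of the rule as I read it: a fixed prefix followed by the last 3 octets of the primary's MAC. This is my assumption about the rule, not something confirmed by Meraki; the default prefix is the cc:03:d9 seen in the ISP's ARP table, and the MAC in the example is a made-up placeholder because the real ones are masked.

```python
def expected_virtual_mac(primary_mac, prefix="cc:03:d9"):
    """Build prefix + last three octets of the primary MX MAC (assumed rule)."""
    octets = primary_mac.lower().split(":")
    if len(octets) != 6:
        raise ValueError("expected a MAC with six colon-separated octets")
    return ":".join(prefix.lower().split(":") + octets[3:])

# Hypothetical primary MAC, not the real (masked) one from this thread.
print(expected_virtual_mac("08:f1:b3:aa:bb:cf"))
```

Comparing that result against the MAC in the ISP's ARP table is the check I was doing by eye.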

"I will be interested to see if when you power off the secondary MX if everything starts working."

How soon after powering off the spare MX should I notice a change (if there is going to be one)? Would it be immediate, or should I ask my ISP to clear their ARP cache?

"Are you running current stable or better firmware on the MX?"

The primary MX is running 18.107.9, which I understand is stable firmware.

Ryan_Miles
Meraki Employee

Which document says cc:09:d9? The MX Warm Spare doc shows cc:03:d9 when a VIP is used.

https://documentation.meraki.com/MX/Deployment_Guides/MX_Warm_Spare_-_High_Availability_Pair#Virtual... 

Also, you don't have a VIP configured. So the virtual MAC thing shouldn't apply.

Ryan

Miyo360
Getting noticed

"Which document says cc:09:d9?"

Exactly the document you linked. I didn't spot the bit about it only applying to VIPs. I'm surprised the Meraki tech didn't spot that on our call yesterday.

"Also, you don't have a VIP configured. So the virtual MAC thing shouldn't apply."

Interesting. Perhaps this is the issue then? Why are the ISP seeing a MAC address starting cc:09:d9 if I'm not using virtual IPs?

Miyo360
Getting noticed

So, a packet capture on my switch (mirroring port 1, which is connected to the ISP uplink) showed traffic arriving addressed to IP 205, which confirmed that at least the ISP side was working as expected. Another packet capture mirroring port 2 (which goes to the Primary MX WAN1 port) showed no traffic to or from 205, and none to or from my 2nd laptop, which was the source generating the test traffic. I sent these pcaps to Meraki support, who confirmed my switch does not appear to be forwarding these packets.

To prove the point I swapped out my switch for an old, dumb hub, and traffic on all IPs started flowing and connections were established, so the culprit does indeed appear to be my switch (or switch config). I have reached out to Ubiquiti for advice.

I will update this topic when I have a response.

Thanks both for your thoughts and comments.

cmr
Kind of a big deal

Not that it is a solution, but I've always advocated using L2 unmanaged switches for WAN splitting for two reasons:

1) what you've seen here where perhaps some tags are clashing

2) as the switch is completely unprotected, best to have something that can't be hacked...

Miyo360
Getting noticed


@cmr wrote:

Not that it is a solution, but I've always advocated using L2 unmanaged switches for WAN splitting for two reasons:

1) what you've seen here where perhaps some tags are clashing

2) as the switch is completely unprotected, best to have something that can't be hacked...


Thanks. Understood. That's likely my solution - replace with something simpler. I used these Unifi switches because they were perfect for this purpose; tiny 5-port switch, powered over PoE so can be powered from a single MX LAN port and used for both management and power, visibility into traffic stats (not very detailed, just basic throughput and volume), troubleshooting, and I happened to have some lying around. Oh well.
