Hi team,
I've got a problem that I'd like to pick your brains about. I've got a Meraki MX64 deployed at a client site. Connected to the Internet port is a VDSL modem (carriage is via Australian NBN - FTT). VDSL modem is in bridge mode, and PPPoE Auth is set up on the Meraki.
This setup has been in place for a few years now, and over this time has been completely rock solid. Meraki gets the public static IP terminated onto it, and everything works swimmingly.
Late Oct/early Nov last year, there were some cabling "problems" at the site, caused by a concrete saw and a 10t excavator. New twisted pair was pulled in new ducting all the way in from the street, so the cabling should be brand new (that was the only upside).
Site was back working faultlessly for 4-6 weeks following this - back to its normal rock-solid state.
Early Jan, the service started to experience frequent (and prolonged) failures of the primary link. This would occur multiple times during the business day, and some of the outages would last for hours at a time. The site is configured with a backup 4G link, and the Meraki would fail over to the backup in the event of a primary failure. However, the backup link is slow and expensive. Also, the failovers are disruptive with the timeouts that Meraki specifies. Obviously we want it to stop and go back to its nice reliable state.
Suspecting the cheap ISP-provided VDSL modem had failed, we replaced it with a brand new one ASAP. No good - same dropouts.
Nothing else at the site has changed. All equipment is installed in a locked 19" full-height rack, in a 24/7 air-conditioned room, and all cabling is secured behind the rack. No obvious damage to cabling from rats, no one has mucked with anything, etc.
I've logged tickets with both the ISP and Meraki support.
From the ISP's side:
- ISP says that they can see the PPPoE connection dropping, and then being reestablished when the Meraki next asks for it.
- ISP can see that throughout all of this, the VDSL modem has maintained sync with the network, so they don't think it's a carriage issue. They believe that the Meraki simply isn't connecting the PPPoE tunnel.
- ISP has run a line quality diagnostic test, however, and that is showing abnormal attenuation on the line, so they suspect the internal cabling (the brand new cabling) may (suddenly) be faulty.
From Meraki's side:
- Meraki have looked at the logs, and can see nothing wrong with the MX64. From their side it looks like the ISP is dropping the PPPoE tunnel and then refusing connection requests for a period of time until it successfully reestablishes.
- One thing that the Meraki tech did note is that the MX64 is not seeing a PADO response from the ISP on a lot of occasions. They believe that the ISP side may be overloaded, and it's simply not responding to PPPoE connection requests.
- The MX64 is running ver 14.40 firmware. It was on the previous version when the problem began occurring, and upgrading to the current fw was one of the first things I tried.
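If we do manage to get a capture, the missing-PADO observation above could be checked with something like the following. This is only a rough sketch: it assumes untagged Ethernet frames that have already been extracted as raw bytes (reading the capture file itself is left out), and `classify_pppoe_frame` and `unanswered_padis` are names I made up for illustration. The EtherType and code values are the PPPoE discovery ones from RFC 2516.

```python
import struct

# PPPoE discovery stage constants (RFC 2516)
PPPOE_DISCOVERY_ETHERTYPE = 0x8863
PPPOE_CODES = {
    0x09: "PADI",  # client broadcast: "any access concentrators out there?"
    0x07: "PADO",  # ISP offer in response to a PADI
    0x19: "PADR",  # client request to a specific concentrator
    0x65: "PADS",  # ISP confirms the session
    0xA7: "PADT",  # either side terminates the session
}


def classify_pppoe_frame(frame: bytes):
    """Return the PPPoE discovery packet name for a raw Ethernet frame.

    Assumes an untagged frame (no 802.1Q VLAN header). Returns None for
    anything that is not a PPPoE discovery frame.
    """
    if len(frame) < 20:  # 14-byte Ethernet header + 6-byte PPPoE header
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != PPPOE_DISCOVERY_ETHERTYPE:
        return None
    code = frame[15]  # byte 14 is version/type, byte 15 is the code
    return PPPOE_CODES.get(code, f"unknown(0x{code:02x})")


def unanswered_padis(names):
    """Count PADIs that were not followed by a PADO before the next PADI."""
    unanswered = 0
    waiting = False
    for name in names:
        if name == "PADI":
            if waiting:
                unanswered += 1
            waiting = True
        elif name == "PADO":
            waiting = False
    if waiting:
        unanswered += 1
    return unanswered
```

A high count from `unanswered_padis` during an outage window would back up Meraki's reading that the ISP side isn't answering; a low count would point the finger back the other way.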
@Bri84 seems to have had a similar problem in October 2019 (albeit a different ISP): https://community.meraki.com/t5/Security-SD-WAN/PPPOE-Issue/m-p/63557/highlight/true#M16134
The only thing I can think of is maybe it's an MTU problem? I have no idea why it would suddenly crop up, but maybe the ISP replaced a piece of equipment in their network that's got a lower MTU setting, or something?
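For reference, the numbers I'd be testing against work out as below. It's just arithmetic, assuming a standard 1500-byte Ethernet MTU on the WAN side, IPv4 and TCP headers without options, and no baby jumbo frames; the function names are purely illustrative.

```python
ETHERNET_MTU = 1500   # assumed standard Ethernet MTU
PPPOE_OVERHEAD = 8    # 6-byte PPPoE header + 2-byte PPP protocol ID
IPV4_HEADER = 20      # no IP options
TCP_HEADER = 20       # no TCP options
ICMP_HEADER = 8       # ICMP echo header


def pppoe_mtu(ethernet_mtu: int = ETHERNET_MTU) -> int:
    """Largest IP packet that fits inside one PPPoE-encapsulated frame."""
    return ethernet_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload per segment for a given IP MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that passes unfragmented for a given IP MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


mtu = pppoe_mtu()
print(mtu, tcp_mss(mtu), max_ping_payload(mtu))
```

So if a don't-fragment ping with a payload at or just under that last figure fails while much smaller ones get through, something in the path has a lower MTU than plain PPPoE would explain.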
adam2104's comment in this thread: https://www.reddit.com/r/meraki/comments/cax4cf/obscure_meraki_firewall_problems/ is kinda what I'm thinking of trying next. Not that it should affect the PPPoE tunnel setup at all, but maybe?
Does anyone here have any thoughts on what might be causing this, or what I need to try next? Meraki Support have asked for a packet capture, but obviously catching one in the act is difficult. Other than that - anything that I should be looking for/at?
Cheers,
Matt