"MAC address flapping" with Intel X710 10GbE LAN on MS120-48LP (MS 16.9)?

cabricharme
Getting noticed


I'm seeing a very weird issue: a Meraki MS120-48LP has started reporting "MAC address flapping" on multiple wired ports.

 

I can't yet make sense of it. Suspecting incompatibility between the switch and the NIC or a bug somewhere.

 

Anything I can do to zero in on the culprit short of installing a new NIC and disabling the onboard Intel X710-AT2 on the server?

 

Notes:

  • The device is Supermicro TwinPro SYS-120TP-DC8TR where each of the two server modules in a 1U chassis has a dual-port 10GbE NIC (Intel X710-AT2).
  • One of the servers in the Twin is online (running ESXi 7.03) with both X710 ports connected to the switch. It seems to be doing OK except for relatively frequent and quite regular packet loss spikes:
    [Image: packet loss graph (cabricharme_0-1733758563637.png)]

     

  • The moment the 2nd server in the Twin is powered up and the OS comes online (also ESXi 7.03), the switch starts reporting "MAC address flapping" - but not on the 2nd server - on the 1st one. (That's one of several baffling things about it.)
  • Sample of "flapping" events in Meraki logs:

     

    Dec 5 13:33:07 | **_Switch01 | Ports 29, 31, 29 | Switch port | MAC address flapping | mac: 7C:C2:**:**:08:74, vlan: NNN, port: 29,31,29
    Dec 5 13:32:31 | **_Switch01 | Ports 29, 31, 29 | Switch port | MAC address flapping | mac: 7C:C2:**:**:08:74, vlan: NNN, port: 31,29,31
    Dec 5 13:32:11 | **_Switch01 | Ports 29, 31, 29 | Switch port | MAC address flapping | mac: 7C:C2:**:**:08:74, vlan: NNN, port: 31,29,31
    Dec 5 13:31:42 | **_Switch01 | Ports 29, 31, 29 | Switch port | MAC address flapping | mac: 7C:C2:**:**:08:74, vlan: NNN, port: 31,29,31

    Port 29 is connected to the working server, which seemed to work fine and generated no "flapping" events before the 2nd server fired up. Port 31 is connected to the 2nd server. (It's as if the 2nd server's NIC is telling the switch that it has the same MAC address as the NIC on the other server?)
  • If I disable one of the ports on the switch connected to the 2-port NIC on the 2nd server - no change, still "flapping".
  • There is a 2021 thread on the Intel forum describing a similar issue with an Intel X710 and a Cisco 5672UP switch, where swapping ports, cables, SFPs, etc. did not help. (It didn't in my case, either.)
    • The resolution was "shutdown port" on the switch after which the issue went away for them. In my case, I "disabled" the port on the 2nd server several times - but not on the 1st one - as it's running production workloads and cannot be easily taken offline.
  • Port 29 configuration on MS120-48LP in Meraki dashboard:
    Port status: Enabled
    Type: Trunk
    Native VLAN: ***
    Allowed VLANs: all
    Access policy: Open
    Link negotiation: Auto negotiate (1 Gbps)
    RSTP: Enabled (Forwarding)
    Port schedule: Unscheduled
    Port isolation: Disabled
    Trusted DAI: Disabled
    UDLD: Alert only
    Tags: none
    PoE: Disabled
    Port mirroring: Not mirroring traffic
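To make the pattern in the event sample easier to see, the detail strings can be tallied with a few lines of Python. This is only a rough sketch: the strings are copied from the log sample above (MAC and VLAN masked as posted), and the names `parse_event` and `summarize` are my own, not anything provided by Meraki.

```python
import re
from collections import Counter

# Detail strings copied from the dashboard events above (masked as posted).
EVENTS = [
    "mac: 7C:C2:**:**:08:74, vlan: NNN, port: 29,31,29",
    "mac: 7C:C2:**:**:08:74, vlan: NNN, port: 31,29,31",
    "mac: 7C:C2:**:**:08:74, vlan: NNN, port: 31,29,31",
    "mac: 7C:C2:**:**:08:74, vlan: NNN, port: 31,29,31",
]

# Assumed layout of the detail field: "mac: <mac>, vlan: <vlan>, port: <p1>,<p2>,..."
DETAIL_RE = re.compile(
    r"mac:\s*(?P<mac>[^,\s]+),\s*vlan:\s*(?P<vlan>[^,\s]+),\s*port:\s*(?P<ports>[\d,\s]+)"
)


def parse_event(detail):
    """Split one detail string into (mac, vlan, [ports in the order reported])."""
    match = DETAIL_RE.search(detail)
    if match is None:
        raise ValueError(f"unrecognized event detail: {detail!r}")
    ports = [int(p) for p in match.group("ports").split(",") if p.strip()]
    return match.group("mac"), match.group("vlan"), ports


def summarize(details):
    """Count, per MAC, how often it moved from one port to another."""
    moves = Counter()
    for detail in details:
        mac, _vlan, ports = parse_event(detail)
        # Each consecutive pair of ports in the sequence is one move.
        for src, dst in zip(ports, ports[1:]):
            if src != dst:
                moves[(mac, src, dst)] += 1
    return moves


if __name__ == "__main__":
    for (mac, src, dst), count in sorted(summarize(EVENTS).items()):
        print(f"{mac}: port {src} -> port {dst}: {count} time(s)")
```

For the four events above this counts moves in both directions between ports 29 and 31 for the same MAC, which matches what the dashboard shows: the address is being learned alternately on both ports.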

     

 

Thanks for any help!

12 Replies
cmr
Kind of a big deal

Are the two servers sharing a vSwitch, or do they have their own separate ones? It seems that the MAC address (from a VM?) is being seen on the LAN ports of both servers.

If my answer solves your problem please click Accept as Solution so others can benefit from it.
cabricharme
Getting noticed

Thanks for the quick response!

  • Everything on the network is sharing a single physical switch. There's only one physical switch at that location.
  • No distributed vSwitches are configured. (Is that what you mean by "shared"?) Each has a separate (if identically configured) set of vSwitches.
  • The ESXi hosts do have identically configured vSwitches, which is necessary for vMotion. (All of our 40+ ESXi hosts are similarly configured and we had no similar "flapping" issues until this one came along.)
  • The "flapping" MAC address reported by Meraki belongs to the X710 NIC and not to any VM.
  • The 2nd ESXi has no running VMs and thus, no VM on it can conflict with VMs on the 1st one.
cmr
Kind of a big deal

Indeed, distributed vSwitch is what I meant! 

  • Can you verify that the second NIC has a different MAC (I know it should)?  I have the same NIC in my home lab, but it is connected to a C9300L in LACP mode (and works fine). 
  • If you disconnect the first server and then only connect the second, does port 31 see a different MAC? 
  • If you then reconnect the first server, does that MAC flap?
cmr
Kind of a big deal

I'd also try the stable MS17 release.

cabricharme
Getting noticed

Yes, the 2nd NIC has a different MAC (for each of its two ports) - verified in Meraki and in the server's BMC, and in UEFI BIOS.

 

Can't do much with the first server - like disconnect ports - until the 2nd is up and I can migrate VMs to it... 🙂

 

Will try v.17, thanks! (Probably not until Thursday.)

PhilipDAth
Kind of a big deal

Does it have some kind of integrated switch module in the chassis linking the two together?

For example, if you don't plug in the Ethernet cable to the second blade, can you still talk to the second blade?

cabricharme
Getting noticed

No, the server modules only share power supplies and a storage backplane (no shared storage either - just the backplane). The NICs are completely isolated from each other.

 


@PhilipDAth wrote:

For example, if you don't plug in the Ethernet cable to the second blade, can you still talk to the second blade?


Definitely a "no" 🙂

 

(We run 20+ of these in remote locations so I am pretty familiar with them. This one is unique in terms of the specific NIC chipset, the motherboard and the CPUs - but it shares the architecture with all other TwinPros and BigTwins we are running.)

IvanJukic
Meraki Employee

Hi @cabricharme ,

I've come across similar issues in the past, where the host/server has Spanning Tree and/or a mismatched VLAN set by default. Run a packet capture on the MS120 port facing the host to confirm. You can set up a port mirror if needed.

https://documentation.meraki.com/MS/Monitoring_and_Reporting/Packet_Captures_and_Port_Mirroring_on_t...



Cheers,

Ivan Jukić,
Meraki APJC

If you found this post helpful, please give it kudos. If it solved your problem, click "accept as solution" so that others can benefit from it.
cabricharme
Getting noticed

Thanks Ivan, I'll see if I can do it - but wanted to ask first: what would be my next steps based on the results of the capture? (And what exactly would I be looking for in the capture?)

 

(The similar issues you've come across in the past - have they been posted to this forum, by any chance?)

 

Thanks!

IvanJukic
Meraki Employee

Hey @cabricharme,

Re: the pcap.

1.) Just run a raw capture, no filters at all. A 60-second duration is fine.

2.) Enter "stp" (lowercase) in the Wireshark display filter.

3.) Check whether any of the STP frames have the server/host MAC address as their source. If so, the host is running Spanning Tree.

4.) Log into the KVM, web UI, etc. of the server/host and turn off Spanning Tree.


See example below;

 

[Image: STP Pcap Example]
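If you'd rather script steps 2 and 3 than eyeball them in Wireshark, something like the following could work. Treat it as a rough sketch under a few assumptions: the capture is a classic microsecond-resolution .pcap file with an Ethernet link type (pcapng is not handled), only standard IEEE BPDUs sent to 01:80:C2:00:00:00 are matched, and the function name `stp_source_macs` is just a name picked for illustration.

```python
import struct

# IEEE 802.1D bridge group address that standard STP/RSTP BPDUs are sent to.
STP_DST = bytes.fromhex("0180c2000000")


def stp_source_macs(pcap_bytes):
    """Return the set of source MACs that sent STP BPDUs in a classic .pcap capture."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")

    # The link type is the last field of the 24-byte global header; 1 = Ethernet.
    (linktype,) = struct.unpack(endian + "I", pcap_bytes[20:24])
    if linktype != 1:
        raise ValueError("expected an Ethernet capture")

    senders = set()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len

        # 802.3 frame: dst(6) src(6) length(2), then LLC DSAP/SSAP 0x42/0x42 for STP.
        if len(frame) >= 16 and frame[:6] == STP_DST and frame[14:16] == b"\x42\x42":
            senders.add(":".join(f"{b:02x}" for b in frame[6:12]))
    return senders


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        for mac in sorted(stp_source_macs(fh.read())):
            print(mac)
```

Run it against the downloaded capture file and compare the printed MACs with the server's NIC addresses. Seeing the switch's own MAC in the list is expected, since RSTP is enabled on the port; a server NIC MAC showing up is what would point to the host running Spanning Tree.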

 

Re: The similar issues you've come across in the past.
This comes from my personal experience. I'm not sure whether customers have posted about it here. I've also run into this during deployments.



Cheers,

Ivan Jukić,
Meraki APJC

cabricharme
Getting noticed

(The server is refusing to power up (even BMC/IPMI) after I installed a 2nd dual port NIC - so all further testing is suspended until we resuscitate it.)

IvanJukic
Meraki Employee

Possibly a faulty controller? Let us know how you go.


Cheers,

Ivan Jukić,
Meraki APJC
