VMware ECMP NSX Edges Dropping Packets? Check RPF & Firewall

Recently, I looked into an interesting issue for one of our customers. They were experiencing random packet drops: users accessing applications running in their NSX environment were suffering from intermittent disconnections.

They are running NSX-V 6.4, with their applications connected to logical switches behind the DLR and two perimeter edges running in Equal Cost Multi-Pathing (ECMP) mode to carry north-south traffic for their workloads. The topology is similar to the one below:

In this post, I will focus on two features that need your attention when configuring your edges in ECMP mode to avoid such packet dropping issues.

Disable ECMP-Edge firewall

The Edge firewall is a stateful service, which means it performs stateful packet inspection and tracks the state of network connections. It may therefore drop asymmetric traffic, which results from the multiple data paths available via the ECMP edges. The firewall needs to be disabled for ECMP to operate correctly.
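To make the asymmetric-traffic problem concrete, here is a toy Python model of two edges that each keep their own connection table. It is only a sketch: the `Flow` and `ToyStatefulEdge` names and the addresses are made up for illustration and do not reflect how NSX implements its firewall.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """A simplified TCP flow key (toy model, not an NSX data structure)."""
    src: str
    sport: int
    dst: str
    dport: int

    def reversed(self) -> "Flow":
        # The return direction of the same connection.
        return Flow(self.dst, self.dport, self.src, self.sport)


@dataclass
class ToyStatefulEdge:
    """Hypothetical edge that keeps its own, unshared connection table."""
    name: str
    connections: set = field(default_factory=set)

    def outbound(self, flow: Flow) -> str:
        # A new outbound connection creates state on this edge only.
        self.connections.add(flow)
        return "forward"

    def inbound(self, flow: Flow) -> str:
        # Return traffic is allowed only if this edge saw the outbound half.
        return "forward" if flow.reversed() in self.connections else "drop"


edge_a = ToyStatefulEdge("edge-a")
edge_b = ToyStatefulEdge("edge-b")

request = Flow("10.0.1.10", 40000, "198.51.100.5", 443)
edge_a.outbound(request)        # ECMP sends the request through edge-a
reply = request.reversed()

print(edge_a.inbound(reply))    # symmetric path: state exists on edge-a
print(edge_b.inbound(reply))    # asymmetric path: edge-b never saw the request
```

In this model the reply is forwarded when it returns through the same edge and dropped when ECMP delivers it to the other one, which is the kind of intermittent drop described above.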

So, the first rule of thumb here is to disable the edge firewall on ECMP edges.

Disable Reverse Path Filtering

This was the cause of our issue.

In NSX Edge, Reverse Path Filtering (RPF) is enabled by default.

When RPF is enabled, the Edge only forwards a packet if it was received on the same interface that would be used to forward traffic back to the source of that packet. If the route to the packet's source address goes through a different interface than the one it was received on, the packet is dropped.
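The check described above can be sketched in a few lines of Python. This is a simplified model of a strict reverse path check, not the Edge's actual code; the routing table, the interface roles, and the function names are assumptions chosen for illustration.

```python
import ipaddress


def best_route_interface(routes, address):
    """Return the interface of the longest-prefix match for address, or None."""
    ip = ipaddress.ip_address(address)
    matches = [(net, ifname) for net, ifname in routes if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_accepts(routes, source_address, ingress_interface):
    """Strict reverse path check: accept only if the route back to the
    source points out of the interface the packet arrived on."""
    return best_route_interface(routes, source_address) == ingress_interface


# Hypothetical routing table: (network, egress interface)
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "vNic_0"),    # assumed uplink
    (ipaddress.ip_network("10.0.0.0/16"), "vNic_1"),  # assumed internal side
]

print(rpf_accepts(routes, "10.0.1.10", "vNic_1"))  # True
print(rpf_accepts(routes, "10.0.1.10", "vNic_0"))  # False
```

The second call models the failure case: a packet whose source is routed via `vNic_1` shows up on `vNic_0`, so a strict check discards it even though the packet is legitimate.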

For more information you can check the below VMware KB article:

https://kb.vmware.com/s/article/2127073

So, the second rule of thumb is to disable RPF on all Edges participating in an asymmetric routing environment.

To disable RPF via GUI:

To disable RPF via REST API, make the below API call to the NSX manager:

PUT https://NSX_mgr_IP/api/4.0/edges/<edge-ID>/systemcontrol/config

<systemControl>
<property>sysctl.net.ipv4.conf.all.rp_filter=0</property>
<property>sysctl.net.ipv4.conf.vNic_0.rp_filter=0</property>
<property>sysctl.net.ipv4.conf.vNic_1.rp_filter=0</property>
<property>sysctl.net.ipv4.conf.vNic_2.rp_filter=0</property>
<property>sysctl.net.ipv4.conf.vNic_3.rp_filter=0</property>
</systemControl>

HTTP Result Code: 204 NO CONTENT

If you want to enable it again via API call, just replace 0’s with 1’s.
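If you prefer to script the call, the following Python sketch builds the same XML body and prepares the PUT request using only the standard library. The host name, edge ID, credentials, and helper names are placeholders, and the use of HTTP basic authentication with an `application/xml` content type is my assumption about a typical NSX Manager setup. The request is built but deliberately not sent.

```python
import base64
import urllib.request
from xml.etree import ElementTree as ET


def build_rp_filter_body(vnic_indexes, value):
    """Build the <systemControl> XML body with rp_filter set to value (0 or 1)."""
    root = ET.Element("systemControl")
    for name in ["all"] + [f"vNic_{i}" for i in vnic_indexes]:
        prop = ET.SubElement(root, "property")
        prop.text = f"sysctl.net.ipv4.conf.{name}.rp_filter={value}"
    return ET.tostring(root, encoding="unicode")


def build_request(manager, edge_id, body, user, password):
    """Prepare (but do not send) the PUT request to NSX Manager."""
    url = f"https://{manager}/api/4.0/edges/{edge_id}/systemcontrol/config"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/xml",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder values; replace them with your own manager, edge ID and account.
body = build_rp_filter_body(range(4), 0)   # 0 disables RPF, 1 enables it
req = build_request("nsx-manager.example.com", "edge-1", body, "admin", "secret")
print(req.get_method(), req.full_url)
print(body)
```

Sending the prepared request (for example with `urllib.request.urlopen(req)`) is left out on purpose, since certificate handling and credentials depend on your environment. Passing `1` instead of `0` to `build_rp_filter_body` produces the body that turns RPF back on.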
To show the RPF configuration for your edge:

In the command output, 0’s mean disabled and 1’s mean enabled.

To check RPF drop packet count:

This command shows you the number of packets being dropped by RPF if you are experiencing such an issue.

Conclusion

To avoid packet drops when you are running ECMP with asymmetric routing, always disable reverse path filtering (RPF) and the firewall on your NSX edges.

Hope this post is informative,

Thank you for reading,

Mohamad Alhussein


6 thoughts

  1. Hello Mohamad,

    Thank you for the great post. Do you have any idea of the impact of making these changes? What is the behaviour after the change? I have the same issue and I need to quantify the impact before changing anything.

    Thank you for your help.
    Best regards,
    Hoa HUYNH

    1. Hello Hoa,

      Glad that you liked the post and found it informative. Reverse Path Filtering helps in fighting malicious traffic within the DC fabric, so disabling RPF to solve packet loss obviously comes with a drawback from a security perspective.

      Regards,
      Mohamad

      1. Hello Mohamad,

        Hoa is asking whether disabling the firewall or RPF in a production environment can cause packet loss or other issues.
        Regards
        KARIM

        1. Hi Karim,

          RPF is a security feature. When it is enabled, the Edge discards a packet instead of routing it whenever it finds that the source of the packet is reachable via interface X but the packet was received on interface Y. This limits the appearance of spoofed addresses on a network. Disabling the firewall is a prerequisite for ECMP-enabled NSX edges (https://docs.vmware.com/en/VMware-Validated-Design/4.1/com.vmware.vvd-sddc-consolidated-deploy.doc/GUID-267FEDCD-4D16-4BAE-9602-031947F0A9A6.html), so it will not cause any issue: it is both a requirement and a recommendation from VMware. RPF is enabled by default for its security benefit. However, if RPF is causing packet loss, it can be disabled without causing packet loss or any other side effect besides losing that security benefit.

          Best Regards,
          Mohamad
