QuickNAT: High Performance NAT System on Commodity Platforms (2)
The PREROUTING chain is responsible for DNAT (Destination NAT) on packets that have just arrived at the network interface. The routing decision is made after the packet has passed the PREROUTING chain. For packets destined to the local host, SNAT (Source NAT) is done in the INPUT chain. Forwarded packets pass through the POSTROUTING chain for SNAT and then leave the server. Locally generated packets differ slightly: they pass through the OUTPUT chain for DNAT and then move on to the POSTROUTING chain. NAT rules are organized within these separate chains in Netfilter. When a packet reaches a chain, each rule is examined in linear order, and this linear search is costly with a large number of rules. After a matching rule is found, the packet header is modified based on the rule and new connection records are added to the connection tracking table [7]. In this way, only the first packet of a flow needs to search the NAT rules; all subsequent packets exchanged in either direction can apply the same NAT decision made for the first packet without an additional lookup. However, the connection tracking table is shared among multiple cores with high lock overhead.

2.2 Related Work

Since it is well known that Netfilter does not scale well with a large number of small packets, several works aim to improve its performance. Mingming Yang et al. [8] proposed an algorithm to dynamically reorder the rules and the hook functions in Netfilter. Feng Liu et al. [9] built hash tables for the NAT mapping table. However, the performance improvement in these works is limited because they did not modify the linear search algorithm for the NAT rule table. Kristen Accardi et al. [10] prototyped a hybrid firewall, using a Network Processor (NP) to accelerate Linux Netfilter on a general-purpose server. Nonetheless, they did not modify Netfilter itself, and the performance gain mainly resulted from special-purpose hardware.
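The per-flow caching behavior described in the background above, where only the first packet of a flow pays the linear rule-search cost and later packets hit the connection tracking table, can be sketched as follows. This is an illustrative simplification, not Netfilter's actual code; the rules, flow tuples and `translate` helper are hypothetical.

```python
# Simplified DNAT rules: (match_dst_ip, new_dst_ip), examined in linear order.
rules = [
    ("10.0.0.1", "192.168.1.1"),
    ("10.0.0.2", "192.168.1.2"),
]

conntrack = {}  # flow 5-tuple -> translation decision (the conntrack table)

def translate(flow, dst_ip):
    """Return the translated destination for one packet of `flow`."""
    if flow in conntrack:              # fast path: O(1) hash lookup for
        return conntrack[flow]         # every packet after the first
    for match, new_dst in rules:       # slow path: linear rule search,
        if match == dst_ip:            # paid only by the flow's first packet
            conntrack[flow] = new_dst  # cache the decision
            return new_dst
    conntrack[flow] = dst_ip           # no rule matched: pass through
    return dst_ip
```

The sketch also shows the cost the paper targets: the slow path is O(n) in the number of rules, and in the real kernel the `conntrack` table is shared across cores under a lock.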
Apart from modifying the Linux kernel, there exist a number of networking implementations that bypass the kernel's protocol stack entirely. PFQ [11], PF_RING [12], PF_RING ZC (Zero Copy) [13], Intel DPDK [14] and Netmap [15] directly map NIC buffers into user-space memory to accelerate networking. DPDK has been successfully used to accelerate virtual switching, as in the case of DPDK vSwitch [16]. There is a growing demand for high-speed NAT in data centers with high-volume traffic, but there is no prior work on a high-performance NAT system built on these kernel-bypass frameworks.

3 System Design

To improve the performance of NAT on commodity platforms, we design the Quick NAT system built on DPDK. The Quick NAT Search (QNS) algorithm is designed to look up NAT rules with O(1) complexity. In addition, Quick NAT leverages a lock-free hash table to reduce locking costs when sharing NAT mapping records among CPU cores on multicore commodity servers. Moreover, Quick NAT achieves full zero-copy in the NAT process to cut down copy overhead.

3.1 System Overview

The architecture of the Quick NAT system is shown in Figure 2. Quick NAT utilizes DPDK's capabilities to bypass the kernel and runs in user space. It is composed of four components: Connection Tracer, Rule Finder, Tuple Rewriter and IP/Port Pool. Quick NAT bypasses the Linux kernel to access packets directly from the NIC. Upon receiving a packet, the Connection Tracer first searches for a connection record using a hash search. If a connection record is found, the header of the packet is modified by the Tuple Rewriter according to that record. Otherwise, the Rule Finder uses the QNS algorithm to look up the NAT rule tables with O(1) complexity, and an IP/Port pair is picked from the IP/Port Pool according to the NAT rule. Then, Quick NAT revises the packet's header and adds two connection records for the following packets of this flow and of the reply flow.
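The per-packet flow through the four components above can be sketched as follows. The component names mirror the paper's architecture, but the code is a hypothetical illustration, not the authors' DPDK implementation; the pool address `203.0.113.1`, the port range, and the helper names are assumptions.

```python
connections = {}   # Connection Tracer's record table (flow tuple -> translation)
ip_port_pool = [("203.0.113.1", p) for p in range(10000, 10100)]  # IP/Port Pool

def rule_finder(flow):
    """Stand-in for the Rule Finder's O(1) QNS lookup; assume one SNAT rule."""
    return "SNAT"

def process_packet(flow):
    src_ip, src_port, dst_ip, dst_port = flow
    if flow in connections:                    # Connection Tracer hit:
        return connections[flow]               # reuse the cached decision
    rule_finder(flow)                          # Rule Finder: QNS lookup
    nat_ip, nat_port = ip_port_pool.pop()      # pick a free IP/Port pair
    translated = (nat_ip, nat_port, dst_ip, dst_port)
    # Add two records: one for following packets of this flow, one so the
    # reply flow is translated back without another rule lookup.
    connections[flow] = translated
    connections[(dst_ip, dst_port, nat_ip, nat_port)] = (
        dst_ip, dst_port, src_ip, src_port)
    return translated                          # Tuple Rewriter applies this
```

Storing both the forward and reply records on the first packet is what lets every subsequent packet in either direction take the hash-lookup fast path.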
In the Quick NAT system, we make three major contributions to improve NAT performance. First, we design the QNS algorithm to look up NAT rules with O(1) complexity. QNS uses hash search instead of sequential search, reducing the time to search the NAT rule tables. Second, we use Receive-Side Scaling (RSS) to distribute flows across multiple processor cores on multicore servers. To reduce locking overhead, Quick NAT uses a lock-free hash table to share connection records among CPU cores. Third, Quick NAT enables zero-copy delivery and polling to eliminate the overhead of copying and interrupts.

3.2 QNS Algorithm

A traditional NAT system leverages the Netfilter framework of the Linux kernel on commodity servers. As NICs get faster, the interval between packet arrivals shrinks. For 10 Gbps NICs, the arrival rate is at most 14.8M packets per second, leaving only 67.2 ns to process each packet. Linux Netfilter cannot handle this network-intensive workload: examining its source code, we found that Netfilter uses a sequential search over NAT rules stored in linear order. In other words, Netfilter does not scale well with a large number of rules.
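The contrast between Netfilter's sequential search and a hash-keyed lookup in the spirit of QNS can be shown in a few lines. This is an assumed illustration (exact-match rules keyed by destination address), not the paper's QNS code, which handles general rule matching.

```python
# The same 256 DNAT rules in both layouts.
rule_list = [("10.0.0.%d" % i, "192.168.1.%d" % i) for i in range(256)]
rule_hash = dict(rule_list)           # keyed table in the spirit of QNS

def netfilter_lookup(dst_ip):
    """Sequential search: every rule is examined in order, O(n)."""
    for match, target in rule_list:
        if match == dst_ip:
            return target
    return None

def qns_lookup(dst_ip):
    """Hash search: expected O(1), independent of the number of rules."""
    return rule_hash.get(dst_ip)
```

At 67.2 ns per packet, the difference matters: the sequential path touches every non-matching rule, while the hash path costs roughly one probe regardless of table size.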