kernel-hacking-2024-linux-s.../net
Eric Dumazet f54b311142 tcp: auto corking
With the introduction of TCP Small Queues, TSO auto sizing, and TCP
pacing, we can implement Automatic Corking in the kernel, to help
applications doing small write()/sendmsg() to TCP sockets.

Idea is to change tcp_push() to check if the current skb payload is
under skb optimal size (a multiple of MSS bytes)

If under 'size_goal', and at least one packet is still in Qdisc or
NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
will be delayed up to TX completion time.

This delay might allow the application to coalesce more bytes
in the skb in following write()/sendmsg()/sendfile() system calls.

The exact duration of the delay is depending on the dynamics
of the system, and might be zero if no packet for this flow
is actually held in Qdisc or NIC TX ring.

Using FQ/pacing is a way to increase the probability of
autocorking being triggered.

Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
this feature and default it to 1 (enabled)

Add a new SNMP counter : nstat -a | grep TcpExtTCPAutoCorking
This counter is incremented every time we detected skb was under used
and its flush was deferred.

Tested:

Interesting effects when using line buffered commands under ssh.

Excellent performance results in term of cpu usage and total throughput.

lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
9410.39

 Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

      35209.439626 task-clock                #    2.901 CPUs utilized
             2,294 context-switches          #    0.065 K/sec
               101 CPU-migrations            #    0.003 K/sec
             4,079 page-faults               #    0.116 K/sec
    97,923,241,298 cycles                    #    2.781 GHz                     [83.31%]
    51,832,908,236 stalled-cycles-frontend   #   52.93% frontend cycles idle    [83.30%]
    25,697,986,603 stalled-cycles-backend    #   26.24% backend  cycles idle    [66.70%]
   102,225,978,536 instructions              #    1.04  insns per cycle
                                             #    0.51  stalled cycles per insn [83.38%]
    18,657,696,819 branches                  #  529.906 M/sec                   [83.29%]
        91,679,646 branch-misses             #    0.49% of all branches         [83.40%]

      12.136204899 seconds time elapsed

lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
6624.89

 Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
      40045.864494 task-clock                #    3.301 CPUs utilized
               171 context-switches          #    0.004 K/sec
                53 CPU-migrations            #    0.001 K/sec
             4,080 page-faults               #    0.102 K/sec
   111,340,458,645 cycles                    #    2.780 GHz                     [83.34%]
    61,778,039,277 stalled-cycles-frontend   #   55.49% frontend cycles idle    [83.31%]
    29,295,522,759 stalled-cycles-backend    #   26.31% backend  cycles idle    [66.67%]
   108,654,349,355 instructions              #    0.98  insns per cycle
                                             #    0.57  stalled cycles per insn [83.34%]
    19,552,170,748 branches                  #  488.244 M/sec                   [83.34%]
       157,875,417 branch-misses             #    0.81% of all branches         [83.34%]

      12.130267788 seconds time elapsed

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:51:41 -05:00
..
9p Nothing really exciting: some groundwork for changing virtio endian, and 2013-11-15 13:28:47 +09:00
802
8021q Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:30:30 +09:00
appletalk net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
atm net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
ax25 net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
batman-adv
bluetooth net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
bridge netfilter: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
caif net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
can Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-11-15 16:47:22 -08:00
ceph
core gro: small napi_get_frags() optim 2013-12-06 12:51:40 -05:00
dcb net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
dccp ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE 2013-11-05 21:52:27 -05:00
decnet
dns_resolver net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
dsa
ethernet
hsr net/hsr: Support iproute print_opt ('ip -details ...') 2013-11-30 12:48:14 -05:00
ieee802154 genetlink: make multicast groups const, prevent abuse 2013-11-19 16:39:06 -05:00
ipv4 tcp: auto corking 2013-12-06 12:51:41 -05:00
ipv6 ipv6: consistent use of IP6_INC_STATS_BH() in ip6_forward() 2013-12-06 12:51:40 -05:00
ipx net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
irda net/irda: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
iucv net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
key net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
l2tp inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions 2013-11-23 14:46:23 -08:00
lapb
llc net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
mac80211 mac80211: check csa wiphy flag in ibss before switching 2013-12-02 11:54:13 +01:00
mac802154 6lowpan: set and use mac_len for mac header length 2013-10-30 17:18:46 -04:00
mpls
netfilter netfilter: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
netlabel netlabel: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
netlink genetlink/pmcraid: use proper genetlink multicast API 2013-11-28 18:26:30 -05:00
netrom net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
nfc net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-11-19 15:50:47 -08:00
packet packet: use macro GET_PBDQC_FROM_RB to simplify the codes 2013-12-06 12:51:39 -05:00
phonet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-11-19 15:50:47 -08:00
rds rds: prevent BUG_ON triggered on congestion update to loopback 2013-12-03 11:54:18 -05:00
rfkill net: rfkill: gpio: add ACPI support 2013-10-28 15:05:25 +01:00
rose net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
rxrpc net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
sched net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
sctp sctp: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
sunrpc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-11-20 14:25:39 -08:00
tipc net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
unix net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
vmw_vsock net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
wimax wimax: remove dead code 2013-11-21 13:09:42 -05:00
wireless cfg80211: disable CSA for all drivers 2013-12-02 11:53:44 +01:00
x25 net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
xfrm net: move pskb_put() to core code 2013-11-07 19:28:58 -05:00
compat.c net: clamp ->msg_namelen instead of returning an error 2013-11-29 16:12:52 -05:00
Kconfig kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanly 2013-11-21 16:42:27 -08:00
Makefile net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0) 2013-11-03 23:20:14 -05:00
nonet.c
socket.c Merge branch 'siocghwtstamp' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next 2013-12-05 19:45:14 -05:00
sysctl_net.c