diff options
author | Eric Dumazet <dada1@cosmosbay.com> | 2009-05-12 20:48:02 +0000 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2009-05-17 20:47:44 -0700 |
commit | d62fda082c48b417b47a553860abf75d9cf8b591 (patch) | |
tree | 1b2679e4fcce72eb6ac584ecf9cc039fe9ea2c4a /CREDITS | |
parent | 9dc20c5f78c53bf57fb7874b6e942842e1db20d3 (diff) |
bnx2: bnx2_tx_int() optimizations
When using bnx2 in a high transmit load, bnx2_tx_int() cost is pretty high.
There are two reasons.
One is an expensive call to bnx2_get_hw_tx_cons(bnapi) for each freed skb
One is cpu stalls when accessing skb_is_gso(skb) / skb_shinfo(skb)->nr_frags
because of two cache line misses.
(One to get skb->end/head to compute skb_shinfo(skb),
one to get is_gso/nr_frags)
This patch :
1) avoids calling bnx2_get_hw_tx_cons(bnapi) too many times.
2) makes bnx2_start_xmit() cache is_gso & nr_frags into sw_tx_bd descriptor.
This uses a litle bit more ram (256 longs per device on x86), but helps a lot.
3) uses a prefetch(&skb->end) to speedup dev_kfree_skb(), bringing
cache line that will be needed in skb_release_data()
result is 5 % bandwidth increase in benchmarks, involving UDP or TCP receive
& transmits, when a cpu is dedicated to ksoftirqd for bnx2.
bnx2_tx_int going from 3.33 % cpu to 0.5 % cpu in oprofile
Note : skb_dma_unmap() still very expensive but this is for another patch,
not related to bnx2 (2.9 % of cpu, while it does nothing on x86_32)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'CREDITS')
0 files changed, 0 insertions, 0 deletions