Forcedeth notes



'''forcedeth''' is the Linux driver for various Ethernet adapters that appear in nForce chipsets

...apparently including:

* CK804
* MCP51
* MCP55
* MCP67
* MCP73
* MCP79
* MCP89


===tl;dr (and tweaking for fun and profit)===

'''If (and only if) the device doesn't seem to work at all''', set module options:

 msi=0 msix=0

(TODO: figure out why)
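One way to make that stick is a modprobe options file, picked up the next time the module loads. A minimal sketch; the filename is just a common convention, not something the driver requires:

 # e.g. /etc/modprobe.d/forcedeth.conf
 options forcedeth msi=0 msix=0
 # reload the driver to apply now (briefly drops the link):
 #   modprobe -r forcedeth && modprobe forcedeth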


'''If your host gets slow when doing large transfers, and you see lots of "too many iterations" in logs''', that's mostly due to a high interrupt rate. The following seems a decent balance between still getting around 1 Gbit and using the lowest interrupt rate needed for that:

 optimization_mode=2 poll_interval=38 max_interrupt_work=40

See the notes below for more.
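Applied the same way as above (a modprobe.d sketch again; {{inlinecode|modinfo -p}} just lists the parameters the driver accepts):

 modinfo -p forcedeth
 # e.g. /etc/modprobe.d/forcedeth.conf
 options forcedeth optimization_mode=2 poll_interval=38 max_interrupt_work=40
 # reload to apply (briefly drops the link):  modprobe -r forcedeth && modprobe forcedeth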


Note that '''unused NICs''' will still interrupt

: I had a motherboard with two, of which one was unused in practice
:: this won't cause problems by itself, but you might as well disable it
: so if using CPU mode or dynamic mode, consider disabling such interfaces to avoid unnecessary interrupts
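For example, assuming the unused port shows up as eth1 (interface names will differ per system):

 # how many interrupts each NIC is generating (lines are usually named after the interface):
 grep -E 'eth|enp' /proc/interrupts
 # take the unused port down (not persistent across reboots):
 ip link set dev eth1 down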


==='Too many iterations' problem===

"too many iterations (6) in nv_nic_irq" (in dmesg/syslog)

This means the NIC's interrupt handler says it saw more packets stored in the NIC than that handler code has been configured to move in a ''single'' run of the interrupt handler.

This ''may'' mean we're getting behind.

: If you only see singular messages, sporadically, you can often consider it an ignorable warning, because the next interrupt handler run got to it (though it may still mean the occasional dropped packet, and a retransmit where the higher-level protocol does that).
: Yet if you see hundreds or more of these in sequence, you can assume packets are coming in so fast that we may be leaving the NIC's ringbuffer full, probably overwriting old frames, so dropping frames (and triggering TCP flow control), so you want to tweak things.
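To check whether frames are actually being dropped, the generic counters are a reasonable first look (eth0 is an example name):

 # RX dropped/missed counters for the interface:
 ip -s link show dev eth0
 # the driver's own statistics, where supported:
 ethtool -S eth0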

===Host sluggishness, and interrupts===

It seems that throughput mode contributes both to the too-many-iterations problem and to the host responding sluggishly under load.

If you suspect this problem: run {{inlinecode|vmstat 1}} to check the current interrupt rate. On the order of a thousand is a fine base rate; on the order of tens of thousands starts to be an issue.
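For example (the interface-name pattern is just an example):

 vmstat 1             # the 'in' column is interrupts per second, 'cs' is context switches
 # per-device interrupt counts; run it twice a few seconds apart and compare:
 grep -E 'eth|enp' /proc/interrupts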


Consider that moving over 100MB/s with 1500-byte packets means ~75k packets per second.
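As a quick sanity check on that number:

 112 MB/s ÷ 1500 bytes/frame ≈ 75,000 frames/s    (at ~119 MB/s line rate, closer to 80,000)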


'''The default option, throughput mode''' seems to '''interrupt for every packet''', so 75k interrupts for 100MB/s.

: this gives the lowest possible latency in theory, ''except'' you can assume that computers forced to do over ~30k interrupts/second are computing less efficiently


'''The second option, CPU mode''' triggers the interrupt on a NIC timer - meaning '''a constant rate of interrupts'''

: regardless of the data present, or of ''when'' packets arrive.
: Most of these interrupts would be short, as they often have no packet to handle, but they still have to run.
: The average latency is a little higher, and depends more on this interrupt rate than on the packets being communicated.


'''The third option, dynamic mode''' uses

* the low-latency but interrupt-heavy mode only under light load (for low latency; under light load the interrupt rate never gets very high anyway)
* the polling option under heavy load (when throughput is probably more important, and latency less so)


In the latter two cases you may also wish to tweak how many packets are handled in a single run. Read the section below.

===Some testing===

Because I wanted to know the interrupt rate that would be used, I did some experiments with CPU mode and ended up with:

 optimization_mode=1 poll_interval=38 max_interrupt_work=40

This poll_interval is short enough to allow ~112MB/s transfers, at the cost of approximately 2500 interrupts per second (per network port)


Some of my tests:

* Throughput mode did 119MB/s, at ~115,000 ints/s.
* CPU mode with poll_interval=38 did 112MB/s (2500 ints/s) (max_interrupt_work can probably be as low as 35(verify))
* CPU mode with poll_interval=50 did 87MB/s (1800 ints/s) (max_interrupt_work can probably be as low as 45(verify))
* CPU mode with poll_interval=75 did 60MB/s (1200 ints/s) (max_interrupt_work can probably be as low as 70(verify))


You could probably lower the poll_interval if you care more about latency, but note that spending many interrupts/s for a 0.0001s difference in latency matters in few real-world cases. It makes more sense to use dynamic mode.


===Module parameters===

Some things you might consider setting include:


'''max_interrupt_work'''

The maximum number of events handled per interrupt (basic NAPI-style interrupt mitigation, also a guard against long-running interrupts).
The default seems to be something like 4 or 5.

Note that setting this higher won't fix things if you're not interrupting fast enough.
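A rough way to size it in CPU/timer mode (the numbers are taken from the tests above, not universal):

 packets per interrupt ≈ packet rate × poll interval
 e.g.  75,000 pkt/s × ~0.39 ms ≈ 29 packets, so max_interrupt_work=40 leaves some headroom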


'''optimization_mode''' has three possible values:

* 0 is 'Throughput mode', and the default
** interrupts will be triggered for every tx and rx packet (...wouldn't that lessen the effectiveness of NAPI-style handling?(verify))
** moves each packet through in the least possible time - meaning lower latency
** ...but just assumes interrupts can keep up. (It's quite relevant that 115MB/s with 1500-byte Ethernet frames means ~75k packets/s, potentially in both directions; that many interrupts will slow down other work on most computers.)
* 1 is 'CPU mode' (timer mode, really) (added around 2005, driver version 0.46 or so?)
** interrupts will be triggered by a configurable timer on the NIC, which effectively means fixed-rate polling
** Presumably, the longer the interval, the higher you want max_interrupt_work to be(verify), and you'll be limited by the NIC buffer size(verify)
* 2 is 'Dynamic mode' (added around 2009, in driver version 0.63 or 0.64)(verify)
** switches between throughput mode under light load, and CPU/timer mode under heavier load
** can't use MSI-X in this mode (which you may or may not care about, or which may be a reason to stick to CPU mode)
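If the driver happens to be built into your kernel rather than loaded as a module, the same parameters can instead go on the kernel command line (the values here are just the ones from the tl;dr above):

 forcedeth.optimization_mode=2 forcedeth.poll_interval=38 forcedeth.max_interrupt_work=40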


'''poll_interval'''

In CPU mode (see above), this value controls how frequently the NIC triggers its interrupt.

The unit is hundredths of milliseconds (approximately; there's a factor of 1000/1024 in there). The default seems to be 97, meaning an interval of approximately 1ms and ~1000 ints/sec.

This setting directly influences the average latency and its jitter (basically because packets will, on average, be waiting half the interval time). In practice this means balancing a reasonable interrupt load against your latency needs.
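As a rough conversion (ignoring the 1000/1024 factor):

 interval ≈ poll_interval × 0.01 ms,   interrupt rate ≈ 1 / interval
 e.g.  poll_interval=38  →  ~0.38 ms  →  roughly 2600 ints/s  (close to the ~2500 measured above)
 e.g.  poll_interval=97  →  ~0.97 ms  →  roughly 1000 ints/s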


'''msi''' and '''msix'''

Whether to use Message Signaled Interrupts.

It seems that for a few specific hardware variants (and/or driver variants, or combinations? (verify)) the driver won't work until you set msi=0 and/or msix=0
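To see whether MSI/MSI-X is currently being used for the NIC (interrupt lines are usually named after the interface):

 # the type column shows e.g. PCI-MSI versus IO-APIC for each interrupt line:
 grep -E 'eth|enp' /proc/interrupts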


===See also===