get rid of atomic_foo ops in the tx start and completion paths.
atomics were used to coordinate updates to the number of available
slots on the tx ring. start would use what was available, and txeof
(completion) would add back freed slots. start and completion
update a producer and consumer index respectively, so we can use
those with the size of the ring to calculate space instead.
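a minimal sketch of that calculation, assuming a ring that keeps one
slot unused so that equal indices always mean "empty". the struct and
function names are hypothetical and are not taken from the driver.

```c
/* Hypothetical ring state; field names are illustrative only. */
struct tx_ring {
	unsigned int prod;	/* next slot start will fill */
	unsigned int cons;	/* next slot txeof will reclaim */
	unsigned int size;	/* total number of slots on the ring */
};

/*
 * Free slots derived purely from the two indices and the ring size.
 * One slot is held back so prod == cons is never ambiguous between
 * "empty" and "full" (an assumption of this sketch).
 */
static unsigned int
tx_ring_space(const struct tx_ring *r)
{
	unsigned int used;

	if (r->prod >= r->cons)
		used = r->prod - r->cons;
	else
		used = r->size - r->cons + r->prod;	/* prod has wrapped */

	return (r->size - used - 1);
}
```

start only writes prod and txeof only writes cons, so neither side needs
a read-modify-write on a shared counter. a stale read of the other index
only makes the result conservative.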
while here i simplified what txeof does a fair bit, which combined
with the removal of the atomics gives us a bit of a speed improvement.
hrvoje popovski reports up to a 20% improvement in one environment,
but 5 to 10% is probably more realistic.
i've had this in a tree since 2017, but mpi's "Faster vlan(4)
forwarding?" post made me dig it out and clean it up.
Cortex A76 is not affected by Spectre variant 2 branch target injection
attacks described in CVE-2017-5715 and ATF does not implement a
workaround for Cortex A76.
Transfers that span multiple TRBs which wrap around the ring and
thus have the Link TRB in between must have the Chain Bit set in the
Link TRB. Otherwise xHCI controllers might think that the transfer
ends at that point.
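A minimal sketch of the idea, not the driver's actual code. The macro
and function names are hypothetical, and the bit positions are assumed
to follow the TRB control word layout in the xHCI specification.

```c
#include <stdint.h>

/* Assumed control word bits; names are illustrative only. */
#define TRB_CYCLE	(1U << 0)	/* producer/consumer ownership */
#define TRB_CHAIN	(1U << 4)	/* this TRB continues into the next */
#define TRB_TYPE_LINK	(6U << 10)	/* Link TRB type field */

/*
 * Compute the control word for a Link TRB. `chained` is nonzero when
 * the transfer being queued continues on the far side of the link, in
 * which case the Chain Bit must be set so the controller does not
 * treat the link as the end of the transfer.
 */
static uint32_t
link_trb_flags(uint32_t flags, int chained)
{
	flags &= ~TRB_CHAIN;		/* drop any stale chain state */
	if (chained)
		flags |= TRB_CHAIN;
	return (flags);
}
```

Clearing the bit first matters because the Link TRB is reused on every
pass around the ring, so a value left over from an earlier transfer
must not leak into the next one.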
Fixes an issue that was most prominently seen as an Invalid CSW error
when using umass0 on octeon and i.MX8M.
Tested by visa@