DragonFly users List (threaded) for 2005-11
Re: DP performance
On Monday 28 November 2005 22:13, Matthew Dillon wrote:
> If we are talking about maxing out a machine in the packet
> routing role, then there are two major issues that have to be addressed:
> * Bus bandwidth. e.g. PCI, PCIX, PCIE, etc etc etc. A standard
> PCI bus is limited to ~120 MBytes/sec, not enough for even a single
> GigE link going full duplex at full speed. More recent buses can do
> better.
> * Workload separation. So e.g. if one has four interfaces and
> two cpus, each cpu could handle two interfaces.
> An MP system would not reap any real gains over UP until one had
> three or more network interfaces, since two interfaces is no
> different from one interface from the point of view of trying to
> route packets.
Should we really be that pessimistic about potential MP performance,
even with only two NICs? Typically packet flows are bidirectional,
and if we could have one CPU/core taking care of each direction, then
there should be at least some room for parallelism, especially once the
parallelized routing tables see the light of day. Of course, this is
provided that each NIC is handled by a separate core and that IPC
doesn't become the bottleneck.
> Main memory bandwidth used to be an issue but isn't so much anymore.
Memory bandwidth isn't, but latency _is_ now the major performance
bottleneck, IMO. DRAM access latencies are now in the 50 ns range and
will not noticeably decrease in the foreseeable future. Consider the
number of independent memory accesses that need to be performed on a
per-packet basis: DMA RX descriptor read, DMA RX buffer write, DMA RX
descriptor update, RX descriptor update/refill, TX descriptor update,
DMA TX descriptor read, DMA TX buffer read, DMA TX descriptor
update... Without doing any smart work at all we have to waste a few
hundred ns of DRAM bus time per packet, provided we are lucky and the
memory bus is not congested. So to improve forwarding performance
anywhere above 1 Mpps, UP or MP, having the CPU touch DRAM in the
forwarding path has to be avoided like the plague. The stack
parallelization seems to be the right step in this direction.
> Insofar as DragonFly goes, we can almost handle the workload
> separation case now, but not quite. We will be able to handle it
> with the work going in after the release. Even so, it will probably
> only matter if the majority of packets being routed are tiny. Bigger
> packets eat far less cpu for the amount of data transferred.