DragonFly commits List (threaded) for 2004-07
Re: cvs commit: src/sys/vfs/nfs nfs_serv.c
:-On [20040716 18:42], Matthew Dillon (dillon@xxxxxxxxxxxxxxxxxxxx) wrote:
:> Someone actually wrote a paper based on the server-side heuristic I
:> wrote in 1999?
:I emailed you about that site/paper back in February of this year, see
:Jeroen Ruigrok van der Werven <asmodai(at)wxs.nl> / asmodai / kita no mono
I'm sure it's somewhere in my pile :-)
It's an interesting paper, despite all the mistakes. I did like the work
they did to bring nfsheur (which I wrote in 99) up-to-date.
NFS servers are finicky beasts. There is a three-way tradeoff that
makes characterizing algorithms almost impossible. For an NFS
server, performance is a combination of (1) reducing physical disk seeking,
(2) efficient use of the disk cache, and (3) cpu overhead.
So, for example, one can reduce physical disk seeking by increasing the
read-ahead or doing unconditional read-ahead, but this presumes an
infinitely-sized disk cache. Any benchmark with fairly light cache
characteristics will see an improvement, but that doesn't mean that
doing unconditional read-ahead is a good idea. Likewise, the efficiency
of the disk cache (that is, the ability of the disk cache to satisfy
a request without having to go to the disk) has a huge impact on
server performance. An inefficient disk cache will destroy server
performance in a heavily loaded environment. Cpu overhead is less of
an issue on modern systems but can still have a considerable impact on
How well tagging works with SCSI disks depends on the disk manufacturer.
Seagate traditionally has had the best tagging firmware while most other
vendors have (traditionally) had crap firmware. But tagging itself is
still at the mercy of disk seeks and on-disk cache algorithms. In
particular, on-disk cache algorithms can interfere with, or be redundant
against, system caches and this can result in lower performance... but
whose fault is that? The disk cache's algorithms or the kernel cache's
algorithms?
There is no definitive answer.
Memory copy overhead has dropped significantly in the last few years
relative to I/O bandwidth. A modern cpu, e.g. an AMD64 or a P4,
is capable of 3+ GBytes/sec worth of uncached copying bandwidth and
this generally blows away the measly 60MB/s that a modern disk
can do. So data copying alone is not an issue any more, though its
presence within the algorithm can still cause other issues to occur
that might cause people to believe that the data copying is at fault.
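
To put rough numbers on that, here is a small sketch that uses only the
two figures quoted above (3 GBytes/sec of copy bandwidth, 60MB/s of disk
bandwidth). The function names are made up for illustration, and treating
the units as decimal is an assumption.

```python
# Figures quoted in the message; decimal units are an assumption.
COPY_BYTES_PER_SEC = 3e9    # "3+ GBytes/sec" of uncached copy bandwidth
DISK_BYTES_PER_SEC = 60e6   # "60MB/s" for a modern disk


def copy_overhead_fraction(copy_bw=COPY_BYTES_PER_SEC,
                           disk_bw=DISK_BYTES_PER_SEC):
    """Time to copy N bytes once, as a fraction of the time the disk
    needs to deliver the same N bytes (N cancels out)."""
    return disk_bw / copy_bw


def copies_to_match_disk(copy_bw=COPY_BYTES_PER_SEC,
                         disk_bw=DISK_BYTES_PER_SEC):
    """How many full memory copies fit in the time of one disk transfer."""
    return copy_bw / disk_bw


if __name__ == "__main__":
    print("one copy costs %.1f%% of the disk transfer time"
          % (100.0 * copy_overhead_fraction()))
    print("copies per disk transfer: %.0f" % copies_to_match_disk())
```

With those figures a single copy costs about 2% of the time the disk
takes to deliver the data, so it would take on the order of 50 copies
before copying alone matched the disk time.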
On the other hand, it is quite clear to me that the block sizes we
traditionally use for I/O are far too small. Our FS code believes
that an 8K request is reasonable and a 64K request is 'clustering'
when, in fact, the actual truth of the matter is that a 32K request
is reasonable and a 256K request is 'clustering'. This basic problem
in our core filesystem code is responsible for a lot of the benchmark
confusion that occurs when people try to test things above the
This is doubly true due to the data layout methodology used by most
modern hard disks... most hard disks lay the sectors out on their
tracks BACKWARDS rather than forwards. What this means is that the
disk's firmware cache will start caching data from the track the moment
the head settles on it and will continue caching at least until it hits
the sector that was requested... by the time it hits the sector that
was requested it will *ALREADY* have 'future' data in its cache (due
to the backwards layout). This means that modern disks almost NEVER
actually spend extra time waiting for read-ahead-requested data before
issuing the next seek.
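
As a rough illustration of why the request size matters so much, the
sketch below computes the effective throughput of a disk when every
request pays one positioning delay before it transfers. The 60MB/s media
rate is the figure quoted earlier; the 8 ms positioning time and the
function name are assumptions picked for illustration, and the model
deliberately ignores the firmware caching effect described above.

```python
def effective_throughput(request_bytes, positioning_s=0.008, media_bw=60e6):
    """Bytes/sec achieved when each request costs one positioning delay
    (seek plus rotation, assumed 8 ms) followed by a transfer at the
    media rate (60MB/s, the figure quoted in the message)."""
    transfer_s = request_bytes / media_bw
    return request_bytes / (positioning_s + transfer_s)


if __name__ == "__main__":
    # Request sizes discussed in the message: 8K, 32K, 64K, 256K.
    for kbytes in (8, 32, 64, 256):
        rate = effective_throughput(kbytes * 1024)
        print("%4dK request: %5.1f MB/s" % (kbytes, rate / 1e6))
```

Under those assumptions an 8K request gets roughly 1 MB/s out of the
disk while a 256K request gets roughly 21 MB/s, because the positioning
delay is amortized over 32 times as much data.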