DragonFly kernel List (threaded) for 2008-11
Re: HEADS UP - HAMMER work
:It might be a good idea to make a small survey, i.e. find
:people who actually _do_ have directories with a huge
:number of files in them (and I mean more than just a few
:thousands), and ask them what the filenames typically look like.
That is a very good idea.
:An obvious improvement would be to store name[d-2] and
:name[d-1] in y and z, respectively, where d is the
:location of the last dot in the filename, if any, or the
:location of the terminating zero if there is no dot.
:In other words: Ignore the extension when identifying
:y and z. Finding the last dot shouldn't be more
:computationally expensive than strlen(name), so this
:shouldn't be a problem.
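Here is a rough C sketch of the quoted suggestion. It finds d in a single pass over the name, which costs about the same as strlen(name), and then takes name[d-2] and name[d-1]. The helper name pick_yz is hypothetical and is not HAMMER code. So is the choice to read positions before the start of the name as 0.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not HAMMER code): choose the characters used for
 * the y and z hash components while ignoring the extension.  d is the
 * index of the last '.' or, if there is no dot, of the terminating NUL.
 * Positions before the start of the name read as 0 (an assumption).
 */
static void
pick_yz(const char *name, unsigned char *y, unsigned char *z)
{
	size_t i, d = 0;
	int have_dot = 0;

	/* One pass over the name, roughly the same cost as strlen(name). */
	for (i = 0; name[i] != '\0'; i++) {
		if (name[i] == '.') {
			d = i;
			have_dot = 1;
		}
	}
	if (!have_dot)
		d = i;		/* no dot: d is the terminating NUL */

	*y = (d >= 2) ? (unsigned char)name[d - 2] : 0;
	*z = (d >= 1) ? (unsigned char)name[d - 1] : 0;
}
```

With this rule "report-2008.txt" yields y = '0' and z = '8', and "README" yields 'M' and 'E'. Note that a dot-file such as ".login" has its last dot at position 0, so both characters come out as 0. A real version would probably treat a leading dot as part of the name.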
Another thing I was thinking about was dividing the filename
into four zones, and CRCing each zone.
The zones could be based on dashes and dots, and secondarily on
alpha/numeric transitions. If there are fewer than four zones
we would simply cut the pieces we do have down the middle, or into
quarters. If there are more than four zones we would combine two
or more zones together to fit.
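Here is one possible reading of these zoning rules as a C sketch. The tie-breaking details are assumptions and do not come from the post:
- a '.' or '-' starts a new zone;
- letter/digit transitions are consulted only when dots and dashes give fewer than four zones;
- surplus zones are merged pairwise, shortest adjacent pair first;
- missing zones come from repeatedly halving the longest zone.

The names split_zones, find_breaks and MAXNAME are hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define NZONES	4
#define MAXNAME	255	/* assumption: longest name we bother to zone */

/* Class for the secondary split: 1 = letter, 2 = digit, 0 = other. */
static int
char_class(char c)
{
	if (isalpha((unsigned char)c))
		return 1;
	if (isdigit((unsigned char)c))
		return 2;
	return 0;
}

/*
 * Store zone start offsets in b[0..n) and the end sentinel in b[n].
 * A zone starts at every '.' or '-' and, if 'transitions' is set, at
 * every letter<->digit transition.  Returns the number of zones n.
 */
static size_t
find_breaks(const char *name, size_t len, int transitions, size_t *b)
{
	size_t i, n = 0;

	b[n++] = 0;
	for (i = 1; i < len; i++) {
		int prev = char_class(name[i - 1]);
		int cur = char_class(name[i]);

		if (name[i] == '.' || name[i] == '-' ||
		    (transitions && prev && cur && prev != cur))
			b[n++] = i;
	}
	b[n] = len;
	return n;
}

/*
 * Hypothetical zoning pass: zone i of name is [start[i], start[i + 1]).
 * Zones may be empty when the name has fewer than NZONES characters.
 */
static void
split_zones(const char *name, size_t start[NZONES + 1])
{
	size_t b[MAXNAME + 2];
	size_t len = strlen(name), n, i, j, k;

	if (len > MAXNAME)
		len = MAXNAME;	/* sketch only: bytes past MAXNAME are ignored */

	n = find_breaks(name, len, 0, b);	/* dots and dashes first */
	if (n < NZONES)
		n = find_breaks(name, len, 1, b); /* then letter/digit changes */

	/* Too many zones: merge the adjacent pair that is shortest combined. */
	while (n > NZONES) {
		k = 1;
		for (j = 2; j < n; j++)
			if (b[j + 1] - b[j - 1] < b[k + 1] - b[k - 1])
				k = j;
		memmove(&b[k], &b[k + 1], (n - k) * sizeof(b[0]));
		n--;
	}

	/* Too few zones: cut the longest zone down the middle. */
	while (n < NZONES) {
		k = 0;
		for (j = 1; j < n; j++)
			if (b[j + 1] - b[j] > b[k + 1] - b[k])
				k = j;
		if (b[k + 1] - b[k] < 2)
			break;		/* every zone is a single character */
		memmove(&b[k + 2], &b[k + 1], (n - k) * sizeof(b[0]));
		b[k + 1] = b[k] + (b[k + 2] - b[k]) / 2;
		n++;
	}

	/* Copy out, padding with empty zones at the end if needed. */
	for (i = 0; i <= NZONES; i++)
		start[i] = (i <= n) ? b[i] : len;
}
```

For example, "file-0001.txt" only has three separator-based zones, so the longest one ("-0001") is halved, giving "file", "-0", "001", ".txt". In contrast "a.b.c.d.e" has five zones, and the first two are merged into "a.b".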
Here is an off-the-cuff structure: four zones, each zone CRC'd,
laid out using a 16-bit CRC for each zone ('d' is 15 bits so we
can set the LSB to zero to guarantee the iteration space).
aaaaaaaabbbbbbbb ccccccccdddddddd aaaaaaaabbbbbbbb ccccccccddddddd0
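Here is a sketch of packing four zone CRCs into that 64-bit layout. The post does not say which 16-bit CRC to use, so CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) is an assumption here. The names crc16, pack_zone_key and zone_key are also assumptions. Reading the diagram, the upper 32 bits hold the high byte of each zone's CRC and the lower 32 bits hold the low bytes. d's low byte is masked with 0xFE so that bit 0 is always zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed CRC: CRC-16/CCITT-FALSE, bitwise, MSB first. */
static uint16_t
crc16(const unsigned char *p, size_t len)
{
	uint16_t crc = 0xFFFF;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= (uint16_t)(p[i] << 8);
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/*
 * aaaaaaaabbbbbbbb ccccccccdddddddd aaaaaaaabbbbbbbb ccccccccddddddd0
 * High bytes of a..d go in the upper 32 bits, low bytes in the lower
 * 32 bits; d keeps only 15 bits so the key's LSB is always 0.
 */
static uint64_t
pack_zone_key(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
	uint64_t hi, lo;

	hi = ((uint64_t)(a >> 8) << 24) | ((uint64_t)(b >> 8) << 16) |
	     ((uint64_t)(c >> 8) << 8) | (uint64_t)(d >> 8);
	lo = ((uint64_t)(a & 0xFF) << 24) | ((uint64_t)(b & 0xFF) << 16) |
	     ((uint64_t)(c & 0xFF) << 8) | (uint64_t)(d & 0xFE);
	return (hi << 32) | lo;
}

/* CRC each zone [start[i], start[i + 1]) of name and pack the results. */
static uint64_t
zone_key(const char *name, const size_t start[5])
{
	uint16_t crc[4];
	int i;

	for (i = 0; i < 4; i++)
		crc[i] = crc16((const unsigned char *)name + start[i],
			       start[i + 1] - start[i]);
	return pack_zone_key(crc[0], crc[1], crc[2], crc[3]);
}
```

For instance, pack_zone_key(0x1122, 0x3344, 0x5566, 0x7789) gives 0x1133557722446688. The LSB of d's CRC is dropped.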
The problem with the zone idea is that it might not work too well
if the filenames have varying lengths... though, now that I think about
it, if the filename is otherwise unstructured (no dots, dashes, etc.),
we could restrict zone A to the first 2-3 chars and zone D to the last
2-3 chars, and use zones B and C to split everything left in the middle.
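Here is a sketch of that fallback for unstructured names. The post only says "2-3 chars", so a few details are assumptions: the cutoff that picks 3 edge characters for names of 12 or more characters (else 2), and the plain quarters used for very short names. The name split_unstructured is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical zoning for names with no dots or dashes: zone A is the
 * first 'edge' chars, zone D the last 'edge' chars, and the middle is
 * split evenly between zones B and C.  Zone i is [start[i], start[i+1]).
 */
static void
split_unstructured(const char *name, size_t start[5])
{
	size_t len = strlen(name);
	/* Assumed rule: 3-char edge zones for longer names, else 2. */
	size_t edge = (len >= 12) ? 3 : 2;
	int i;

	if (len < 2 * edge + 2) {
		/* Too short for fixed edges: just cut into quarters. */
		for (i = 0; i <= 4; i++)
			start[i] = (size_t)i * len / 4;
		return;
	}
	start[0] = 0;
	start[1] = edge;			/* end of zone A */
	start[2] = edge + (len - 2 * edge) / 2;	/* B | C split */
	start[3] = len - edge;			/* start of zone D */
	start[4] = len;
}
```

With these assumptions, a 20-character name splits at offsets 3, 10 and 17. Because zones A and D stay a fixed width, names that share a short prefix or suffix keep the same A or D CRC even when their overall lengths differ.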