DragonFly kernel List (threaded) for 2008-11
DragonFly BSD

Re: HEADS UP - HAMMER work


From: "Dennis Melentyev" <dennis.melentyev@xxxxxxxxx>
Date: Sat, 15 Nov 2008 21:22:28 +0200

Hi Matt,

2008/11/15 Matthew Dillon <dillon@apollo.backplane.com>:
>
> :It might be a good idea to make a small survey, i.e. find
> :people who actually _do_ have directories with a huge
> :number of files in them (and I mean more than just a few
> :thousands), and ask them what the filenames typically look
> :like.
>
>    That is a very good idea.
>
> :An obvious improvement would be to store name[d-2] and
> :name[d-1] in y[] and z[], respectively, where d is the
> :location of the last dot in the filename, if any, or the
> :location of the terminating zero if there is no dot.
> :In other words:  Ignore the extension when identifying
> :y[] and z[].  Finding the last dot shouldn't be more
> :computationally expensive than strlen(name), so this
> :shouldn't be a problem.
> :
> :Best regards
> :   Oliver
>
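A minimal sketch of the suggestion quoted above, in C. The helper names stem_end() and stem_tail() are hypothetical, and returning 0 for y or z when the part before the dot is shorter than two characters is an assumption; the original message does not say what should happen in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return d: the index of the last '.' in name, or strlen(name)
 * if the name contains no dot.  One pass over the string, so the
 * cost is comparable to strlen().
 */
static size_t
stem_end(const char *name)
{
    const char *dot;

    assert(name != NULL);
    dot = strrchr(name, '.');
    return (dot != NULL) ? (size_t)(dot - name) : strlen(name);
}

/*
 * Store name[d-2] in *y and name[d-1] in *z, ignoring the extension.
 * Assumption: a missing character is reported as 0.
 */
static void
stem_tail(const char *name, unsigned char *y, unsigned char *z)
{
    size_t d = stem_end(name);

    *y = (d >= 2) ? (unsigned char)name[d - 2] : 0;
    *z = (d >= 1) ? (unsigned char)name[d - 1] : 0;
}
```

For "file0042.txt" this picks '4' and '2'. A name with a leading dot and no other dot (".profile", for example) has an empty stem under this rule, so both values fall back to 0.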
>    Another thing I was thinking about was dividing the filename
>    into four zones, and CRCing each zone.
>
>    The zones could be based on dashes and dots, and secondarily on
>    alpha-numeric transitions. If there are fewer than four zones
>    we would simply cut the pieces we do have down the middle, or into
>    quarters.  If there are more than four zones we would combine two
>    or more zones together to fit.
>
>    Here is an off-the-cuff structure:  Four zones, each zone CRC'd,
>    laid out using 16 bit CRC's for each zone ('d' is 15 bits so we
>    can set the LSB bit to zero to guarantee the iteration space).
>
>    aaaaaaaabbbbbbbb ccccccccdddddddd aaaaaaaabbbbbbbb ccccccccddddddd0
>
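One possible reading of the bit layout above, sketched in C. The function name pack_zone_key() is hypothetical. Two details are assumptions that the diagram does not settle: that the first group of a/b/c/d bits holds the high byte of each zone CRC and the second group the low byte, and that 'd' is reduced to 15 bits by dropping its least significant bit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack four 16-bit zone CRCs into a 64-bit key laid out as
 *
 *   aaaaaaaabbbbbbbb ccccccccdddddddd aaaaaaaabbbbbbbb ccccccccddddddd0
 *
 * Bit 0 of the result is always zero, which leaves room to iterate.
 */
static uint64_t
pack_zone_key(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
    uint16_t d15 = d >> 1;      /* assumption: keep the top 15 bits of d */
    uint64_t key = 0;

    key |= (uint64_t)(a >> 8)     << 56;   /* high byte of each zone */
    key |= (uint64_t)(b >> 8)     << 48;
    key |= (uint64_t)(c >> 8)     << 40;
    key |= (uint64_t)(d15 >> 7)   << 32;
    key |= (uint64_t)(a & 0xff)   << 24;   /* low byte of each zone */
    key |= (uint64_t)(b & 0xff)   << 16;
    key |= (uint64_t)(c & 0xff)   << 8;
    key |= (uint64_t)(d15 & 0x7f) << 1;    /* 7 bits, LSB left clear */

    assert((key & 1) == 0);
    return key;
}
```

With this ordering, names that differ only in zone D still share the same bits for zones A, B and C in each half of the key.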
>    The problem with the zone idea is that it might not work too well
>    if the filenames have varying lengths... though now that I think about
>    it if the filename is otherwise unstructured (no dots, dashes, etc),
>    we could restrict zone A to the first 2-3 chars and zone D to the last
>    2-3 chars, and use zones B and C to split everything left in the middle.
>
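The fallback for unstructured names described above could look roughly like this in C. The names struct zone, EDGE_CHARS and split_unstructured() are hypothetical. The edge width of 3 is one pick from the "2-3 chars" mentioned, and shrinking the edges for names shorter than six characters is an assumption.

```c
#include <assert.h>
#include <stddef.h>

struct zone {
    size_t off;     /* offset of the zone within the filename */
    size_t len;     /* length of the zone in bytes */
};

#define EDGE_CHARS 3    /* assumption: "2-3 chars", 3 chosen here */

/*
 * Split an unstructured name of length len into four zones:
 * A = first few chars, D = last few chars, B and C = the two
 * halves of whatever is left in the middle.
 */
static void
split_unstructured(size_t len, struct zone z[4])
{
    size_t edge = EDGE_CHARS;
    size_t mid;

    assert(z != NULL);
    if (len < 2 * edge)
        edge = len / 2;         /* assumption: short names shrink A and D */
    mid = len - 2 * edge;

    z[0].off = 0;               z[0].len = edge;
    z[1].off = edge;            z[1].len = mid / 2;
    z[2].off = edge + mid / 2;  z[2].len = mid - mid / 2;
    z[3].off = len - edge;      z[3].len = edge;
}
```

Each zone would then be CRC'd separately. The four zones always cover the whole name with no overlap, although B (and C, for very short names) can be empty.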

Please consider making this tunable in some way. No doubt you have a huge
amount of experience, but I'm not sure you can anticipate every possible
situation, so this could be left to the administrator, who really knows
what he needs in each particular case.

-- 
Dennis Melentyev


