DragonFly kernel List (threaded) for 2008-11
DragonFly BSD
DragonFly kernel List (threaded) for 2008-11
[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]


From: FloW <flo.w@xxxxxxx>
Date: Sun, 16 Nov 2008 14:54:27 +0000

Hi there

I usually follow the lists and development of DFBSD quietly, not having enough time to bring in real effort. But this caught my interest :-)

Matthew Dillon wrote:

aaaaa name[0] & 0x1F bbbbb name[1] & 0x1F ccccc name[2] & 0x1F mmmmmm crc32(name + 3, len - 5) and some xor magic -> 6 bits yyyyy name[len-2] & 0x1F zzzzz name[len-1] & 0x1F h[31] crc32(name, len) (entire filename)

0aaaaabbbbbccccc mmmmmmyyyyyzzzzz hhhhhhhhhhhhhhhh hhhhhhhhhhhhhhh0

Don't know if you already thought of this, but you probably want to use different CRC's for m and h, like crc16 for m and crc32 for h. If h is just an ordinary function of (a,b,c,m,y,z) you unnecessarily lose hash space. Ok, I'm not a specialist in CRC, so I could be wrong.

Also I wouldn't consider the zone idea (from a later mail) with a changing length of zone B, since different name lenghts would completely scramble the hash locality.

As said before, it strongly depends on the particular file name usage if this sort of semi-sorted hash provides locality or not. Obviously the generic and optimal solution would be to compress the first bits via a directory wide patricia trie. Not feasible here though.

If you find any realworld use-cases, that would simplify the argument around particular benefits and weaknesses of these hashes a lot.

In my opinion the realm of oversized directories would be ad-hoc, prototype or research software. Any "mature" application I've seen yet used subdirectories to organize such a huge number of files.
Your best bet for these "immature" kind of applications is probably to have simple guidelines on how to do efficient file naming. Possibly simpler than to apply the patricia trie on the application side ;-)



[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]