DragonFly kernel List (threaded) for 2008-01
Re: Naive HAMMER question
Matthew Dillon wrote:
:Just had a fight with ext3fs/FreeBSD/Win2K running on the same
:computer with not-fully-cp866-compliant Ukrainian filenames.
:The problem is old enough and several approaches are known, most of
:them are local fs oriented (like "just use the same charset
:everywhere"). But HAMMER is network/cluster/multisystem oriented, so
:different charsets are in use on different nodes, or even different
:users require different charsets at the same time, even locally.
:So, please answer these two questions:
:1. Will HAMMER carry any charset/language info for non-ASCII filenames?
:2. Will it map on-disk names to user-defined charset in any way?
:I'd prefer having UTF8 names on-disk (at least it will work for me
:and most of other people, I think).
:Upper layers could specify their one-byte charsets if needed and
:provide names translation on their own.
:PS. I'm not an expert on FS/i18n issues.
My personal opinion is that the kernel should be responsible for
filename translation rather than the filesystem. HAMMER just sees
a character string, it doesn't know or care what format it is in.
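To make the "just sees a character string" point concrete, here is a rough sketch. It is a toy model only: the dict stands in for a directory and is in no way HAMMER's actual on-disk structure, and the filename, the `lookup` helper and the "inode 1" value are made up for illustration. The same visible name, encoded in two charsets, is simply two different keys as far as byte-oriented storage is concerned.

```python
# Toy model (assumption): a directory is a mapping from raw name bytes to an entry.
# Nothing here reflects HAMMER's real layout; it only shows byte-wise matching.

def lookup(directory, name, charset):
    """Encode the user-visible name in the caller's charset, then match raw bytes."""
    return directory.get(name.encode(charset))


directory = {}
name = "файл.txt"  # hypothetical Cyrillic filename

# A UTF-8 client creates the file: the stored key is the UTF-8 byte string.
directory[name.encode("utf-8")] = "inode 1"

found_utf8 = lookup(directory, name, "utf-8")    # same bytes, so the entry is found
found_cp866 = lookup(directory, name, "cp866")   # different bytes, so no match

utf8_len = len(name.encode("utf-8"))    # two bytes per Cyrillic letter
cp866_len = len(name.encode("cp866"))   # one byte per Cyrillic letter
```

Whichever layer does the translation (kernel, library or application), it has to happen before the bytes reach a store like this one, since the store itself only compares bytes.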
Heavily dependent on UTF-8 (and several Chinese encodings) here - I'd
take that a step further towards isolation/agnosticism, and place the
burden on the userland application - not the kernel or the fs at all.
(as seems to have been the *BSD 'way')
Ultimately I think UTF8 has to be used for maximum compatibility.
When I look at a UTF-8 titled Chinese-named file in, for example OS X
'Finder' - it shows the correct Chinese characters.
'ls' in a CLI shows 'XXX' escaped sequences for the same file.
I prefer that minor nuisance because it tells me that the all-important
OS and fs are not making *potentially wrong* guesses - just sticking
with binary is binary.
Much less risk of damage if a browser or editor gets it wrong than if
the fs or kernel gets it wrong, as the former can simply be optioned or
configured otherwise. IOW - it is display conversion, not source conversion.
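A minimal sketch of what "display conversion, not source conversion" could look like in a userland tool. The function names are my own, and the octal-escape style is only an approximation of what some 'ls' implementations print, not a copy of any particular one. The stored bytes are never modified; only the rendering changes with the viewer's configured charset.

```python
def escape_for_display(raw: bytes) -> str:
    """Render raw filename bytes as printable ASCII, using octal escapes."""
    parts = []
    for b in raw:
        if 0x20 <= b < 0x7F and b != 0x5C:  # printable ASCII, except backslash
            parts.append(chr(b))
        else:
            parts.append("\\%03o" % b)      # everything else becomes \ooo
    return "".join(parts)


def display_name(raw: bytes, charset: str) -> str:
    """Decode for display if the bytes are valid in charset, else escape them."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return escape_for_display(raw)


raw = "中文.txt".encode("utf-8")           # the bytes as they sit on disk

shown_utf8 = display_name(raw, "utf-8")    # a UTF-8 aware viewer shows the characters
shown_ascii = display_name(raw, "ascii")   # an ASCII-only viewer shows escapes
```

If the viewer guesses wrong, the worst case is an ugly listing; the name on disk is untouched and a different setting fixes the display.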
grep and friends can, of course, be a PITA, but cut n' paste in a
half-smart terminal window seems to make the conversion to '\xxx' just
fine (though not always the reverse...).
The issue is not specifically addressed in DragonFly (UTF8 is kinda
a cop-out but I still think it's better than using UTF16 or UTF32).
More 'effective compromise' than cop-out.
UTF-8's great advantage is that it doesn't much intrude when one sticks
with plain ASCII or any of the several common encodings based on same.
UTF-16 thus seems to have fallen into the gap. Not widely seen in the wild.
As to UTF-32: does *anyone* actually use it?
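A quick way to see the "doesn't much intrude" point, using a made-up plain-ASCII filename: in UTF-8 it is byte-for-byte the ASCII string, while the UTF-16 and UTF-32 forms are longer and contain NUL bytes, which is awkward for any interface that passes names around as NUL-terminated C strings.

```python
name = "README.txt"  # hypothetical plain-ASCII filename

ascii_bytes = name.encode("ascii")
utf8_bytes = name.encode("utf-8")
utf16_bytes = name.encode("utf-16-le")   # little-endian, no byte-order mark
utf32_bytes = name.encode("utf-32-le")   # little-endian, no byte-order mark

same_as_ascii = utf8_bytes == ascii_bytes   # UTF-8 leaves ASCII untouched
utf16_has_nul = b"\x00" in utf16_bytes      # each ASCII char gains a zero byte
utf32_has_nul = b"\x00" in utf32_bytes      # each ASCII char gains three zero bytes

sizes = (len(utf8_bytes), len(utf16_bytes), len(utf32_bytes))
```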