DragonFly kernel List (threaded) for 2008-01

Re: Naive HAMMER question

From: Bill Hacker <wbh@xxxxxxxxxxxxx>
Date: Tue, 15 Jan 2008 03:55:04 +0000

Matthew Dillon wrote:
:Hello Matt,
:Just had a fight with ext3fs/FreeBSD/Win2K running on the same
:computer with not-fully-cp866-compliant Ukrainian filenames.
:The problem is old enough and several approaches are known, most of
:them are local fs oriented (like "just use the same charset
:everywhere"). But HAMMER is network/cluster/multisystem oriented, so
:different charsets are in use on different nodes, or even different
:users require different charsets at the same time even locally.
:So, answer please these couple of questions:
:1. Will HAMMER carry any charset/language info for non-ASCII filenames?
:2. Will it map on-disk names to user-defined charset in any way?
:I'd prefer having UTF8 names on-disk (at least it will work for me
:and most of other people, I think).
:Upper layers could specify their one-byte charsets if needed and
:provide names translation on their own.
:Your vision?
:PS. I'm not an expert on FS/i18n issues.
:--
:Dennis Melentyev

    My personal opinion is that the kernel should be responsible for
    filename translation rather than the filesystem.  HAMMER just sees
    a character string, it doesn't know or care what format it is in.

Heavily dependent on UTF-8 (and several Chinese encodings) here - I'd take that a step further towards isolation/agnosticism, and place the burden on the userland application - not the kernel or the fs at all.
(as seems to have been the *BSD 'way')

Ultimately I think UTF8 has to be used for maximum compatibility.

When I look at a UTF-8 titled Chinese-named file in, for example OS X 'Finder' - it shows the correct Chinese characters.

'ls' in a CLI shows '\XXX' escape sequences for the same file.

I prefer that minor nuisance because it tells me that the all-important OS and fs are not making *potentially wrong* guesses - just sticking with binary is binary.

Much less risk of damage if a browser or editor gets it wrong than if the fs or kernel get it wrong, as they can simply be optioned or configured otherwise. IOW - it is display conversion, not source conversion.
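
To make the "display conversion, not source conversion" point concrete, here is a minimal Python sketch. The helper name display_name and the sample filename are hypothetical, made up for illustration; the octal fallback only imitates the kind of escaping 'ls' does and is not taken from any real 'ls' source. The stored bytes are never modified; only the rendering changes with the viewer's assumed charset.

```python
def display_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Return a printable form of a filename; the stored bytes stay untouched.

    Hypothetical helper for illustration, not part of any kernel or fs API.
    """
    try:
        # A viewer that guesses the right charset shows the real characters.
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Otherwise fall back to ls-style octal escapes: printable ASCII
        # passes through, everything else (and the backslash) is escaped.
        return "".join(
            chr(b) if 0x20 <= b < 0x7F and b != 0x5C else f"\\{b:03o}"
            for b in raw
        )


# Two Chinese characters (U+6587, U+4EF6) plus an ASCII suffix,
# encoded as the bytes a charset-agnostic filesystem would keep.
stored = "\u6587\u4ef6.txt".encode("utf-8")

print(display_name(stored))            # viewer assumes UTF-8
print(display_name(stored, "ascii"))   # viewer assumes plain ASCII
```

If the viewer guesses wrong, the worst case is an ugly name on screen; the on-disk name is still the same byte string and can be displayed correctly later by changing one setting.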

grep and friends can, of course, be a PITA, but cut n' paste in a half-smart terminal window seems to make the conversion to '\xxx' just fine (though not always the reverse...).

    The issue is not specifically addressed in DragonFly (UTF8 is kinda
    a cop-out but I still think it's better than using UTF16 or UTF32).

Matthew Dillon <dillon@backplane.com>

More 'effective compromise' than cop-out.

UTF-8's great advantage is that it doesn't much intrude when one sticks with plain ASCII or any of the several common encodings based on same.
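
A quick Python sketch of that non-intrusion, under the assumption that filenames are handled as NUL-terminated byte strings (the usual C convention). The helper encoded_sizes and the sample names are hypothetical, chosen only for illustration.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of the same text in three Unicode encodings (no BOM)."""
    return {
        enc: len(text.encode(enc))
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }


ascii_name = "hammer.txt"

# For pure ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
assert ascii_name.encode("utf-8") == ascii_name.encode("ascii")

# UTF-16 and UTF-32 pad ASCII characters with zero bytes, which would
# collide with NUL-terminated string handling; UTF-8 never does that.
has_nul_utf8 = b"\x00" in ascii_name.encode("utf-8")
has_nul_utf16 = b"\x00" in ascii_name.encode("utf-16-le")

print(encoded_sizes(ascii_name))       # ASCII-only name
print(encoded_sizes("\u6587\u4ef6"))   # two CJK characters
print(has_nul_utf8, has_nul_utf16)
```

For the ASCII name the sizes come out as 10, 20 and 40 bytes; for the two CJK characters they are 6, 4 and 8. So UTF-8 costs a little extra for CJK text, but existing ASCII names and byte-oriented code paths are left alone.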

UTF-16 thus seems to have fallen into the gap. Not widely seen in the wild.

As to UTF-32. Does *anyone* actually use it?

