DragonFly kernel List (threaded) for 2008-01

Re: Naive HAMMER question

From:	Bill Hacker <wbh@xxxxxxxxxxxxx>
Date:	Tue, 15 Jan 2008 03:55:04 +0000

Matthew Dillon wrote:

:Hello Matt,
:
:Just had a fight with ext3fs/FreeBSD/Win2K running on the same
:computer with not-fully-cp866-compliant Ukrainian filenames.
:The problem is old enough and several approaches are known, most of
:them are local fs oriented (like "just use the same charset
:everywhere"). But HAMMER is network/cluster/multisystem oriented, so
:different charsets are in use on different nodes, or even different
:users require different charsets at the same time even locally.
:
:So, answer please these couple of questions:
:
:1. Will HAMMER carry any charset/language info for non-ASCII filenames?
:2. Will it map on-disk names to user-defined charset in any way?
:
:I'd prefer having UTF8 names on-disk (at least it will work for me
:and most of other people, I think).
:Upper layers could specify their one-byte charsets if needed and
:provide names translation on their own.
:
:Your vision?
:
:PS. I'm not an expert on FS/i18n issues.
:--
:Dennis Melentyev

    My personal opinion is that the kernel should be responsible for
    filename translation rather than the filesystem.  HAMMER just sees
    a character string, it doesn't know or care what format it is in.

Heavily dependent on UTF-8 (and several Chinese encodings) here - I'd take that a step further towards isolation/agnosticism, and place the burden on the userland application - not the kernel or the fs at all (as seems to have been the *BSD 'way').

    Ultimately I think UTF8 has to be used for maximum compatibility.

When I look at a UTF-8 titled Chinese-named file in, for example OS X 'Finder' - it shows the correct Chinese characters.

'ls' in a CLI shows '\XXX' escape sequences for the same file.

I prefer that minor nuisance because it tells me that the all-important OS and fs are not making *potentially wrong* guesses - just sticking with binary is binary.
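
To make the 'ls' behaviour concrete, here is a rough sketch of display-side escaping. It is only an illustration: the function name escape_for_display is hypothetical, and the octal style is an assumption about how a typical CLI tool renders bytes it will not print, not a copy of any particular 'ls' implementation.

```python
def escape_for_display(name: bytes) -> str:
    """Render a raw filename for a terminal.

    Printable ASCII bytes pass through unchanged; every other byte
    (and the backslash itself) is shown as a 3-digit octal escape.
    The stored bytes are never modified.
    """
    out = []
    for b in name:
        if 0x20 <= b < 0x7F and b != 0x5C:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


# A two-character Chinese name (U+4E2D U+6587) stored as UTF-8 bytes.
raw_name = "\u4e2d\u6587.txt".encode("utf-8")
shown = escape_for_display(raw_name)
print(shown)  # \344\270\255\346\226\207.txt
```

The escaping happens purely at display time, so a tool that guesses nothing about the charset also cannot guess wrong.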

Much less risk of damage if a browser or editor gets it wrong than if the fs or kernel get it wrong, as they can simply be optioned or configured otherwise. IOW - it is display conversion, not source conversion.
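
A small sketch of what 'display conversion, not source conversion' can look like in userland. The helper display_name and the sample filename are hypothetical; the only assumption is that the fs hands back exactly the bytes it was given.

```python
# Bytes as they might sit on disk: a Cyrillic name written by a cp866 system.
stored = "\u0444\u0430\u0439\u043b.txt".encode("cp866")


def display_name(name: bytes, charset: str) -> str:
    """Decode a stored name for display only.

    Undecodable bytes become U+FFFD instead of raising, so a wrong
    charset guess produces ugly output but never touches the source.
    """
    return name.decode(charset, errors="replace")


before = bytes(stored)
right = display_name(stored, "cp866")   # application configured correctly
wrong = display_name(stored, "utf-8")   # application guessed wrong

print(right)
print(wrong)
print(stored == before)  # the on-disk bytes are unchanged either way
```

A wrong guess here costs a garbled label on screen and is fixed by changing one setting in the application. The same wrong guess made inside the fs or kernel would have rewritten the name itself.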

grep and friends can, of course, be a PITA, but cut n' paste in a half-smart terminal window seems to make the conversion to '\xxx' just fine (though not always the reverse...).

    The issue is not specifically addressed in DragonFly (UTF8 is kinda
    a cop-out but I still think it's better than using UTF16 or UTF32).

    -Matt
    Matthew Dillon <dillon@backplane.com>

More 'effective compromise' than cop-out.

UTF-8's great advantage is that it doesn't much intrude when one sticks with plain ASCII or any of the several common encodings based on same.
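
A quick check of that point, as a sketch (the sample name is arbitrary): a pure-ASCII filename is byte-for-byte identical in UTF-8, while UTF-16 and UTF-32 change every byte position and introduce NUL bytes that C-string based code would treat as terminators.

```python
ascii_name = "README.txt"

as_ascii = ascii_name.encode("ascii")
as_utf8 = ascii_name.encode("utf-8")
as_utf16 = ascii_name.encode("utf-16-le")   # explicit byte order, no BOM
as_utf32 = ascii_name.encode("utf-32-le")   # explicit byte order, no BOM

print(as_utf8 == as_ascii)                      # True
print(len(as_utf8), len(as_utf16), len(as_utf32))  # 10 20 40
print(b"\x00" in as_utf8)                       # False
print(b"\x00" in as_utf16, b"\x00" in as_utf32)  # True True
```

So code that only ever handles ASCII names needs no change at all under UTF-8, which is not true of the wider encodings.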

UTF-16 thus seems to have fallen into the gap. Not widely seen in the wild.

As to UTF-32. Does *anyone* actually use it?

Bill

Follow-Ups:
- Re: Naive HAMMER question
  - From: walt <wa1ter@myrealbox.com>
- Re: Naive HAMMER question
  - From: Erik Wikström <Erik-wikstrom@telia.com>

References:
- Naive HAMMER question
  - From: "Dennis Melentyev" <dennis.melentyev@gmail.com>
- Re: Naive HAMMER question
  - From: Matthew Dillon <dillon@apollo.backplane.com>
