DragonFly BSD
DragonFly users List (threaded) for 2006-08

Re: Confusion over encodings, utf-8, etc.

From: "Chris Csanady" <cc@xxxxxxx>
Date: Wed, 30 Aug 2006 03:33:24 -0500

On 8/29/06, Jonathon McKitrick <jcm@xxxxxxxxxxxxxxxxx> wrote:

I've entered the world of unicode with my mac, and I'd like to make things consistent across my machines.

Are you interested in utf-8 for filesystem encoding as well? This is where utf-8 becomes painful. It turns out that Unicode defines several normalization forms; Windows uses NFC, and MacOS uses NFD. The difference is that identical characters can be stored in different ways. For example, a character with an umlaut can be stored either as a single precomposed code point (NFC) or as the base letter followed by a separate combining umlaut (NFD). See http://www.unicode.org/reports/tr15/index.html for more information.
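
To make the difference concrete, here is a small Python sketch using the standard unicodedata module. The variable names are just for illustration; the point is only that the same visible character ends up as different byte sequences depending on the normalization form, which is why a filename lookup can fail even though the names look identical.

```python
import unicodedata

# "u with umlaut" in both normalization forms.
nfc = unicodedata.normalize("NFC", "\u00fc")  # one precomposed code point
nfd = unicodedata.normalize("NFD", "\u00fc")  # "u" followed by combining U+0308

# Same character on screen, different code points and different UTF-8 bytes.
nfc_bytes = nfc.encode("utf-8")  # 2 bytes: c3 bc
nfd_bytes = nfd.encode("utf-8")  # 3 bytes: 75 cc 88

# A byte-wise comparison (which is what most unix filesystems do) sees two
# different names; only after renormalizing do they compare equal.
same_raw = nfc == nfd
same_after_normalizing = unicodedata.normalize("NFC", nfd) == nfc
```

Here same_raw is False and same_after_normalizing is True, so any tool comparing filenames byte for byte will treat the two spellings as unrelated files.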

Now, the Mac expects NFD, and the Finder is completely incapable of
dealing with NFC.  Trying to move those files around over NFS can result
in them simply disappearing!  Likewise, NFD-encoded files visible over
SMB on a Windows machine are not displayed properly.

Unfortunately, the only portable option is to use SMB exclusively to export
NFC-encoded files from a Unix box.  In this case, the SMB client on the Mac
handles the renormalization.  If you use NFS exclusively with a Mac, note
that some non-native applications like Azureus will still write NFC-encoded
filenames, which must be converted manually.

If you have lots of files to translate, there is a convenient tool available in
pkgsrc: converters/convmv.  It allows you to recursively renormalize or
transcode a set of filenames from/to anything supported by iconv.  It needs to
be run on a filesystem which does not mangle characters, though, so something
like NFS or a local filesystem.
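
If you just want to see what a recursive renormalization involves, the idea can be sketched in a few lines of Python. This is not how convmv is implemented, only a rough illustration; the function name renormalize_tree and its parameters are made up for this example. It assumes the filenames are already valid utf-8 and that the filesystem stores names as given (so not HFS+, which normalizes on its own).

```python
import os
import unicodedata


def renormalize_tree(root, form="NFC", dry_run=True):
    """Rename entries under root whose names are not in the given form.

    Returns a list of (old_path, new_path) pairs. With dry_run=True nothing
    is renamed; the list only shows what would change.
    """
    planned = []
    # Walk bottom-up so a directory's contents are handled before the
    # directory itself is renamed from its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = unicodedata.normalize(form, name)
            if fixed == name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, fixed)
            if os.path.lexists(new):
                # Never overwrite an existing entry; leave it for manual review.
                continue
            planned.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return planned
```

Calling renormalize_tree("/some/dir") first with the default dry_run=True and checking the returned list before running it again with dry_run=False is the safer order. The root directory itself is never renamed.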

