as explained earlier, is that Unison doesn't support Unicode, and that I have to synchronize files between Mac OSX machines (using UTF8 NFD-normalized filenames) and Windows machines (using latin1 or UTF8 NFKC-normalized filenames). To make filenames containing non-ASCII characters transfer correctly, some kind of conversion has to be made, and as of now Unison does not support this.
In my file sync setup, I have three OSX machines synchronizing files using a Windows server as the central node (all OSX machines sync with the Windows machine). Synchronization is always initiated from one of the OSX machines. What I have done is install Cygwin on the Windows machine, along with a hack for Cygwin that enables UTF8 support.
When I first did this I thought it would be enough, but since Windows/Cygwin and OSX use different Unicode normalization forms (NFKC and NFD), the byte-level representations of the filenames are different. This is what I set out to fix. I have inserted a few lines of code in the function that preprocesses filenames before Unison compares them. Those lines use the Camomile Unicode library to normalize the filename to NFKC, so when the OSX and Windows filenames are compared a little bit later they will be bit-wise identical.
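To make the normalization problem concrete, here is a small sketch in Python (not the OCaml code from the hack itself). The filename and the helper name normalize_for_compare are made up for illustration; the only library call is unicodedata.normalize from the Python standard library.

```python
import unicodedata

# The same visible filename in two forms (example name, purely illustrative).
# NFD, as OSX stores it: "e" followed by a combining acute accent (U+0301).
nfd_name = "re\u0301sume\u0301.txt"
# Composed form: a single precomposed "é" (U+00E9).
composed_name = "r\u00e9sum\u00e9.txt"

def normalize_for_compare(name: str) -> str:
    """Hypothetical helper: bring a filename to NFKC before comparing it."""
    return unicodedata.normalize("NFKC", name)

# The raw UTF-8 bytes differ, so a byte-wise comparison sees two different names.
print(nfd_name.encode("utf-8") == composed_name.encode("utf-8"))

# After normalizing both sides to NFKC, the names compare equal.
print(normalize_for_compare(nfd_name) == normalize_for_compare(composed_name))
```

The modified Unison does the equivalent of normalize_for_compare inside its own filename preprocessing, using Camomile instead of Python's unicodedata.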
This is DEFINITELY not the best way to do this, and it by no means fixes all of Unison's encoding problems. What one should do is rewrite all of the filename handling to support Unicode and other encodings as well. But I don't know OCaml very well (in fact I find it quite confusing and frustrating), so for the moment this will have to do for me.
And it seems this is enough to fix my problems. The hack only needs to be applied to the OSX-side of Unison to work, even though it would probably be better if it was applied to both sides (but I'm WAY too lazy to try to compile Unison in Cygwin if it seems I don't have to :P).
So, if anyone needs to sync an OSX machine with a Windows machine, or perhaps with a Linux machine with a UTF8 filesystem, this could perhaps be of some help to you. (Note that while OSX and Windows/Cygwin enforce NFD and NFKC respectively, Linux does NOT. So in Linux it would be possible to have two different files with seemingly identical names, but with different normalization. This would obviously not work well with this hack, but that would probably be a less than ideal situation anyway.)
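If you want to check a Linux directory for that situation before syncing, something like the following Python sketch might help. It is not part of the hack; the function name find_normalization_collisions is my own invention, and it only groups names that become identical after NFKC normalization.

```python
import unicodedata
from collections import defaultdict

def find_normalization_collisions(names):
    """Group filenames that become identical once normalized to NFKC.

    Returns a dict mapping each normalized name to the list of original
    names that collapse onto it, keeping only groups with more than one entry.
    """
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFKC", name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}

# Example: two spellings of "café.txt" plus an unrelated file.
example = ["caf\u00e9.txt", "cafe\u0301.txt", "plain.txt"]
print(find_normalization_collisions(example))
```

Feeding it the output of os.listdir() for a directory would list the names that the modified Unison could not tell apart.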
Quick install:
This is the quick install for people who don't want to compile stuff.
- Download my precompiled (OSX Leopard) Unison binary here: unison-unicode.zip (600KB, based on Unison 2.27). You only need the modified binary on the OSX side (as long as synchronization is initiated from that side), but all other machines must use the same version of Unison (2.27).
- Download the Camomile data files (5MB). These files must be extracted into /usr/local/share/camomile on your OSX machine (hardcoded, sorry!).
These are instructions for how to build the modified Unison version yourself (for OSX, but might work on other architectures as well):
- Download and install OCaml.
- Download and install/build Camomile (follow instructions and use the default installation directory).
- Check out a version of Unison with Subversion (I'm using /branches/2.27, but I think it will work with the latest beta version as well).
- Replace the files src/case.ml and src/Makefile.OCaml with these files.
- Compile using "make UISTYLE=text".
- The new Unison binary will be at src/unison. I would recommend you rename it to unison-unicode or something to tell it apart from your regular Unison version.
Note that this version of Unison requires that the two file systems being synchronized are UTF8. If it encounters a filename that is not valid UTF8 it will probably crash!
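Since an invalid name will probably crash the sync, it may be worth scanning for such names first. Below is a rough Python sketch of one way to do that; it is separate from Unison, and the names is_valid_utf8 and find_invalid_names are hypothetical helpers I made up for the example. It walks the tree using byte paths so that the raw filename bytes can be checked.

```python
import os

def is_valid_utf8(raw_name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        raw_name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def find_invalid_names(root: bytes):
    """Walk the tree under root (given as bytes) and collect the paths of
    entries whose names are not valid UTF-8."""
    bad = []
    # Passing a bytes path makes os.walk yield bytes names, untouched by decoding.
    for dirpath, dirnames, filenames in os.walk(root):
        for raw in dirnames + filenames:
            if not is_valid_utf8(raw):
                bad.append(os.path.join(dirpath, raw))
    return bad

if __name__ == "__main__":
    # Scan the current directory; any path printed here should be renamed first.
    for path in find_invalid_names(os.fsencode(".")):
        print(path)
```

A typical offender would be a name written in latin1, where a single byte such as 0xE9 stands for "é" and is not a valid UTF8 sequence on its own.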
If anyone actually tries this, please post your comments below! Thanks ;)