Wednesday, April 22, 2009

Be mindful what you tweet...

So I recently decided to buy, or at least try out, some new sync software (Spanning Sync). It seems they were giving a $5 discount if you could get a referral from someone who has already bought it, so I went ahead and posted on Twitter asking if anyone else was using it. Only one minute later the reply arrives...

Creepy! Oh well. Happy with my choice I once again went ahead and posted:

A little bit later I visited their site again to download the app to another computer. Guess what I found on their frontpage?

I suppose I shouldn't be surprised companies like this monitor what is said about them on Twitter. But I'm still amazed at how fast they react. I mean, I assume there had to be some kind of review process before my comment ended up on their frontpage?

Anyway, I'll get back to what I think about Spanning Sync in later posts.

Saturday, April 18, 2009

I can haz success! Unison hack to enable Unicode normalization of filenames

NOTE: The latest development version of Unison now has built in Unicode support. Check this post for how to compile and use it!

DISCLAIMER: This is a very ugly hack! It's been tested to work in MY setup, but might not work in yours. I really don't know OCaml, or makefiles for that matter. You have been warned!

After much agony I've finally managed to build a hacked version of Unison to make my file sync setup work. The problem, as explained earlier, is that Unison doesn't support Unicode, and that I have to synchronize files between Mac OSX-machines (using UTF8 NFD-normalized filenames) and Windows machines (using latin1 or UTF8 NFKC-normalized filenames). To make filenames containing non-ASCII characters transfer correctly, some kind of conversion has to be made, and as of now Unison does not support this.

In my file sync setup, I have three OSX machines synchronizing files using a Windows server as the central node (all OSX machines sync with the Windows machine). Synchronization is always initiated from one of the OSX-machines. What I have done is to install Cygwin on the Windows machine, and also install a hack for Cygwin which enables UTF8 support.

When I first did this I thought it would be enough, but since Windows/Cygwin and OSX use different Unicode normalization (NFKC and NFD), the bit-by-bit representation of the filenames is different. This is what I set out to fix. I have inserted a few lines of code in the function that preprocesses filenames before comparison is done in Unison. Those lines use the Camomile Unicode library to normalize the filename to NFKC, so when the OSX and Windows filenames are compared a little bit later they will be bit-wise identical.
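The idea can be sketched outside of OCaml. The Python snippet below is only an illustration of the normalize-before-compare approach, not the actual patch: it uses Python's standard unicodedata module instead of Camomile, and the helper name comparison_key is made up for this example.

```python
import unicodedata

def comparison_key(filename: str) -> str:
    """Hypothetical helper: return the NFKC-normalized form of a filename."""
    return unicodedata.normalize("NFKC", filename)

# The same visible name, as OSX (NFD) and Windows/Cygwin (NFKC) would store it.
visible_name = "r\u00e4ksm\u00f6rg\u00e5s.txt"
osx_name = unicodedata.normalize("NFD", visible_name)
windows_name = unicodedata.normalize("NFKC", visible_name)

print(osx_name == windows_name)  # False: different code point sequences
print(comparison_key(osx_name) == comparison_key(windows_name))  # True
```

Only the comparison uses the normalized form here; the names themselves are left untouched.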

This is DEFINITELY not the best way to do this, and does not by far fix all of Unison's encoding problems. What one should do is to rewrite all of the filename handling to support Unicode and also other encodings. But I don't know OCaml very well, in fact I find it quite confusing and frustrating, so for the moment this will have to do for me.

And it seems this is enough to fix my problems. The hack only needs to be applied to the OSX-side of Unison to work, even though it would probably be better if it was applied to both sides (but I'm WAY too lazy to try to compile Unison in Cygwin if it seems I don't have to :P).

So, if anyone needs to sync an OSX machine with a Windows machine, or perhaps with a Linux machine with a UTF8 filesystem, this could perhaps be of some help to you. (Note that while OSX and Windows/Cygwin enforce NFD and NFKC respectively, Linux does NOT. So in Linux it would be possible to have two different files with seemingly identical names, but with different normalization. This would obviously not work well with this hack, but that would probably be a less than ideal situation anyway.)
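If you want to check for that situation on a Linux machine before trying the hack, something like the following Python sketch could help. It is not part of the Unison patch, and the function name find_normalization_collisions is just made up for this example; it groups names that become identical once they are normalized to NFKC.

```python
import unicodedata
from collections import defaultdict

def find_normalization_collisions(filenames):
    """Return groups of names that become identical after NFKC normalization."""
    groups = defaultdict(list)
    for name in filenames:
        groups[unicodedata.normalize("NFKC", name)].append(name)
    return [names for names in groups.values() if len(names) > 1]

# Two names that look the same on screen but use different normalization.
composed = "r\u00e4ksm\u00f6rg\u00e5s.txt"
decomposed = unicodedata.normalize("NFD", composed)

collisions = find_normalization_collisions([composed, decomposed, "readme.txt"])
print(len(collisions))  # 1
```

For a real directory you could pass in the result of os.listdir(path) instead of the hand-made list.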

Quick install:

This is the quick install for people who don't want to compile stuff.
  1. Download my precompiled (OSX Leopard) Unison binary here: (600KB, based on Unison 2.27). You only need the modified binary on the OSX side (as long as synchronization is initiated from that side), but all other machines must use the same version of Unison (2.27).

  2. Download the Camomile data files (5MB). These files must be extracted into /usr/local/share/camomile on your OSX machine (hardcoded, sorry!).

Build yourself:

These are instructions for how to build the modified Unison version yourself (for OSX, but might work on other architectures as well):

  1. Download and install OCaml.
  2. Download and install/build Camomile (follow instructions and use the default installation directory).
  3. Checkout a version of Unison with Subversion (I'm using /branches/2.27, but I think it will work with the latest beta version as well).
  4. Replace the files src/ and src/src/Makefile.OCaml with these files.
  5. Compile using "make UISTYLE=text".
  6. The new Unison binary will be at src/unison. I would recommend you rename it to unison-unicode or something to tell it apart from your regular Unison version.

Your modified binary (from either the quick or full install) will enable you to synchronize files with Unicode filenames between an OSX machine and another machine with a UTF8 filesystem (for example Linux). If you want to sync with Windows you need to install Cygwin (make sure to select the unison package during installation) and the Cygwin UTF8 hack as well. Make sure it's the Cygwin Unison binary that is being used during synchronization by passing the parameter "-servercmd /usr/bin/unison".

Note that this version of Unison requires that the two file systems being synchronized are UTF8. If it encounters a filename that is not valid UTF8, it will probably crash!
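One way to look for such filenames before syncing is to test whether the raw bytes of each name decode as UTF8. The Python sketch below is a separate pre-check, not something the hacked Unison does itself, and the helper name is_valid_utf8 is made up for this example.

```python
def is_valid_utf8(raw_name: bytes) -> bool:
    """Hypothetical helper: True if the raw filename bytes decode as UTF-8."""
    try:
        raw_name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

name = "r\u00e4ksm\u00f6rg\u00e5s.txt"
print(is_valid_utf8(name.encode("utf-8")))    # True
print(is_valid_utf8(name.encode("latin-1")))  # False
```

Calling os.listdir with a bytes path (for example os.listdir(b".")) returns the names as raw bytes, which can be fed straight into the check.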

If anyone actually tries this, please post your comments below! Thanks ;)

Sunday, April 12, 2009

Unison Unicode problems

Unison is a pretty awesome file synchronizing utility. It's free, open source, highly customizable and scriptable. It does, however, have one big flaw: it doesn't support Unicode. As long as you synchronize between file systems of identical encoding, it doesn't matter. Unfortunately, however, Windows, Linux and MacOSX all use different encodings by default.

My setup synchronizes files between 3 different OSX-machines using a Windows server as the central node. File names containing non-ASCII characters like ÅÄÖ get messed up when transferred, e.g. the OSX file räksmörgås.txt will appear as räksmörgaÌŠs.txt on the Windows machine.
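The garbling can be reproduced in a few lines of Python. This is only a sketch of what I believe is going on, under the assumption that the Windows side reads the UTF8 bytes of the decomposed OSX name as cp1252 (the Windows flavour of latin1):

```python
import unicodedata

name = "r\u00e4ksm\u00f6rg\u00e5s.txt"

# OSX stores the name decomposed (NFD) and encoded as UTF-8.
nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")

# Assumption: the Windows side interprets those bytes as cp1252.
garbled = nfd_bytes.decode("cp1252")
print(garbled)
```

Each accented letter turns into its base letter followed by two junk characters, because the combining mark takes two bytes in UTF8 and each byte is shown as a separate character.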

This is very annoying. I really like my synchronization setup, and this is the only problem I have with it. What to do? Windows uses latin1 encoding for file names, and OSX uses utf8. What if you could trick Windows into using utf8 also? Linux supports utf8 file names, so maybe Cygwin can help. Nope, turns out Cygwin does not support Unicode... Googled "cygwin unicode" and found a hack to Cygwin which enables Unicode and utf8 support for file names. My hope was rising as räksmörgås.txt seemed to correctly appear on the Windows side. Yes, I had done it! Ran Unison again to double-check, and the file was now for some reason flagged as new on the Windows side. The whole operation then failed when Unison tried to copy the file back to the OSX side and discovered that the file was already there.

So, it turns out that there is such a thing as Unicode Normalization. Short story: The same character can be represented in different ways in Unicode, namely composed or decomposed. And, to make matters worse, OSX uses the decomposed form (NFD), and Windows/hacked Cygwin uses the composed form (NFKC). So even though the file is called räksmörgås.txt on both machines, the exact bit representation of the name is different. If I had used a Unicode-aware program, this wouldn't have been a problem and the file names would have been recognized as identical. But as I said, Unison is NOT such a program...
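To make the difference concrete, here is a small Python illustration using the standard unicodedata module (nothing Unison-specific), showing the letter å in both forms:

```python
import unicodedata

composed = "\u00e5"                                  # å as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # "a" + combining ring above

print([hex(ord(c)) for c in composed])    # ['0xe5']
print([hex(ord(c)) for c in decomposed])  # ['0x61', '0x30a']
print(composed.encode("utf-8"))           # b'\xc3\xa5'
print(decomposed.encode("utf-8"))         # b'a\xcc\x8a'
print(composed == decomposed)             # False
```

Both strings look identical on screen, but a plain byte or string comparison treats them as different names.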

I've done some research (i.e., googled), and there don't seem to be any plans to incorporate Unicode support in Unison. It turns out Unison is written in OCaml, which doesn't natively support Unicode, so adding support for this would, according to Unison's developers, be pretty hard.

But how hard can it really be? I just need to make sure that both filenames are normalized before they are compared. And there are third party libraries to enable Unicode support in OCaml. So I went off and downloaded the Unison source code, the OCaml binaries, and the Unicode library (Camomile). It was pretty easy to locate the piece of code where the normalization should, or at least could, be done. Only one problem remains: Camomile is very poorly documented, and comes with absolutely no example code! Right, two problems: OCaml is a functional language (like Haskell), and it turns out I hate functional languages!

To be continued (hopefully)...

UPDATE: Problem kind of solved!

Hosting scare-of-the-day (eVerity)

Short story... My email went down. Checked my websites hosted on the same server; also down. Checked hosting company's homepage; also down (or, only showing a status message as seen to the right).

So far so good, they seem to be working on it. Clicked on the LIVE HELP link and asked them to confirm that email, and not just mysql, was down for the moment. Got reply:

"These problems started when we restored a backup of YOUR site. Hacking is a crime! You need to be getting yourself a new host asap!"

At about the same time all my sites started to show an "account suspended"-message. Ehm... Well, I don't remember hacking my own server, and honestly I didn't even know I could hack. I must say this gave me quite a scare, since I thought I might be losing all the gigs of email I keep on their servers.

So, it turns out they had confused me with someone else (some guy's account got suspended, and he used a friend's account on the same server to get back at the hosting company). Sites are up, and email is up save some DNS problems. All's well that ends well, but really, this is not the kind of greeting you'd like when you inquire about why your email is down :/