Kristian Lunde

Installing Nutch 1.0 on OS X

Today I started work on a little project that required a crawler, and Nutch seemed to do most of what I needed. The Nutch team conveniently released Nutch 1.0 late in March 2009, so I had a brand new release to test out. Installing Nutch 1.0 on a Mac was not as straightforward as I thought. I ran into a lot of unexpected issues, so here is my cookbook description of how to successfully install Nutch 1.0 on your Mac.

  1. Download the latest source code from the Apache SVN repository. I tried running Nutch from the tarball without success, and I also tried to compile the source from the tarball, but a post on the Nutch forum clearly states that this will not work.
  2. Set your JAVA_HOME and NUTCH_JAVA_HOME variables. Again, this is not straightforward: they both need to point to your real installation of Java 1.6 (earlier versions of Java will fail). I set these variables to /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home, because I could not get the /Library/Java/Home symbolic link to work properly.
  3. Compile the source code using Ant (I built it in Eclipse).
  4. Set up your Nutch configuration by following the tutorial by Peter P. Wang.
  5. Run your first crawl with: ./bin/nutch crawl urls -dir crawl -depth 3 -topN 50
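
Step 2 is the one that bit me, so here is a minimal shell sketch of it. It only assumes the Java 1.6 path quoted in step 2; the has_java helper and the JAVA_16_HOME variable are names I made up for illustration, and the check is just a sanity test, not anything Nutch itself requires.

```shell
#!/bin/sh
# Path taken from step 2; adjust it if your Java 1.6 lives elsewhere.
JAVA_16_HOME="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home"

# Hypothetical helper: succeeds only if the given directory
# contains an executable bin/java.
has_java() {
    [ -x "$1/bin/java" ]
}

if has_java "$JAVA_16_HOME"; then
    # Point both variables at the real installation, not at a symlink.
    export JAVA_HOME="$JAVA_16_HOME"
    export NUTCH_JAVA_HOME="$JAVA_16_HOME"
    echo "JAVA_HOME and NUTCH_JAVA_HOME set to $JAVA_16_HOME"
else
    echo "No java binary found under $JAVA_16_HOME" >&2
fi
```

Source the snippet (for example from ~/.profile) rather than running it as a separate script, otherwise the exported variables disappear when the script exits and the build and crawl steps never see them.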

Most of the issues I encountered were related to the Java version and the fact that the /Applications/Utilities/Java/Java Preferences application does not really change the JAVA_HOME directory /Library/Java/Home properly. So make sure you have set both JAVA_HOME and NUTCH_JAVA_HOME, and that OS X does not fool you when it pretends to be symbolically linking to the 1.6 installation.
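
To see whether the symbolic link really ends up at the 1.6 installation, something like the following sketch can help. The two paths are the ones mentioned in this post; resolve_dir and same_dir are hypothetical helper names, and the approach (comparing physical directories with pwd -P) is my assumption about a reasonable check, not something from the Nutch documentation.

```shell
#!/bin/sh
# Hypothetical helper: print the physical directory a path resolves to,
# following any symbolic links along the way.
resolve_dir() {
    (cd "$1" 2>/dev/null && pwd -P)
}

# Hypothetical helper: succeeds only if both paths exist and
# resolve to the same physical directory.
same_dir() {
    a=$(resolve_dir "$1") && b=$(resolve_dir "$2") && [ "$a" = "$b" ]
}

LINK="/Library/Java/Home"
REAL="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home"

if same_dir "$LINK" "$REAL"; then
    echo "$LINK resolves to the 1.6 installation"
else
    echo "$LINK does not resolve to $REAL, so set the variables explicitly"
fi
```

If the second message shows up, the link is pointing somewhere else (or one of the paths is missing), which matches the behaviour I ran into.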

Good luck.

April 7th, 2009 at 9:15 pm

2.5 applications I really miss in OS X

In a previous post I wrote about my new life running on a MacBook Pro and OS X. It has now been over a month since I switched over to this Unix hybrid, and I am quite liking it. It is very stable: I almost never turn off my Mac but put it to sleep instead, which works fine, and my last reboot was over two weeks ago. I have also gotten used to some of the weird new keys on the keyboard and the shortcuts, but I am not yet as efficient on a Mac as I am (or was) on Ubuntu and Windows. During the last few weeks I have discovered that Apple and others in most cases provide me with the applications I need, but not always, so here is the list of applications I really miss:

1. TortoiseSVN
A Windows application that integrates itself with Windows Explorer and provides an SVN client. I would say that this is the best graphical SVN client I have ever used. SVNX, which I currently use on the Mac, is not a very good replacement.

2. Kate / Notepad++
Kate is a KDE text editor for Unix-based systems. Notepad++ is Kate's equivalent on Windows. Both editors provide a simple and intuitive user interface and a lot of syntax highlighting files for all the obscure programming languages you can think of.
I know the Mac has the TextMate application, but that is third-party software and you have to pay €48 or something for a license, which is probably what I will end up doing. TextMate is really good and provides most, if not all, of the functionality that Kate and Notepad++ provide.

In my desperation for a good text editor I almost went off and tried to install KDE on the Mac, but that was said to be experimental and could break my entire system, so that is a no-go for now. The article, however, was really interesting:


I still miss my Ubuntu system and will probably go off and install Parallels or VMware with Ubuntu, just to have it accessible :)

December 3rd, 2008 at 11:51 pm

