The Lair of the Purpleotter: July 2010

Thursday, 8 July 2010

Article on installing the latest Solr build on Ubuntu 10.04

http://charlesleifer.com/blog/how-to-set-up-solr-on-ubuntu-1004-or-whatever/

Thanks, Dave Hall, for your brilliant article:

http://davehall.com.au/blog/dave/2010/06/26/multi-core-apache-solr-ubuntu-1004-drupal-auto-provisioning

Tika and Solr

To index Word, Excel, PDF and other "unstructured" documents, Solr uses Tika, another Apache project.

Tika comes bundled with Solr and works there out of the box. However, if you want to run Tika on its own, you have to copy a few .jar files around:

# Copy the required jars into Solr's example lib directory
cd [Your path]/apache-solr-nightly/lib
cp commons-io-1.4.jar commons-codec-1.3.jar [Your path]/apache-solr-nightly/example/solr/lib
# jempbox is taken from the local Maven repository
cp ~/.m2/repository/org/jempbox/jempbox/0.2.0/jempbox-0.2.0.jar [Your path]/apache-solr-nightly/example/solr/lib

# Run Tika on its own
java -jar tika-0.2.jar

Config

If you want to index Word, Excel, PDF, and other types of documents, there is a bit of additional configuration to do. To index those file types you have to get a nightly build of Solr from here, and copy some files and directories as described in the link at the end of this post. You then have to add the following lines to example/solr/conf/solrconfig.xml:

    
      last_modified
      true
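
The values last_modified and true belong inside a request handler definition. Below is a rough sketch of what the full block might look like, assuming the Solr Cell ExtractingRequestHandler and the ext.* parameter names used by nightly builds from before Solr 1.4. The handler path, the class name, and both parameter names are assumptions, not taken from this post, and parameter names changed in later releases, so check them against the build you actually have.

```xml
<!-- Sketch only: handler path, class, and ext.* parameter names are assumptions -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Map Tika's Last-Modified metadata onto the last_modified field -->
    <str name="ext.map.Last-Modified">last_modified</str>
    <!-- Ignore extracted fields that are not defined in schema.xml -->
    <bool name="ext.ignore.und.fl">true</bool>
  </lst>
</requestHandler>
```

With a handler like this in place, the field it maps to (last_modified here) would also need to exist in schema.xml.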
