Thursday, 8 July 2010
Tika and Solr
To index Word, Excel, PDF and other "unstructured" documents, Solr uses Tika, another Apache project.
Tika comes bundled with Solr and works there out of the box. However, if you want to run Tika on its own, you have to copy a few .jar files around first:
cd [Your path]/apache-solr-nightly/lib
cp commons-io-1.4.jar commons-codec-1.3.jar [Your path]/apache-solr-nightly/example/solr/lib
cp ~/.m2/repository/org/jempbox/jempbox/0.2.0/jempbox-0.2.0.jar [Your path]/apache-solr-nightly/example/solr/lib
java -jar tika-0.2.jar
Config
Indexing Word, Excel, PDF, and other document types takes a bit of additional configuration. You have to get a nightly build of Solr and copy some files and directories as described in the link at the end of this post. Then add the following lines to example/solr/conf/solrconfig.xml:
last_modified
true
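For orientation, here is a rough sketch of what an extraction handler entry in solrconfig.xml can look like. It assumes the parameter names from the Solr 1.4 release (`fmap.Last-Modified` and `lowernames`) and the conventional `/update/extract` handler path. The nightly build may use different parameter names, and pairing `true` with `lowernames` is a guess, so check this against the example solrconfig.xml that ships with your build.

```xml
<!-- Sketch only: parameter names assume Solr 1.4; a nightly build may differ. -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Map Tika's Last-Modified metadata onto the last_modified field. -->
    <str name="fmap.Last-Modified">last_modified</str>
    <!-- Assumption: lowercase the metadata field names Tika produces. -->
    <str name="lowernames">true</str>
  </lst>
</requestHandler>
```

The field named here (last_modified) also has to exist in schema.xml, otherwise Solr will reject documents that carry it.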