Thursday, 9 May 2013

Ubuntu 12.04 and SOLR 3.6.2 install

Solr installation

Hi, I thought I would revisit my Solr install to clarify a few points where people were having problems loading the solr-cell class and getting binary files to index, with errors such as this one:


Error loading class    
    'org.apache.solr.handler.dataimport.DataImportHandler'

So, hopefully by following this tutorial you should get a working solr install:


Install the base OS, 64-bit variant, in this case Ubuntu 12.04 64-bit.

Create a solr user and log in as that user.

Go to 'Software centre' and install Synaptic Package Manager.
Install tomcat6, tomcat6-common and tomcat6-admin (tomcat6-common is pulled in as a dependency of tomcat6):

sudo apt-get install tomcat6
sudo apt-get install tomcat6-admin


Modify the Tomcat admin configuration to allow a new manager login.

sudo gedit /etc/tomcat6/tomcat-users.xml

Add a user entry with manager access to the file, commenting out any similar existing lines.

Restart tomcat

sudo service tomcat6 restart


Check that the root of Tomcat works by visiting: http://localhost:8080
Then log into the Tomcat manager page by visiting http://localhost:8080/manager/html and entering the Tomcat username and password that you set up previously.
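
If you prefer a terminal to a browser, the root-page check can also be scripted. This is only a minimal sketch, assuming Tomcat is on its default port 8080 and that wget is installed; the function name check_url is made up for illustration. The manager page needs a login, so only the root URL is probed here.

```shell
#!/bin/sh
# check_url is a hypothetical helper name, not part of Tomcat or Solr.
# It asks wget to probe the URL without downloading the page body.
check_url() {
    if wget -q --spider "$1"; then
        echo "OK: $1"
    else
        echo "FAILED: $1"
        return 1
    fi
}

# Usage (run on the server itself):
#   check_url http://localhost:8080
```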

Prepare the solr environment


Create a temp folder for the solr binary

cd /home/solr/
mkdir -p tmp/solr/


Navigate to that folder

cd tmp/solr/

Download the relevant Apache Lucene Solr binary using wget (you will need to get the URL for the version you wish to install)

wget http://apache.mirrors.timporter.net/lucene/solr/3.6.2/apache-solr-3.6.2.tgz

Decompress the binary file, copy files into their relevant locations and give the tomcat user rights to the working folder

tar xzvf apache-solr-3.6.2.tgz
cd apache-solr-3.6.2
sudo cp dist/apache-solr-3.6.2.war /var/lib/tomcat6/webapps/solr.war
sudo cp -fr example/solr /var/lib/tomcat6/
sudo chown -R tomcat6:tomcat6 /var/lib/tomcat6/solr


Restart tomcat6

sudo service tomcat6 restart

Set the solr home directory

sudo gedit /var/lib/tomcat6/webapps/solr/WEB-INF/web.xml

Modify the file by uncommenting the Solr home env entry and setting its value to the path below:

/var/lib/tomcat6/solr/
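
Before restarting, it may help to confirm the edit actually landed in the file. The sketch below is hedged: the element names in the comment are assumed from the stock Solr 3.x web.xml, so compare them against your own copy, and has_solr_home is a made-up helper name. It only checks that the path appears as an env-entry-value; it cannot tell whether the entry is still inside an XML comment.

```shell
#!/bin/sh
# Expected shape of the entry once uncommented (assumed from the stock file):
#   <env-entry>
#      <env-entry-name>solr/home</env-entry-name>
#      <env-entry-value>/var/lib/tomcat6/solr/</env-entry-value>
#      <env-entry-type>java.lang.String</env-entry-type>
#   </env-entry>

# has_solr_home FILE PATH: succeeds if FILE contains PATH as an env-entry-value.
# Hypothetical helper; it does not detect whether the entry is still commented out.
has_solr_home() {
    grep -qF "<env-entry-value>$2</env-entry-value>" "$1"
}

# Usage:
#   has_solr_home /var/lib/tomcat6/webapps/solr/WEB-INF/web.xml /var/lib/tomcat6/solr/ \
#       && echo "solr home is set"
```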

Restart tomcat6

sudo service tomcat6 restart

Check you can start solr

http://localhost:8080/solr

Set up solr-cell for binary file (Word, PDF, etc.) indexing

First we need to load the solr-cell module in solrconfig.xml

sudo gedit /var/lib/tomcat6/solr/conf/solrconfig.xml

Find and change the solr-cell line to match the following:

<lib dir="/var/lib/tomcat6/webapps/solr/WEB-INF/lib/" regex="apache-solr-cell-\d.*\.jar" />

Save the file

Now we must copy all the relevant .jar files into the classpath library

cd /home/solr/tmp/solr/apache-solr-3.6.2/

sudo cp contrib/extraction/lib/*.jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
sudo cp contrib/dataimporthandler/lib/*.jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
sudo cp dist/*.jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
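
Since the 'Error loading class' problems usually come down to a jar that never reached the lib folder, a quick check before restarting can save some head-scratching. A rough sketch: check_jars is a hypothetical helper, and the two filename patterns are my assumption about how the solr-cell and DataImportHandler jars in dist/ are named, so adjust them if your download differs.

```shell
#!/bin/sh
# check_jars DIR: hypothetical helper that reports whether the Solr Cell and
# DataImportHandler jars are present in DIR. Returns non-zero if any is missing.
check_jars() {
    missing=0
    for pattern in 'apache-solr-cell-*.jar' 'apache-solr-dataimporthandler-*.jar'; do
        # ls fails when the glob matches nothing, which is what we test for.
        if ls "$1"/$pattern >/dev/null 2>&1; then
            echo "found:   $pattern"
        else
            echo "missing: $pattern"
            missing=1
        fi
    done
    return $missing
}

# Usage:
#   check_jars /var/lib/tomcat6/webapps/solr/WEB-INF/lib
```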


Restart tomcat6

sudo service tomcat6 restart

Test our solr installation

First we must install curl, which we will use to send binary files over HTTP to the request handler

sudo apt-get install curl

Create a text (non-image) PDF file: make a text file in Gedit containing a keyword such as 'otter', then print it to the file test.pdf (save this in your /home/documents/ folder).
Now we will take the test file and 'curl' it into Solr

cd /home//Documents/

curl "http://localhost:8080/solr/update/extract?literal.id=smoketest&commit=true" -F "myfile=@test.pdf"

Now test to see if you can search the index for the id

From a remote PC enter the following URL:
http://:8080/solr/admin
In the query string box enter the following:

id:smoketest

Press search and check the 'numFound' value in the results. It should be 1, meaning a single matching hit for 'smoketest', which confirms the system is working correctly.
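
The same lookup can be done from the command line with the curl we installed earlier. This is a sketch rather than a definitive recipe: it assumes the standard /solr/select handler and the default XML response, where numFound appears as an attribute of the result element. num_found is a made-up helper name.

```shell
#!/bin/sh
# num_found: hypothetical helper that reads a Solr XML response on stdin and
# prints the value of the first numFound attribute it sees.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Usage (on the server; -G sends the url-encoded query as a GET parameter):
#   curl -sG --data-urlencode 'q=id:smoketest' http://localhost:8080/solr/select | num_found
```

If the PDF was indexed, the usage line should print the same count that the admin page reports.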

Once the basic system is running it is time to apply the custom solr schema (if you have/need one). The schema resides in the following location:
/var/lib/tomcat6/solr/conf/schema.xml